# Low-Power Flip-Flops: Survey, Comparative Evaluation, and a New Design

Ahmed Sayed and Hussain Al-Asaad

*Abstract*—Synchronous logic design is the dominant mainstream integrated circuit design methodology, and flip-flops are an inherent building block of any synchronous design. Furthermore, flip-flops constitute most of the load on the clock distribution and power networks, which are the main power-consuming networks of a synchronous integrated circuit. We survey, design, and simulate a broad set of flip-flops designed for low power and high performance. We highlight the basic design features of these flip-flops and evaluate them based on timing characteristics, power consumption, and other metrics. Moreover, we propose a new flip-flop design. We then perform a finer-granularity comparison between the new design and the surveyed flip-flops with the lowest peak power, show the competitiveness of the new design, and make our recommendations.

*Index Terms*—Flip-flop design, Low-Power circuits, Power and delay estimation, VLSI circuits.

#### I. INTRODUCTION

As the feature size of CMOS process technology shrinks, more transistors are integrated on a chip, which means more switching and more power dissipated as heat. Heat is one of the most important packaging challenges of this era and one of the main drivers of low power design methodologies and practices. Another driver of low power research is integrated circuit reliability: more switching implies a higher average current, which raises the probability of reliability issues.

The most important driver of low power research and design is our convergence toward a mobile society. As this profound trend continues without a matching improvement in battery life, more and more low power issues will have to be addressed, which requires developing and adhering to low power tools and methodologies. Eventually, these trends will mandate low power design automation on a very large scale to keep pace with the power consumption of today's integrated chips.

Most current designs are synchronous, which implies that flip-flops and latches are involved in one way or another in the data and control paths. One of the challenges of low power methodologies for synchronous systems is the power consumption of these flip-flops and latches. It is important to save power in these flip-flops and latches without compromising state integrity or performance.

Several researchers have worked on low power flip-flop design, but most have focused on one or a few types of flip-flops or applications. The need to compare different designs and approaches is the main motivation for this paper. Knowing the main trade-offs of each flip-flop is important both for a design engineer designing a circuit and for a tool that automates the design process.

The rest of this paper is organized as follows. Section 2 presents background information about flip-flop design and characteristics. Section 3 presents the flip-flop circuits surveyed with a short description of each flip-flop and Section 4 presents the simulation and evaluation results of these flip-flops. Section 5 introduces our new flip-flop design and presents the comparative evaluation for the new flip-flop against the three flip-flop designs with the least peak power obtained from Section 4. Finally, Section 6 presents some remarks and conclusions.

#### II. BACKGROUND

#### A. POWER CONSUMPTION IN LOGIC CIRCUITS

The instantaneous power of any circuit is calculated as follows [10]:

$$P(t) = i_{dd}(t)V_{dd} \tag{1}$$

The above equation assumes that the voltage power supply is stable and constant throughout operation. The energy consumed over the time interval T is the integral of the instantaneous power:

$$E = \int_{0}^{T} i_{dd}(t) V_{dd}\, dt \tag{2}$$

The average power used over the interval is just the energy divided by the time:

$$P_{avg} = \frac{E}{T} = \frac{1}{T} \int_{0}^{T} i_{dd}(t) V_{dd}\, dt \tag{3}$$

For CMOS digital circuits, equation (3) can be further expressed in the following equation:

$$P_{avg} = p_t (C_L V V_{dd} f_{clk}) + I_{sc} V_{dd} + I_{leakage} V_{dd} \tag{4}$$

The above equation consists of three terms, which correspond to the three major sources of power consumption in digital CMOS circuits. The first term represents the switching component of power, where $C_L$ is the effective switched load capacitance, $f_{clk}$ is the clock frequency, and $p_t$ is the probability that a power-consuming transition occurs (referred to as the activity factor in other publications). In most cases, the voltage swing $V$ is the same as the supply voltage $V_{dd}$; however, in some logic design styles, such as pass-transistor logic, the voltage swing on some internal nodes may be slightly less. It is important to point out that the effect of internal glitching should be included as a component of short circuit power consumption.

The second term is caused by the direct-path short circuit current $I_{sc}$, which arises when both the NMOS and PMOS transistors or networks are simultaneously on, conducting current from the supply $V_{dd}$ to ground. The third term is due to the leakage current $I_{leakage}$, a factor that grows more and more important as deep submicron technologies are developed. Leakage can arise from substrate injection, gate leakage, sub-threshold effects, and other mechanisms; $I_{leakage}$ depends primarily on the CMOS fabrication process technology and is modeled based on its characterization.

The dominant term in a well-designed circuit is the switching component, thus the low-power design goal becomes the task of minimizing  $p_t(C_L V V_{dd} f_{clk})$ , while retaining the required functionality and identifying the cost of such minimizations in terms of area and/or performance.
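As a numerical illustration of Eq. (4), the sketch below evaluates its three terms separately. The function name and all parameter values are hypothetical, chosen only to show the arithmetic; they are not results from this paper. With these values the switching term dominates, as expected for a well-designed circuit.

```python
def eq4_power_terms(p_t, c_load, v_swing, v_dd, f_clk, i_sc, i_leak):
    """Return the (switching, short-circuit, leakage) terms of Eq. (4) in watts."""
    switching = p_t * c_load * v_swing * v_dd * f_clk  # p_t * C_L * V * Vdd * f_clk
    short_circuit = i_sc * v_dd                         # I_sc * Vdd
    leakage = i_leak * v_dd                             # I_leakage * Vdd
    return switching, short_circuit, leakage


# Hypothetical values: 10 fF switched load, full swing at 1.2 V, 100 MHz clock.
terms = eq4_power_terms(p_t=0.5, c_load=10e-15, v_swing=1.2, v_dd=1.2,
                        f_clk=100e6, i_sc=50e-9, i_leak=10e-9)
p_avg = sum(terms)
for name, value in zip(("switching", "short-circuit", "leakage"), terms):
    print(f"{name:>13}: {value:.3e} W ({100 * value / p_avg:.1f}% of P_avg)")
```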

Peak power consumption is very useful when trying to find the worst-case scenario for a design or system, for example, the worst-case battery life of a laptop or cell phone. It is determined by the maximum instantaneous current drawn from the supply within a specific time period of interest:

$$P_{peak} = \max_{t}\left(i_{dd}(t)\right) V_{dd} \tag{5}$$
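Equations (1) to (3) and (5) can be evaluated directly from a sampled supply-current waveform such as a transient simulation produces. The sketch below is a minimal illustration, assuming strictly increasing time samples and using the trapezoidal rule for the integral in Eq. (2); the function name and the test waveform are hypothetical.

```python
def supply_metrics(t, i_dd, v_dd):
    """Return (E, P_avg, P_peak) per Eqs. (2), (3) and (5) from samples of i_dd(t).

    t: strictly increasing sample times in seconds; i_dd: supply current in amperes.
    The integral of P(t) = i_dd(t) * Vdd (Eq. 1) uses the trapezoidal rule.
    """
    if len(t) != len(i_dd) or len(t) < 2:
        raise ValueError("need at least two matching (t, i_dd) samples")
    energy = 0.0
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        energy += 0.5 * (i_dd[k - 1] + i_dd[k]) * v_dd * dt  # Eq. (2), one trapezoid
    duration = t[-1] - t[0]
    p_avg = energy / duration   # Eq. (3)
    p_peak = max(i_dd) * v_dd   # Eq. (5), over the sampled points only
    return energy, p_avg, p_peak


# Hypothetical waveform: a 2 mA triangular current pulse inside a 10 ns window.
t = [0.0, 1e-9, 2e-9, 3e-9, 10e-9]
i_dd = [0.0, 2e-3, 0.0, 0.0, 0.0]
energy, p_avg, p_peak = supply_metrics(t, i_dd, v_dd=1.2)
print(f"E = {energy:.2e} J, P_avg = {p_avg:.2e} W, P_peak = {p_peak:.2e} W")
```

Because the peak is taken over the samples only, a current spike that falls between two time points is missed, which is one reason a small simulation time step matters.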

We chose to measure peak power consumption because it is the parameter of most concern during the design phase of a system with many flip-flops: the clock and power delivery networks must withstand the peak power consumption of the system without failing. Average power, in contrast, is the metric for how much power is used on average and hence for battery longevity. In generic logic circuits, average power depends on activity and switching probabilities, which in turn depend strongly on the application; in flip-flops, however, it is heavily correlated with instantaneous power because of the very limited number of input possibilities.

Peak power measurement is not as problematic for flip-flop circuits, since the number of inputs is limited and the relative timings are straightforward, i.e., confined to the clock period of operation.

The power-delay product (PDP) can be viewed as the amount of energy expended in each switching event and is thus particularly important in comparing the power consumption of various circuits and design styles. Assuming that the full swing switching component of (4) is dominant, this metric becomes:

$$PDP_{avg} = \frac{p_t (C_L V V_{dd} f_{clk})}{f_{clk}} = p_t (C_L V_{dd}^2) \tag{6}$$
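To see that Eq. (6) does not depend on the clock frequency, the sketch below divides the switching term of Eq. (4) by $f_{clk}$ at two frequencies. The function name and parameter values are hypothetical and only illustrate the algebra; with a full swing, $V = V_{dd}$ and the result reduces to $p_t C_L V_{dd}^2$.

```python
def pdp_switching(p_t, c_load, v_swing, v_dd, f_clk):
    """Energy per cycle from the switching term of Eq. (4), i.e. Eq. (6), in joules."""
    p_switching = p_t * c_load * v_swing * v_dd * f_clk  # switching power, Eq. (4)
    return p_switching / f_clk                           # f_clk cancels out


# Hypothetical full-swing case (V = Vdd = 1.2 V, C_L = 10 fF, p_t = 0.5).
for f_clk in (10e6, 100e6):
    pdp = pdp_switching(0.5, 10e-15, 1.2, 1.2, f_clk)
    print(f"{f_clk / 1e6:>5.0f} MHz: PDP = {pdp:.2e} J")
```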

A more performance oriented metric for circuits and design styles would be the energy-delay product. This is considered if performance is of a higher importance and priority than power consumption. This will not be used here since low power is our highest priority.

#### B. FLIP-FLOP COMPARISON METRICS

There are several basic performance metrics that are used to qualify a flip-flop and compare it to other designs [3]. These metrics are:

•Clock-to-Q delay: The propagation delay from the clock input to the output Q, assuming that the data input D is set early enough with respect to the effective edge of the clock. The worst-case input edge is used throughout this paper.

•Setup time: The minimum time needed between a change of the D input and the triggering edge of the clock. Meeting it guarantees that the output follows the input under worst-case process, voltage and temperature (PVT) conditions, assuming that the triggering clock edge and pulse leave enough time to capture the data input change.

•Hold time: The minimum time the D input must stay stable after the triggering edge of the clock. Meeting it guarantees that the output Q stays stable after the triggering clock edge under worst PVT conditions. This metric assumes that the D input change happened at least a minimum delay after the previous D input change.

•Data-to-Q delay: The sum of the setup of the data at the D input of the flip-flop (the D-to-Clock delay) and the Clock-to-Q delay defined above.
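The delay definitions above reduce to differences between event times. The sketch below assumes that the times of the D transition, the triggering clock edge, and the resulting Q transition have already been extracted from a waveform (for example as 50% crossings, which is an assumption, since the threshold is not stated here); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeTimes:
    """Hypothetical container for extracted event times, in seconds."""
    t_d: float    # D input transition
    t_clk: float  # triggering clock edge
    t_q: float    # resulting Q output transition


def timing_metrics(e: EdgeTimes):
    """Return (D-to-Clock delay, Clock-to-Q delay, Data-to-Q delay)."""
    d_to_clk = e.t_clk - e.t_d  # setup applied to D before the clock edge
    clk_to_q = e.t_q - e.t_clk
    d_to_q = e.t_q - e.t_d      # equals d_to_clk + clk_to_q
    return d_to_clk, clk_to_q, d_to_q


# Hypothetical event: D changes 1 ns before the clock edge, Q follows 150 ps after it.
d_to_clk, clk_to_q, d_to_q = timing_metrics(EdgeTimes(t_d=9.0e-9, t_clk=10.0e-9, t_q=10.15e-9))
print(f"D-to-Clk = {d_to_clk:.2e} s, Clk-to-Q = {clk_to_q:.2e} s, D-to-Q = {d_to_q:.2e} s")
```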

Because flip-flops are always part of the critical path of a synchronous design (every register-to-register path starts and ends at one), standard cell library developers try hard to minimize both the setup time requirement and the Clock-to-Q delay of flip-flops in order to reach the highest possible frequency for the design at hand.

Hold times are not as critical as setup times, since they do not impose an upper bound on the clock speed of flip-flop based designs. In latch-based designs, on the other hand, they are very critical.

#### C. REGIONS OF FLIP-FLOP OPERATION

There are three regions of flip-flop operation [8, 9], of which only one region is acceptable for a sequential design to function correctly. These regions are:

•Stable region: Where the setup and hold times of a flip-flop are met and the Clock-to-Q delay is not dependent on the D-to-Clock delay. This is the required region of operation.

•Metastable region: As the D-to-Clock delay decreases, at a certain point the Clock-to-Q delay starts to rise exponentially and eventually ends in failure. In this region the Clock-to-Q delay is nondeterministic, causing intermittent failures and behaviors that are very difficult to debug, especially in silicon.

•Failure region: Where data changes are not transferred to the output of the flip-flop.

Figure 1 illustrates the different regions of flip-flop operation. The optimal setup time noted on the graph is the D-to-Clock delay that gives the highest performance, i.e., the fastest D-to-output delay. Because the curve is steep to the left of that point, not all library developers target this value; instead, they prefer to add guard bands to library cells or designs to guarantee stability and reliability.
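The three regions can be identified automatically from a sweep of Clock-to-Q delay against D-to-Clock delay. The sketch below is one possible rule, not the procedure used in this paper: a point is labelled failure if Q never followed D, stable if its Clock-to-Q is within a fractional tolerance of the nominal value (taken as the Clock-to-Q at the largest captured offset), and metastable otherwise. The 10% tolerance, the function name, and the sample numbers are assumptions; choosing the threshold is a design decision closely related to the guard bands mentioned above.

```python
def classify_regions(offsets, clk_to_q, tolerance=0.10):
    """Label each D-to-Clock offset as 'stable', 'metastable' or 'failure'.

    clk_to_q[k] is None when the output never followed the input at offsets[k].
    Returns a list of (offset, label) pairs sorted by offset.
    """
    pairs = sorted(zip(offsets, clk_to_q), key=lambda pair: pair[0])
    captured = [delay for _, delay in pairs if delay is not None]
    if not captured:
        return [(off, "failure") for off, _ in pairs]
    nominal = captured[-1]  # Clock-to-Q at the largest offset that was captured
    labels = []
    for off, delay in pairs:
        if delay is None:
            labels.append((off, "failure"))
        elif delay <= nominal * (1.0 + tolerance):
            labels.append((off, "stable"))
        else:
            labels.append((off, "metastable"))
    return labels


# Hypothetical sweep (offsets and delays in ns), not measured data.
offsets = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0]
clk_to_q = [None, 0.60, 0.25, 0.16, 0.15, 0.15]
for off, label in classify_regions(offsets, clk_to_q):
    print(f"{off:5.2f} ns: {label}")
```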



#### III. SURVEYED CIRCUITS

Flip-flops can be classified in several ways: dynamic vs. static, square-wave vs. pulsed, conditional vs. non-conditional, and by the logic style used. In this paper we consider different flip-flop circuits to gain real insight into these classifications.

These flip-flop circuits are taken from references [3, 4, 5, 6, 7] and are shown in Figure 2 and Figure 3. They were entered using the Cadence Virtuoso schematic capture tool and sized to the minimum size at which they function correctly. The following is a short description of each flip-flop circuit.

F01 is the PowerPC master-slave latch, also called the transmission gate flip-flop. It is one of the fastest classical structures; its main advantages are the short direct path and the low power feedback. It has a fully static master-slave structure, constructed by cascading two identical pass-gate latches, and provides a short clock-to-output latency. However, the large load on the clock greatly affects the total power consumption of the flip-flop, and its data-to-output latency is poor because of its positive setup time. Sensitivity to clock slopes and data feed-through are further concerns when using it.

F02 is the modified standard dynamic $C^2MOS$ master-slave latch, which has shown good low power features such as a small clock load and low power feedback. The modified $C^2MOS$ is also robust to clock slopes. F03 is the hybrid-latch flip-flop (HLFF), one of the fastest flip-flop structures. It is robust to clock slopes but has a positive hold time, and it is well suited for high performance systems.

F04 is another hybrid flip-flop, the semi-dynamic flip-flop (SDFF). It is one of the fastest structures, if not the fastest, of all the flip-flops described in this paper. Its large clock load and large effective pre-charge capacitance result in somewhat higher power consumption; it is still best suited for high performance designs, where its moderate power consumption is acceptable. F05 is the K6 edge-triggered latch (ETL) with the reset circuitry removed. It is very fast, but its differential structure together with the pre-charge causes a slight increase in power consumption.

F06 and F07 are two very similar flip-flops. Their pre-charged sense-amplifier stage is very fast, but the set-reset latch almost doubles the delay because of unequal rise and fall times. This might cause glitches in succeeding logic stages, increasing the power consumption of those stages. F06 has better delay performance but suffers from a floating output node in the sense amplifier stage if the data changes during the high phase of the clock; still, it has a very low clock load, which is an advantage for power consumption. F07 improves on the leakage power consumption.

F08 and F09 are two very similar single transistor clocked (STC) flip-flops. They suffer from a substantial voltage drop at the outputs due to capacitive coupling between the common node of the slave latch and the floating output driving node of the master latch. This effect occurs at the rising edge of the clock and increases the delay and short circuit power consumption in the slave latch, which could dominate the dynamic power consumption. The capacitive coupling, the floating node, and glitches on the data input signal give these flip-flops lower driving capability than the rest of the flip-flop circuits used in this paper. This should be taken into account by including the power consumption of the dummy loads in the power measurements.

F10 is the modified cascode voltage switch logic (CVSL) flip-flop. One of its advantages is that it uses fewer transistors than most other flip-flops. It has no floating nodes, but only one of the output nodes of the input stage can be fully pulled to a weak "0", which might increase power consumption.

F11 is the modified sense amplifier flip-flop (SAFF). It incorporates a pre-charged sense amplifier and a set-reset latch to hold the data. The original SAFF has slightly higher latency than other flip-flops because one output of its output stage is delayed relative to the other; the modified design avoids this drawback by supporting fully symmetric output transitions.

F12 is the explicitly pulsed flip-flop (EPFF). It consists of a two-stage dynamic structure, which increases its power consumption. Noise immunity is another concern, as with any dynamic design style.

F13 converts the first stage of the EPFF (F12) into a static stage, reducing the power consumption caused by pre-charging, switching, and glitching; it also reduces the clock load. The pulse generator could cause glitches and extra power consumption in the succeeding circuits. However, a jam-latch (keeper structure) alleviates this concern, though it might require sizing and noise margin characterization.

F14 is the single transistor clocked EPFF. It uses two static latch stages that share one clock transistor. Pulse width is a very important design parameter for circuits F12, F13, and F14, since it is sensitive to PVT variations and must be adequate for correct flip-flop functionality.

F15 is the conditionally pre-charged flip-flop (CPFF). Because dynamic circuits are notorious for high power consumption, the CPFF adds conditional logic that lets the gate pre-charge only when needed; otherwise the pre-charge step is skipped, saving its power. This comes at the cost of a higher setup time, since the conditional logic must evaluate before its output reaches the rest of the flip-flop. F15 also has the disadvantage that its first stage is transparent to glitches on the inputs when the output is high.

F16 is the alternative CPFF where the transparency to input glitches is avoided by using an inverter which prevents the propagation of any glitches during the transparency period.

#### IV. SIMULATION AND RESULTS

In this paper, all flip-flop circuits were initially sized with minimum size transistors of a 90nm technology and then sized up iteratively until they functioned correctly. Performance was not a sizing criterion, because our goal is the lowest possible power, which implies reduced loading effects. The only performance-related sizing was done to fix failures observed at some clock frequencies; improving performance was not one of our goals in this paper. To represent a general design situation, the inputs were driven by minimum size buffers and the outputs were captured after a minimum size buffer stage as well.

Figure 4 shows the model used for all simulation results presented in this paper. The power consumption of all the circuitry was included in the maximum power measurement, since this is the real maximum power that will be consumed when the circuit is used as part of a system. This model also accounts for the effects of the non-ideal input drivers, the driving capability of the flip-flop itself, and any glitches and their effect on the outputs. All numbers and results presented here are from simulations at 25 degrees Celsius, with a 1.2 V $V_{dd}$ power supply, and at the target process corner. We simulated all circuits at 10, 25, 50, and 100 MHz, using relative schmooing (sliding) of the data input relative to the clock in equidistant increments, which gives 6 steps for each schmoo at each frequency.



Figure 3 The second set of surveyed flip-flop circuits

This results in 4x6=24 simulations for each flip-flop for gathering worst case delay (Clock-to-Q & D-to-Q) and power. In total there were 16x24=384 simulations to get the results and many more for debugging purposes and sizing iterations.
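The sweep bookkeeping can be reproduced in a few lines. The exact placement of the six schmoo points is not stated; this sketch assumes offsets of k·T/5 for k = 0 to 5, where T is the clock period, which is consistent with the data input delays quoted in Section IV-A (2 and 4 ns at 100 MHz; 8, 16, and 24 ns at 25 MHz). All names are hypothetical.

```python
FREQUENCIES_HZ = (10e6, 25e6, 50e6, 100e6)
STEPS = 6  # schmoo points per frequency


def schmoo_offsets(f_clk, steps=STEPS):
    """Equidistant D-to-Clock offsets in seconds; assumed to be k * T / (steps - 1)."""
    period = 1.0 / f_clk
    return [k * period / (steps - 1) for k in range(steps)]


flip_flops = [f"F{n:02d}" for n in range(1, 17)]  # F01 .. F16
runs = [(ff, f_clk, offset)
        for ff in flip_flops
        for f_clk in FREQUENCIES_HZ
        for offset in schmoo_offsets(f_clk)]

print(len(runs) // len(flip_flops), "runs per flip-flop,", len(runs), "in total")
print([round(o * 1e9, 3) for o in schmoo_offsets(100e6)], "ns at 100 MHz")
```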



Figure 4 Simulation setup for flip-flops

Since the focus of this paper is low power, accurate power trends are more important than exact performance numbers. We therefore used coarse-grain power and performance measurements in this section, and a much finer-grain set of simulations later in the paper for a more in-depth study: the lowest power flip-flops from this section were simulated at much higher resolution in Section 5, and the resulting performance numbers are much more accurate and realistic.

#### A. TIMING AND PERFORMANCE

The charts in Figure 5 show the Clock-to-Q (Clk2Q) delay at different Data-to-Clock delay values for different clock frequencies, and the charts in Figure 6 show the Data-to-Q (D2Q) delay under the same conditions. The Clk2Q results should be consistent with the D2Q results, in the sense that an increased Clk2Q delay should be accompanied by a correspondingly increased D2Q delay. From the Clk2Q and D2Q charts, we notice that at 100 MHz flip-flops F03, F10, F13, and F14 have bad delays at a 2ns data input delay, and that F10, F13, and F14 have bad delays at 4ns as well. These are attributed to the high setup time of these flip-flops, which can also be seen in the D2Q charts.

At 50 MHz, flip-flops F03, F10, F13, and F14 again have bad delays at a 4ns data input delay; F03 and F10 still have bad delays at 8ns, and F03 still has bad delays at 12ns. These are again attributed to the high setup time of these flip-flops, as seen in the D2Q charts.

At 25 MHz, we notice that flip-flops F04 and F10 have bad delays at 8 and 16ns of data input delay and F04 continues to have bad delays at 24ns data input delay.

At 10 MHz, flip-flop F10 has bad delays at 20ns and 40ns data input delays. F02 is also notable: it has a high delay at 2ns but low delays at all other delay values. Such outlier behavior of certain flip-flops at certain frequencies can be attributed to specific conditions that arise when a circuit has internal feedback paths whose signals race with the input-to-output or clock-to-output paths; this shows the importance of characterization.





Figure 6 D2Q simulation results.

TABLE 1 NUMBER OF CLOCK NETWORK TRANSISTORS.

| F01 | F02 | F03 | F04 | F05 | F06 | F07 | F08 | F09 | F10 | F11 | F12 | F13 | F14 | F15 | F16 |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 10  | 12  | 11  | 9   | 9   | 3   | 3   | 2   | 2   | 5   | 3   | 15  | 14  | 13  | 12  | 12  |

#### B. POWER AND POWER-DELAY-PRODUCT

We chose to measure peak power consumption because it is the parameter of most concern during the design phase of a system: the clock and power delivery networks must both be able to withstand the peak power consumption of the system without any failures. Average power is a good metric for how much power is used on average, but it depends on activity, frequency, and switching probabilities, which in turn depend strongly on the application. Any flip-flop has two networks, the clock network and the data path network. The number of transistors loading the clock input of a flip-flop determines how strongly the average power depends on the frequency of operation when that flip-flop is used in a design. The average clock power consumption of these flip-flops should therefore be directly related to the number of clock-loading transistors shown in Table 1, bearing in mind that minimal sizing was followed. Another reason not to focus on average power in our study is that most designs use clock gating techniques, which would diminish the differences in clock power consumption across circuits and make such a study of little value.
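Under the minimal-sizing assumption stated above, a first-order estimate of clock-network power is the clock-load count from Table 1 times a per-transistor gate capacitance, times $V_{dd}^2 f_{clk}$ (one charge and discharge per cycle). The sketch below ranks the surveyed flip-flops this way. The 1 fF gate capacitance is a hypothetical placeholder and equal capacitance per clocked transistor is an assumption, so only the ranking, not the absolute values, is meaningful.

```python
# Clock-loading transistor counts from Table 1.
CLOCK_LOAD = {"F01": 10, "F02": 12, "F03": 11, "F04": 9, "F05": 9, "F06": 3,
              "F07": 3, "F08": 2, "F09": 2, "F10": 5, "F11": 3, "F12": 15,
              "F13": 14, "F14": 13, "F15": 12, "F16": 12}


def clock_power(counts, c_gate, v_dd, f_clk):
    """First-order clock power in W: N * C_gate * Vdd^2 * f_clk for each flip-flop."""
    return {ff: n * c_gate * v_dd ** 2 * f_clk for ff, n in counts.items()}


# Hypothetical 1 fF per clocked gate, 1.2 V supply, 100 MHz clock.
power = clock_power(CLOCK_LOAD, c_gate=1e-15, v_dd=1.2, f_clk=100e6)
for ff in sorted(power, key=power.get):
    print(f"{ff}: {power[ff] * 1e6:.2f} uW")
```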

Peak power does not depend on the clock transition, since that transition always happens in normal operation. Instead, peak power depends on the data input and its timing relative to the latching clock edge, not on the clock edge itself.

In generic logic circuits, peak power measurement is quite problematic because it is difficult to establish and qualify the set of input transitions (vectors and relative timings) that cause the circuit to consume the most power. This is much less of a problem for flip-flop circuits, since the number of inputs is limited and the relative timings are straightforward, i.e., confined to the clock period of operation. Our experiments showed that the peak power caused by the data input does not depend on frequency and depends only very slightly on the Data-to-Clock delay, which met our expectations.
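Because the input space of a flip-flop is small, the worst case can be found by exhaustive enumeration over the sweep. The sketch below assumes the per-run peak powers have already been measured and stored in a dictionary keyed by (clock frequency, D-to-Clock offset, data edge); it returns the worst-case entry and the per-frequency maxima, whose relative spread gives a quick check of the frequency independence noted above. The data structure, names, and numbers are hypothetical.

```python
def worst_case_peak(results):
    """results: {(f_clk, offset, edge): peak_power_W}.

    Returns (worst_key, worst_W, per_freq_max, spread), where spread is the
    relative variation of the per-frequency worst case.
    """
    worst_key = max(results, key=results.get)
    per_freq = {}
    for (f_clk, _offset, _edge), p_peak in results.items():
        per_freq[f_clk] = max(p_peak, per_freq.get(f_clk, float("-inf")))
    hi, lo = max(per_freq.values()), min(per_freq.values())
    spread = (hi - lo) / hi
    return worst_key, results[worst_key], per_freq, spread


# Hypothetical measurements (W), for illustration only.
results = {
    (50e6, 4e-9, "rise"): 1.02e-4, (50e6, 4e-9, "fall"): 0.97e-4,
    (100e6, 2e-9, "rise"): 1.01e-4, (100e6, 2e-9, "fall"): 0.99e-4,
}
key, p_worst, per_freq, spread = worst_case_peak(results)
print(key, f"{p_worst:.2e} W", f"spread across frequencies = {100 * spread:.1f}%")
```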





From our simulation results (a sample is shown in Figure 7), we notice that the maximum power of flip-flop F03 peaks well above that of all the other designs; this is attributed to the structure of the circuit and is consistent with its high performance. We further notice that some flip-flops are more sensitive than others to the delay values, owing to their structure and internal organization. Finally, F05 (K6 ETL) without the reset circuitry is not very impressive power-wise.

In addition to the above results, we constructed the power delay product charts for finding out the trade-offs between power consumption and delays for the different flip-flops. A sample of these charts is shown in Figure 8 where the PDP is graphed using (a) D2Q delay and (b) Clk2Q delay.

The graphs show that when a flip-flop is not in the stable operating region, its delay dominates the PDP, as shown in Figure 8. Comparing PDP trends within the stable region of operation and across frequencies is therefore fair. We also observe that the PDP trends are similar to the maximum power trends.
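Since delay dominates the PDP outside the stable region, a fair comparison restricts the PDP to stable points. The sketch below assumes that PDP is formed as peak power times delay (D2Q or Clk2Q) at each sweep point, as in the charts described above, and summarizes it over the points labelled stable by any region-classification rule; the names and the sample numbers are hypothetical.

```python
def stable_pdp(points):
    """points: iterable of (peak_power_W, delay_s, region).

    Returns (mean, worst) PDP in joules over the points whose region is 'stable'.
    """
    pdps = [p * d for p, d, region in points if region == "stable"]
    if not pdps:
        raise ValueError("no stable points to compare")
    return sum(pdps) / len(pdps), max(pdps)


# Hypothetical sweep for one flip-flop: the metastable point would dominate an unfiltered PDP.
points = [(1.0e-4, 1.15e-9, "stable"), (1.0e-4, 1.30e-9, "stable"),
          (1.1e-4, 9.00e-9, "metastable")]
mean_pdp, worst_pdp = stable_pdp(points)
print(f"mean = {mean_pdp:.3e} J, worst = {worst_pdp:.3e} J")
```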

#### V. THE NEW FLIP-FLOP

From Section 4, we can conclude that the worst case power consumption is not dependent on clock frequency or D-to-Q delay unless the setup condition is violated, i.e. the flip-flop changes the region of operation. Moreover, we can conclude that the least power consuming flip-flops are the ones that really deserve to be compared to any new flip-flop, therefore this section focuses on the least peak power consumption flip-flops and compares them to the new flip-flop that we describe next.

The new edge triggered latch (labeled NFF), shown in Figure 9, is a modification of the K6 ETL [3] that replaces the jam-latches and adds pull-down transistors to create cross-coupled inverters.



Figure 9 New ETL flip-flop.

Without the pull-down transistors of the back-to-back inverters, the flip-flop is still functional, but the internal zero nodes suffer from coupling with the clock signal, which increases dynamic power consumption and reduces noise margins. The output inverters are not needed for correct circuit operation; they are placed to handle general loading situations and to ensure that the internal storage node is not directly exposed to the output load, which is a recommended practice for flip-flops and latches.

All numbers and results presented here are from simulations at 25 degrees Celsius, with a 1.2 V $V_{dd}$ power supply, and at the target process corner. We simulated all circuits at 50MHz, schmooing the data input relative to the clock with specific setup time increments (0, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 3.5, 4, 8, 12, 16, 20ns), which give finer granularity of simulation points where the region of operation changes. This enabled the measurement of the worst Clock-to-Q and Data-to-Q delays and power. In total there were 4x15=60 simulations to get the results, plus more for design, debugging, and sizing iterations.

Ideally, a designer would sweep the clock and data inputs of a flip-flop relative to each other through the whole range, which in this case is a whole clock cycle. Since our models and simulators are sample-based, i.e., they work at discrete time instants, the sweep must also be done at discrete times. This lowers accuracy, but the smaller the sweep increments, the higher the accuracy. This point will be illustrated in the simulation results later.

As mentioned before, to simulate each flip-flop, we swept the data input edge relative to the latching edge for the edge triggered flip-flop circuits, as shown in Figure 10. We did this over multiple iterations to identify the windows where the flip-flop changes its region of operation, and then used smaller, finer increments in the windows that needed more investigation. The 50MHz sweep was done for both rising and falling data input edges, and the worst values were chosen.
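The iterative procedure described above (a coarse sweep, then finer steps only where the region of operation changes) can be automated with bisection. The sketch below is a hypothetical helper, not the flow used in this paper: `classify(offset)` stands for one simulation run that returns a region label, and every interval whose end points disagree is halved until it is no wider than `min_step`. The `toy_region` model, with boundaries at 0.3 and 0.9 ns, is purely illustrative.

```python
def refine_sweep(classify, offsets, min_step):
    """Bisect every interval whose end points have different labels until it is <= min_step wide.

    classify(offset) -> region label (one simulation per call).
    Returns {offset: label}, ordered by offset.
    """
    labels = {off: classify(off) for off in offsets}
    changed = True
    while changed:
        changed = False
        points = sorted(labels)
        for a, b in zip(points, points[1:]):
            if labels[a] != labels[b] and b - a > min_step:
                mid = 0.5 * (a + b)
                labels[mid] = classify(mid)  # one extra simulation inside the window
                changed = True
    return {off: labels[off] for off in sorted(labels)}


def toy_region(offset_ns):
    """Hypothetical flip-flop model: fails below 0.3 ns, metastable below 0.9 ns."""
    if offset_ns < 0.3:
        return "failure"
    return "metastable" if offset_ns < 0.9 else "stable"


sweep = refine_sweep(toy_region, [0.0, 1.0, 2.0, 4.0], min_step=0.05)
print(len(sweep), "simulated offsets:", [round(o, 4) for o in sweep])
```

Only the windows around the two region boundaries receive extra points, which mirrors the finer increments near the region change in the setup-time list above.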



Figure 10 Simulation method.

In this sub-section we present the delay and power simulation results for the selected flip-flop circuits in comparison to our new proposed flip-flop. As mentioned above, we swept the data input relative to the clock; the resulting data-to-output (D2Q) delay behavior of the flip-flops is shown in Figure 11.



Figure 11 D2Q for flip-flops compared.

The figure shows how the flip-flops follow the curve shown in Figure 1. It is worth noting that in the failure region the output of a typical flip-flop does not follow the input.

The data points shown in the failure region are an artifact of the way we trigger the capture of the delays in HSPICE: the delay at 20ns is identical to the one at 0ns because the capture event happens one clock cycle later. The optimal setup time for the new flip-flop is 1ns, where the D2Q is minimal.
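Picking the optimal setup time from such a sweep amounts to taking the offset with the smallest D2Q, after discarding points where the capture slipped to a later clock cycle. The sketch below assumes that offsets are setup times within one clock period and that a D2Q of at least one period marks such a late capture, which matches the 0 ns and 20 ns behavior described above at 50 MHz. The function name and the sample numbers are hypothetical, not the measured curves of Figure 11.

```python
def optimal_setup(offsets, d_to_q, period):
    """Return (offset, D2Q) with minimal D2Q, ignoring late captures (D2Q >= period)."""
    valid = [(d2q, off) for off, d2q in zip(offsets, d_to_q)
             if d2q is not None and d2q < period]
    if not valid:
        raise ValueError("no point was captured in the intended clock cycle")
    best_d2q, best_offset = min(valid)
    return best_offset, best_d2q


# Hypothetical 50 MHz sweep, all times in ns (period = 20 ns).
offsets = [0.0, 0.5, 1.0, 1.5, 2.0, 4.0, 20.0]
d_to_q = [20.15, 1.60, 1.15, 1.65, 2.15, 4.15, 20.15]
print(optimal_setup(offsets, d_to_q, period=20.0))
```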



Figure 12 Maximum power for the considered flip-flops.

All other flip-flops exhibit the same behavior with different corner delays, as shown in [2]. The new flip-flop exhibits the typical behavior of flip-flops used for low power applications. Comparing the new flip-flop to the other flip-flops, we observe some important points. Flip-flop F02 has a better setup time (0.75ns) than the others, which all have the same value (1ns). Again, the data points in the failure region appear because the latching happened in a later clock cycle. All flip-flops have similar D2Q behavior and are closely comparable.

Figure 12 illustrates the max power (in Watts) consumed in the flip-flops' models for all the setup instants used for sampling. The figure shows that in the failure region the power might be unexpectedly higher than any other operating delay point.

Comparing the new flip-flop to the other flip-flops, we note that flip-flop F02 exhibits unexpectedly high power consumption at its optimal setup delay point. The other surveyed flip-flops exhibit the same behavior, but the new flip-flop does not. This is important because it shows that the new flip-flop does not require trading off performance for power.

TABLE 2 TRANSISTOR COUNT OF FLIP-FLOP CIRCUITS.

| F01 | F02 | F03 | F04 | F05 | F06 | F07 | F08 | F09 | F10 | F11 | F12 | F13 | F14 | F15 | F16 |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 16  | 24  | 20  | 23  | 23  | 19  | 20  | 18  | 12  | 17  | 28  | 26  | 26  | 27  | 24  | 26  |

#### VI. REMARKS AND CONCLUSIONS

On timing and performance, flip-flops F03, F10, F13, and F14 seem to have a higher optimal setup time than the rest of the flip-flops. F04 seems to have particularly bad performance at 25 MHz, which is inherent to the internals of the design and also depends on the technology used. F02 seems to have a sweet spot at 10 MHz.

On the power-delay front, as mentioned above, F03 has higher peak power consumption than the rest in the stable region of operation, with F05, F15, and F16 next in line. This makes F05 the best of all for high performance systems where the trade-off between power and performance is very obvious. The other flip-flops are comparable in power consumption and performance. It is worth mentioning that F02 has the least peak power consumption, followed by F01 and F09.

If we consider the number of transistors as a rough metric of area, given that transistor sizes were kept close to minimum, Table 2 compares the different flip-flops in the area dimension.

From this table, we notice that the best area is F09 and the worst is F11. The new flip-flop has 17 transistors, which places it in the lower half of that range (12 to 28 transistors). Transistor count is not a very significant factor, however, since the transistors are quite small, and the area difference diminishes in larger designs, where flip-flops and latches are a lower percentage of the gate count because of the large combinational logic blocks that perform the main function.
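The area comparison can be checked directly against Table 2. The sketch below adds the new flip-flop's 17 transistors (stated in the text above) to the Table 2 counts and reports the extremes, the median, and which designs use fewer transistors than the new one; the dictionary name and the `NFF` key are labels introduced here for illustration.

```python
from statistics import median

# Transistor counts from Table 2, plus the new flip-flop (NFF) from the text.
TRANSISTORS = {"F01": 16, "F02": 24, "F03": 20, "F04": 23, "F05": 23, "F06": 19,
               "F07": 20, "F08": 18, "F09": 12, "F10": 17, "F11": 28, "F12": 26,
               "F13": 26, "F14": 27, "F15": 24, "F16": 26, "NFF": 17}

smallest = min(TRANSISTORS, key=TRANSISTORS.get)
largest = max(TRANSISTORS, key=TRANSISTORS.get)
fewer_than_nff = [ff for ff, n in TRANSISTORS.items() if n < TRANSISTORS["NFF"]]

print(f"smallest: {smallest} ({TRANSISTORS[smallest]}), largest: {largest} ({TRANSISTORS[largest]})")
print(f"median count: {median(TRANSISTORS.values())}")
print(f"designs with fewer transistors than NFF: {fewer_than_nff}")
```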

From the above observations and discussions, we conclude that it is very important to increase the number of simulation sample points for the flip-flops to get better accuracy.

We conclude this paper by outlining a set of guidelines that form the corner-stone of a low power flip-flop design methodology and of low power flip-flop simulation in general. They are drawn from the lessons learned in all the experiments conducted in this paper, and they aim at minimizing both the peak and the average power consumption of the designed circuit.

*Method of design:*

a. Minimize the number of transistors.

b. Minimize the load on the clock.

c. Keep internal nodes fully driven so that they never float.

d. Minimize switching, including glitching.

e. Remove redundancy, except where it is used to remove glitching or reduce leakage.

f. Minimize the size of transistors.

g. Pursue all of the above while iterating on transistor sizing to achieve correct functionality.
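Several of the design guidelines above (fewer and smaller transistors, less clock load, less switching and glitching) act on the terms of the standard dynamic switching-power relation $P = \alpha C V_{DD}^2 f$. The sketch below splits a flip-flop's switching power into a clock-load part, whose activity factor is 1 because the clock makes one 0-to-1 transition every cycle, and an internal-node part. The capacitances, activity factors, supply voltage, and frequency are hypothetical values, and the function names are illustrative only.

```python
def switching_power(alpha: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """Dynamic switching power P = alpha * C * Vdd^2 * f.

    alpha is the average number of 0->1 transitions of the node per clock
    cycle; glitches add extra transitions and therefore raise alpha.
    """
    return alpha * cap_f * vdd ** 2 * freq_hz


def flip_flop_power(clock_load_f, internal_nodes, vdd, freq_hz):
    """Split a flip-flop's switching power into clock-load and internal-node parts.

    clock_load_f: gate capacitance the flip-flop presents to the clock (alpha = 1).
    internal_nodes: list of (alpha, capacitance_in_farads) pairs.
    """
    p_clock = switching_power(1.0, clock_load_f, vdd, freq_hz)
    p_internal = sum(switching_power(a, c, vdd, freq_hz) for a, c in internal_nodes)
    return p_clock, p_internal


# Hypothetical values: 5 fF clock load, two internal nodes, 1.8 V, 100 MHz.
p_clk, p_int = flip_flop_power(5e-15, [(0.1, 10e-15), (0.1, 8e-15)], 1.8, 100e6)
print(f"clock: {p_clk * 1e6:.3f} uW, internal: {p_int * 1e6:.3f} uW")
```

Because clock-driven nodes switch every cycle, each femtofarad of clock load costs several times more power than the same capacitance on a lightly active data node, which is one way to read why guideline b is listed separately from guideline f.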

*Method for simulation:*

a. Use a realistic test setup, i.e., proper loading on outputs and non-ideal driving sources on inputs.

b. Use realistic input stimuli that capture the metrics you need to measure.

c. Simulate first at coarse granularity to reach correct functionality with the minimal number and size of transistors.

d. Use a small step size in the HSPICE simulation; this improves accuracy.

e. Go back, analyze, and redesign to resolve any irregularities in the trends of flip-flop behavior.

f. Simulate at finer granularity around the corner delay values to gain more insight; this increases the accuracy dramatically.
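Guidelines c and f (a coarse sweep, then finer granularity around the corner values) can be illustrated with a search for the optimal setup time, which is commonly defined as the data-to-clock delay that minimizes the data-to-output (D-to-Q) delay. In a real flow, each evaluation would be an HSPICE run; here the hypothetical function `d_to_q_model` stands in for it, with made-up parameters (a 20 ps failure point, 100 ps minimum clock-to-output delay). The sketch also assumes the delay curve has a single minimum in the swept range.

```python
def d_to_q_model(setup_ps: float) -> float:
    """Hypothetical stand-in for one HSPICE run: data-to-output delay versus
    data-to-clock (setup) time. Clock-to-output delay rises sharply as setup
    approaches a failure point of 20 ps; the flip-flop fails below it."""
    t_fail, clk_to_q_min, k = 20.0, 100.0, 400.0
    if setup_ps <= t_fail:
        return float("inf")  # capture failure
    return setup_ps + clk_to_q_min + k / (setup_ps - t_fail)


def sweep(f, lo, hi, steps):
    """Evaluate f at steps+1 evenly spaced points; return the best point."""
    points = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(points, key=f)


def coarse_to_fine(f, lo, hi, coarse_steps=10, fine_steps=20, rounds=3):
    """Coarse sweep over [lo, hi], then repeatedly re-sweep a narrow window
    around the current best point at finer resolution."""
    best = sweep(f, lo, hi, coarse_steps)
    span = (hi - lo) / coarse_steps
    for _ in range(rounds):
        lo_f, hi_f = max(lo, best - span), min(hi, best + span)
        best = sweep(f, lo_f, hi_f, fine_steps)
        span = (hi_f - lo_f) / fine_steps
    return best


coarse_only = sweep(d_to_q_model, 25.0, 225.0, 10)
refined = coarse_to_fine(d_to_q_model, 25.0, 225.0)
print(f"coarse: setup={coarse_only:.2f} ps, D-to-Q={d_to_q_model(coarse_only):.2f} ps")
print(f"refined: setup={refined:.2f} ps, D-to-Q={d_to_q_model(refined):.2f} ps")
```

The refined search uses 11 + 3 × 21 = 74 evaluations, compared with 10,001 for a uniform 0.02 ps sweep over the same 200 ps range. The caveat is that a coarse grid can step over narrow features, which is what guideline e is meant to catch.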

The guidelines above are a set of best-known methods; an experienced low-power design engineer will recognize these rules and can use them to steer a design toward its goals (performance, power consumption, or area). Note that most of these items are complex and correlated with one another, and reaching these goals often requires considerable insight and iterative trial and error.

In summary, low-power design of combinational and sequential circuits is an important field whose importance continues to grow, and it will remain an active area of research for a long time. We have presented a survey and evaluation of low-power flip-flop circuits. Our experimental results enabled us to identify the power and performance trade-offs of existing flip-flop designs. Moreover, we have presented a new flip-flop design and compared it with other competing low-power, high-performance flip-flop designs. Finally, the lessons from these experiments allowed us to establish a set of guidelines for the design of low-power, high-performance flip-flop circuits.

## ACKNOWLEDGEMENT

We wish to thank Prof. Rajeevan Amirtharajah for his helpful and insightful comments on this paper.

## References

- [1] A. Sayed and H. Al-Asaad, "A new low power high performance flip-flop," *Proc. International Midwest Symposium on Circuits and Systems*, 2006.
- [2] A. Sayed and H. Al-Asaad, "Survey and evaluation of low-power flip-flops," *Proc. International Conference on Computer Design (CDES)*, 2006, pp. 77-83.
- [3] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," *IEEE Journal of Solid-State Circuits*, Vol. 34, pp. 536-548, April 1999.
- [4] T. Yalcin and N. Ismailoglu, "Design of a fully-static differential low-power CMOS flip-flop," *Proc. International Symposium on Circuits and Systems*, 1999, pp. 331-333.
- [5] A. Ghadiri and H. Mahmoodi-Meimand, "Comparative energy and delay of energy recovery and square wave clock flip-flops for high-performance and low-power applications," *Proc. International Conference on Microelectronics*, 2003, pp. 89-92.
- [6] P. Zhao, T. Darwish, and M. Bayoumi, "Low power and high speed explicit-pulsed flip-flops," *Proc. Midwest Symposium on Circuits and Systems*, 2002, pp. 477-480.
- [7] N. Nedovic, M. Aleksic, and V. G. Oklobdzija, "Conditional techniques for low power consumption flip-flops," *Proc. International Conference on Electronics, Circuits and Systems*, 2001, pp. 803-806.
- [8] R. H. Katz, *Contemporary Logic Design*, Benjamin/Cummings Publishing Company, Inc., 1994.
- [9] E. J. McCluskey, *Logic Design Principles*, Prentice Hall, 1986.
- [10] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," *IEEE Journal of Solid-State Circuits*, Vol. 27, pp. 473-484, April 1992.

Ahmed T. Sayed is currently the digital design manager at Varkon Semiconductors (asayed@varkonsemi.com, atsayed@ucdavis.edu). He worked as a research scientist at IBM for two and a half years, and he has taught at several private and public universities. He received his bachelor's degree from Cairo University, his M.S. from the University of Louisiana at Lafayette, and his Ph.D. from the University of California, Davis. He worked for Intel Corp. for ten years and holds one patent with Intel.

Hussain Al-Asaad is currently an associate professor in the department of Electrical and Computer Engineering at the University of California, Davis (halasaad@ece.ucdavis.edu). His research interests include design verification, testing, and fault tolerant computing. Al-Asaad received his B.E. (with distinction) in Computer and Communications Engineering from the American University of Beirut, Lebanon, his M.S. in Computer Engineering from Northeastern University, Boston, and his Ph.D. in Computer Science and Engineering from the University of Michigan, Ann Arbor. Al-Asaad is a recipient of the National Science Foundation CAREER Award. He is a senior member of IEEE.