Inverted Temperature Dependence.

It is known that with an increase in temperature, the resistivity of a metal wire (conductor) increases. The reason for this phenomenon is that with an increase in temperature, thermal vibrations in the lattice increase. This gives rise to increased electron scattering. One can visualize this as electrons colliding more often and hence contributing less to the streamlined flow needed for the flow of electric current.

There is a similar effect in semiconductors: the mobility of the primary carriers decreases with an increase in temperature. This applies equally to holes and electrons.

But in semiconductors, when the supply voltage of a MOS transistor is reduced, an interesting effect is observed. At lower voltages the delay through the MOS device decreases with increasing temperature, rather than increasing. After all, common wisdom is that with increasing temperature the mobility decreases, and hence one would have expected reduced current and subsequently increased delay. This effect is also referred to as low voltage Inverted Temperature Dependence.
Let's first see what the delay of a MOS transistor depends upon, in a simplified model.

Delay = ( Cout * Vdd )/ Id [ approx ]

Cout = Drain Cap
Vdd = Supply voltage
Id = Drain current.

Now let's see what the drain current depends upon.

Id = µ(T) * ( Vdd – Vth(T) )^α

µ = mobility
Vth = threshold voltage
α = positive constant ( small number )

One can see that Id depends upon both the mobility µ and the threshold voltage Vth. Let's examine the dependence of mobility and threshold voltage upon temperature.

μ(T) = μ(300) * ( 300/T )^m
Vth(T) = Vth(300) − κ( T − 300 )
Here '300' is room temperature in kelvin.

Mobility and threshold voltage both decrease with temperature. But a decrease in mobility means less drain current and a slower device, whereas a decrease in threshold voltage means more drain current and a faster device.

The final drain current is determined by which trend dominates the drain current at a given voltage and temperature pair. At high voltage mobility determines the drain current, whereas at lower voltages threshold voltage dominates the drain current.

This is the reason that at higher voltages device delay increases with temperature, but at lower voltages device delay decreases with temperature.
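The crossover described above can be sketched numerically. The following Python snippet plugs the simplified delay and current equations from this post into illustrative, hypothetical parameter values (µ(300), m, Vth(300), κ and α below are made up for the sketch; real values are process dependent):

```python
# Hypothetical, illustrative parameters; real values are process dependent.
MU_300 = 1.0      # normalized mobility at 300 K
M = 1.5           # mobility temperature exponent
VTH_300 = 0.4     # threshold voltage at 300 K (V)
KAPPA = 1e-3      # Vth temperature coefficient (V/K)
ALPHA = 1.3       # alpha-power-law exponent
C_OUT = 1.0       # normalized drain capacitance

def mobility(t):
    return MU_300 * (300.0 / t) ** M

def vth(t):
    return VTH_300 - KAPPA * (t - 300.0)

def drain_current(vdd, t):
    return mobility(t) * (vdd - vth(t)) ** ALPHA

def delay(vdd, t):
    return C_OUT * vdd / drain_current(vdd, t)

# High supply: mobility loss dominates, so delay grows with temperature.
assert delay(1.2, 400.0) > delay(1.2, 300.0)
# Low supply: Vth reduction dominates, so delay shrinks with temperature;
# this is the inverted temperature dependence.
assert delay(0.5, 400.0) < delay(0.5, 300.0)
```

Sweeping Vdd between these two points would locate the crossover voltage at which the two trends cancel for this parameter set.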


Synchronous or Asynchronous resets ?

Both synchronous reset and asynchronous reset have advantages and disadvantages, and based on their characteristics and the designer's needs, one has to choose a particular implementation.

Synchronous reset :

Advantages :

– This is the obvious advantage: synchronous reset conforms to synchronous design guidelines and hence ensures your design is 100% synchronous. This may not be a requirement for everyone, but many times it is a requirement that the design be 100% synchronous. In such cases, it is better to go with a synchronous reset implementation.

– Protection against spurious glitches. A synchronous reset has to set up to the active clock edge in order to be effective. This provides protection against accidental glitches, as long as these glitches don't happen near the active clock edges. In that sense it is not 100% protection, as a random glitch could happen near the active clock edge, meet both setup and hold requirements, and cause flops to reset when they are not expected to be reset.

Such random glitches are more likely to happen if reset is generated by some internal condition, which most of the time means reset travels through some combinational logic before it finally gets distributed throughout the system.

Figure : Glitch with synchronous reset

As shown in the figure, x1 and x2 generate (reset)bar. Because of the way x1 and x2 transition during the first clock cycle, we get a glitch on the reset signal, but because reset is synchronous and the glitch did not happen near the active clock edge, it got filtered, and we only see reset take effect at the beginning of the 4th clock cycle, where it was expected.

– One advantage that is touted for synchronous resets is smaller flops, or the area savings. This is really not much of an advantage. In terms of area savings it is really a wash between synchronous and asynchronous resets.

Synchronous reset flops are smaller, as reset is just AND-ed with data outside the flop, but you need that extra AND gate per flop to accommodate reset. An asynchronous reset flop has to factor reset inside the flop design, where typically one of the last inverters in the feedback loop of the slave stage is converted into a NAND gate.

Figure : Synchronous v/s Asynchronous reset flop comparison.

Disadvantages :

– Wide enough pulse of the reset signal. We saw that, being synchronous, reset has to meet setup to the clock. We saw earlier in the figure that spurious glitches get filtered in a synchronous design, but this very behavior could be a problem. On the flip side, when we do intend the reset to work, the reset pulse has to be wide enough that it meets setup to the active edge of the clock for all the receiving sequentials on the reset distribution network.

– Another major issue with synchronous reset is clock gating. Designs are increasingly being clock gated to save power. Clock gating is a technique where the clock is passed through an AND gate with an enable signal, which can turn off clock toggling when the clock is not used, thus saving power. This is in direct conflict with reset. When the chip powers up, the clocks are initially not active and could be gated by the clock enable, but right at power up we need to force the chip into a known state, and we need to use reset to achieve that. A synchronous reset will not take effect unless there is an active clock edge, and if the clock enable is off, there is no active edge of the clock.

The designer has to carefully account for this situation and devise a reset and clock enabling strategy which ensures proper circuit operation.

– Use of tri-state structures. When tri-state devices are used, they need to be disabled at power-up, because an inadvertently enabled tri-state device could crowbar, and the excessive current flowing through it could damage the chip. If the tri-state enable is driven by a synchronous reset flop, the flop output cannot go low until the active edge of the clock arrives, and hence there is a potential to turn on the tri-state device.

Figure : Tri-state Enable.

Asynchronous reset :

Advantages :

– Faster data path. An asynchronous reset scheme removes the AND gate at the input of the flop, thus saving one stage of delay along the data path. When you are pushing the timing limits of the chip, this is very helpful.

– It has the obvious advantage of being able to reset flops without the need of a clock. Basically, assertion of the reset doesn't have to set up to the clock; it can come at any time and reset the flop. This could be a double-edged sword, as we have seen earlier, but if your design permits the use of asynchronous reset, this could be an advantage.

Disadvantages :

– The biggest issue with asynchronous reset is the reset de-assertion edge. Remember that when we refer to reset as 'asynchronous', we are referring only to the assertion of reset. You can see in the figure about synchronous and asynchronous reset comparison that one of the ways asynchronous reset is implemented is by converting one of the feedback loop inverters into a NAND gate. When the reset input of the NAND gate goes low, it forces the Q output low irrespective of the input of the feedback loop. But as soon as you de-assert reset, that NAND gate immediately becomes an inverter and we are back to a normal flop, which is susceptible to the setup and hold requirements. Hence de-assertion of the reset could cause the flop output to go metastable, depending upon the relative timing between de-assertion and the clock edge. This is also called the reset recovery time check, which asynchronous resets have to meet even though they are asynchronous! You don't have this problem with synchronous reset, as you are explicitly forced to check both setup and hold on reset as well as data, since both are AND-ed and fed to the flop.

– Spurious glitches. With asynchronous reset, unintended glitches will cause the circuit to go into the reset state. Usually a glitch filter has to be introduced right at the reset input port, or one may have to switch to synchronous reset.

– If reset is internally generated and is not coming directly from the chip input port, it has to be excluded for DFT purposes. The reason is that, in order for ATPG test vectors to work correctly, the test program has to be able to control all flop inputs, including data, clock and all resets. During test vector application, we cannot have any flop get reset. If the master asynchronous reset comes externally, the test program holds it at its inactive state, but if an asynchronous reset is generated internally, the test program has no control over the final reset output, and hence the asynchronous reset net has to be removed for DFT purposes.

One issue that is common to both types of reset is that reset release has to happen within one cycle. If reset release happens across different clock cycles, then different flops will come out of reset in different clock cycles and this will corrupt the state of your circuit. This could very well happen with large reset distribution trees, where some receivers are closer to the master distribution point and others are farther away.

Thus reset tree distribution is non-trivial and almost as important as clock distribution. Although you don't have to meet skew requirements as with the clock, the tree has to guarantee that all its branches are balanced such that the difference between the delays of any two branches is not more than a clock cycle, thus guaranteeing that reset removal happens within one clock cycle and all flops in the design come out of reset within one clock cycle, maintaining a coherent state for the design.
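The branch-balancing requirement above reduces to a simple check. The following Python sketch expresses it; the branch delay numbers are hypothetical:

```python
def reset_tree_balanced(branch_delays_ns, clock_period_ns):
    """True if the spread between the fastest and slowest branch of the
    reset tree is under one clock cycle, so all flops come out of reset
    within one cycle of each other."""
    return max(branch_delays_ns) - min(branch_delays_ns) < clock_period_ns

# Hypothetical branch delays (ns) for a reset tree, 1.0 ns clock period.
assert reset_tree_balanced([1.2, 1.5, 1.9], clock_period_ns=1.0)      # spread 0.7
assert not reset_tree_balanced([1.2, 1.5, 2.4], clock_period_ns=1.0)  # spread 1.2
```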

To address this problem with asynchronous reset, where it could be more severe, the master asynchronous reset coming off chip is synchronized using a synchronizer. The synchronizer essentially converts the asynchronous reset into something more like a synchronous reset, and it becomes the master distributor of the reset (the head of the reset tree). By clocking this synchronizer with a clock similar to the clock for the flops (the last-stage clock in the clock distribution), we can minimize the risk of reset tree distribution not happening within one clock.


Wire Delay Modeling for Static Timing Analysis.

[Looking for interview questions? Preparing for an interview in the VLSI field? Get the most frequently asked VLSI interview questions]

In the previous post we saw how we use a simple look-up-table-based model of devices for getting timing arcs or delay arcs through the devices. These arcs are nothing but input-output pin pair delay values. Now we can look at how wire delays are modeled in STA.

In spice modeling, wires are again represented through complex mathematical-function-based models (think matrices, nodal analysis), which accurately model the wire's behavior. You can see waveform propagation happening through the wires based on their characteristics, and one can measure delays by probing the waveforms in question during dynamic timing analysis.

Similar to devices, the idea is to come up with a simple model of wires which can give us reasonably accurate delay values through the wires for static timing analysis. There are several models available for wires, but one such model, the 'Elmore delay', has been widely used in industry because of its simplicity and relative accuracy. We'll talk more about this Elmore delay model here.

If you want to find the wire delay from point 'a' to point 'c', we can get it using the following formula.
RC(ac) = R1C1 + (R1+R2)C2
If there are N stages, the general formula is as follows.
RC(aN) = R1C1 + (R1+R2)C2 + (R1+R2+R3)C3 + … + (R1+R2+R3+…+RN)CN
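The general formula can be sketched directly in Python. The resistance and capacitance values below are hypothetical, chosen only to exercise the two-stage case from the text:

```python
def elmore_delay(r, c):
    """Elmore delay of an RC ladder: sum over stages i of
    (R1 + ... + Ri) * Ci, with r and c listed from the driver outward."""
    assert len(r) == len(c)
    total, r_upstream = 0.0, 0.0
    for ri, ci in zip(r, c):
        r_upstream += ri          # total resistance from the driver to node i
        total += r_upstream * ci
    return total

# Two-stage example from the text: R1*C1 + (R1+R2)*C2,
# with hypothetical values R = [100, 200] ohm, C = [1, 2] fF.
d = elmore_delay([100.0, 200.0], [1e-15, 2e-15])
assert abs(d - (100 * 1e-15 + 300 * 2e-15)) < 1e-25  # 7e-13 s
```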

Hope you get the idea.

In practice, any segment of wire is broken into small pieces, like R1, R2 … RN in the picture, and the Elmore delay formula is used to come up with real wire delay numbers. Here I've explained a very simplified concept of wire delay. Later on we'll discuss the various complexities that we have to deal with when we want to come up with accurate wire delays.


STA Basics


STA stands for Static Timing Analysis, static as opposed to dynamic timing analysis. In timing analysis we are mainly interested in the delays through standard gates and the slopes, or transition values, of signals at various nodes in the circuit. These delays and slopes dictate what speed your circuit will run at without any functional issues. One can choose to do either static or dynamic timing analysis. So what is the difference?

The main difference between the two methods lies in the way circuit components are modeled. Dynamic timing analysis is done through dynamic simulation of the circuit, the spice simulation you might be familiar with. Static timing analysis, on the other hand, is done through a look-up table method; no dynamic simulation is performed.

For standard cells (logic gates), the delay through a cell primarily depends on three factors: the strength of the cell, the input signal slope and the output pin load.

For dynamic analysis, spice simulation of the whole circuit in question is carried out. In spice simulation, circuit devices are modeled through mathematical functions. Think of matrices and nodal analysis. In simple terms, the spice model has mathematical functions of various parameters of the device.

A standard cell is modeled through math functions whereby the delay through the cell is represented as a function of the input signal slope, the output pin load and the strength of the cell in question.

If you look at the following simple figure of a buffer, it is provided an input waveform with a certain slope, and at the output a waveform with a certain slope is observed. The delay of the cell is measured between these waveforms at the 50% rise/fall points, as shown in the picture.

Delay through buffer B with input & output waveforms

While in static timing analysis we don't simulate; the device behavior is represented as a table, and based on the actual input parameter values we simply look up the table to find the delay through the cell.
Let's say the size of the buffer in the figure above is B, and for this size B we have the following extensively simplified version of the table.
Input Slope | Output Load | Delay
0.05ns      | 0.2pf       | 0.05ns
0.1ns       | 0.2pf       | 0.075ns
As mentioned previously, the delay through the cell depends on the input waveform slope and the output pin load. When timing analysis is done, based on the current values of input slope and output load we simply pick the correct entry from the table, and there we have the delay number through the cell. That simple!
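A toy version of this table lookup can be sketched in Python. The table entries are the illustrative numbers from above; real tools interpolate in two dimensions over many slope/load points, while here we interpolate only over slope at a fixed load:

```python
# Hypothetical delay table for buffer size B, using the illustrative
# numbers above: (input slope ns, output load pF) -> delay ns.
DELAYS = {(0.05, 0.2): 0.05, (0.1, 0.2): 0.075}
SLOPES = (0.05, 0.1)

def cell_delay(slope, load):
    """Linearly interpolate the cell delay between the two slope entries
    at a fixed load, mimicking a much-simplified table lookup."""
    s0, s1 = SLOPES
    d0, d1 = DELAYS[(s0, load)], DELAYS[(s1, load)]
    frac = (slope - s0) / (s1 - s0)
    return d0 + frac * (d1 - d0)

assert cell_delay(0.05, 0.2) == 0.05                   # exact table entry
assert abs(cell_delay(0.075, 0.2) - 0.0625) < 1e-12    # midpoint of the rows
```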

Of course this is simplistic analysis to give you a very high level idea of the difference between static v/s dynamic timing analysis.

So what about wires in the circuit? That we'll cover in the next post.


Static Timing Analysis 103 : Hold Failures.

Let's look at the hold failure, or the min delay constraint, in depth here. Like setup, there is a 'hold' requirement for each sequential element (flop or latch). That requirement dictates that after the assertion of the active/capturing edge of the sequential element, input data needs to be stable for a certain time window.

If input data changes within this hold window, the output of the sequential element could go metastable, or the output could capture unintentional input data. Therefore it is very crucial that input data be held until the hold requirement is met for the sequential in question.

In our figure below, data at input pin 'In' of the first flop meets setup and is correctly captured by the first flop. The output of the first flop, 'FF1_out', happens to be an inverted version of input 'In'.

As you can see, once the active edge of the clock for the first flop happens, which is the rising edge here, output FF1_out falls after a certain clock-to-out delay. Now for the sake of our understanding, assume that the combinational delay from FF1_out to FF2_in is very small and the signal goes blazing fast from FF1_out to FF2_in, as shown in the figure below.

In real life this could happen for several reasons. It could happen by design (imagine no device between the first and second flop, just a small wire; even better, think of both flops abutting each other). It could be because of device variation, where you end up with very fast devices along the signal path. There could be capacitive coupling with adjacent wires favoring the transitions along FF1_out to FF2_in; for example, a node adjacent to FF2_in might be transitioning high to low (falling) with a sharp slew rate, which couples favorably with FF2_in going down and speeds up the FF2_in fall delay.

In short, in reality there are several reasons for delay to speed up along the signal propagation path. What ends up happening because of the fast data is that FF2_in transitions within the hold window of the flop clocked by clk2 and essentially violates the hold requirement for the clk2 flop.

This causes the falling transition of FF2_in to be captured in the first clk2 cycle, whereas the design intention was to capture the falling transition of FF2_in in the second cycle of clk2. It looks as if the FF2_in falling transition races through the first clk2 cycle. This is the reason hold or min failures are often referred to as races, as data is essentially racing through.

As you can see in the waveform, the 'Out' node actually captures the low value of FF2_in, although the intention was to capture the high value. What is not shown in the figure is that, when hold time is violated, the 'Out' node is most likely to go metastable (you can refer to the previous post about setup failures) and get into a pseudo-stable state where it swings between the high and low rails for a while before settling to a binary high or low value. It could take a really long time (more than a cycle!) for the flop output node to settle. The consequences are similar, in the sense that downstream logic could sample false data and one would have functional corruption.

In a normal synchronous design, where you have a series of flip-flops clocked by a grid clock (the clock shown in the figure below), the intention is that in the first clock cycle of clk1 and clk2, FF1_out transitions, and there is enough delay from FF1_out to FF2_in that the hold requirement is met for the first clock cycle of clk2 at the second flop. FF2_in then meets setup before the second clock cycle of clk2, and when the second cycle starts, at the active edge of clk2, the original transition of FF1_out is propagated to Out.

Now if you notice, there is skew between clk1 and clk2; the skew is making the clk2 edge come later than the clk1 edge (ideally we expect clk1 and clk2 to be aligned perfectly, that's ideally!).

In our example this is exacerbating the hold issue. If both clocks were perfectly aligned, the FF2_in fall would have arrived relatively later and would have met the hold requirement for the clk2 flop, and we wouldn't have captured wrong data! Your comments and questions are welcome.
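The hold check in this post reduces to simple arithmetic. The following Python sketch uses hypothetical numbers, with skew defined as clk2 arrival minus clk1 arrival (positive when clk2 is late, as in the figure):

```python
def hold_slack(clk_to_q, comb_min, t_hold, skew):
    """Hold slack at the capturing flop, in ns. The earliest data arrival
    (clock-to-out plus the fastest combinational path) must beat the
    capture edge plus the hold requirement. Negative slack = violation."""
    return (clk_to_q + comb_min) - (t_hold + skew)

# Hypothetical numbers: the fast path passes hold with aligned clocks
# but fails when clk2 arrives 0.05 ns late, as in the figure.
assert hold_slack(clk_to_q=0.05, comb_min=0.02, t_hold=0.05, skew=0.0) > 0
assert hold_slack(clk_to_q=0.05, comb_min=0.02, t_hold=0.05, skew=0.05) < 0
```

Note that the clock period does not appear: hold is checked against the same capture edge, so slowing the clock down cannot fix a hold failure.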



Static Timing Analysis 102 : Setup Failures.

In the last post we looked at general signal propagation from one flop to another, and we talked at a higher level about the max delay or setup constraint. Let's look at the max delay constraint in more depth, so it is very clear before we move on to the min delay or hold constraint.

The following figure describes a setup failure visually. As you can see, the first flop releases the data at the active edge of the clock, which happens to be the rising edge. FF1_out falls sometime after clk1 rises.

The delay from the clock rising to the data changing at the output pin is commonly referred to as clock-to-out delay. There is a finite delay from FF1_out to FF2_in through some combinational logic for the signal to travel. After this delay the signal arrives at the second flop and FF2_in falls. Because of the large delay from FF1_out to FF2_in, FF2_in falls after the setup requirement of the second flop, indicated by the orange/red vertical dotted line.

This means the input signal to the second flop, FF2_in, is not held stable for the setup time requirement of the flop, and hence this flop goes metastable and takes a long time to settle to the correct value. In theory it could take up to one clock cycle, or even more, to resolve the metastability, which means the flop output could be in an unknown state for one full clock cycle or more. This means downstream logic (downstream flops or latches) captures wrong data and your state machine is corrupted.

As you can see, one would have expected the 'Out' node to go low, but it doesn't, because of the setup time or max delay failure at the input of the second flop. The setup time requirement dictates that the input signal be steady during the setup window (which is a certain time before the clock capture edge).

Another possibility is that the input FF2_in doesn't change near the closing edge of the flop, but because of very large delays from FF1_out it falls much later than the second rising edge of the clock. In that case the second flop wouldn't go metastable, but it would simply capture '1' at the input, which would be a wrong value and could cause downstream functional failure.

As mentioned earlier, if we reduce the frequency, our cycle time increases, and eventually FF2_in will be able to make it in time and there will not be a setup failure. Also notice that a clock skew is observed at the second flop.

The clock to the second flop, clk2, is not aligned with clk1 anymore; it arrives earlier, which exacerbates the setup failure. This is a real-world situation where the clock to all receivers will not arrive at the same time, and the designer has to account for the clock skew. We'll talk separately about clock skew in detail in later posts.
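The setup check in this post also reduces to simple arithmetic. The following Python sketch uses hypothetical numbers, with skew defined as clk2 arrival minus clk1 arrival (negative when clk2 arrives early, as in the figure):

```python
def setup_slack(period, clk_to_q, comb_max, t_setup, skew):
    """Setup slack in ns. The latest data arrival (clock-to-out plus the
    slowest combinational path) must beat the next capture edge minus the
    setup requirement. Negative slack = violation."""
    return (period + skew) - (clk_to_q + comb_max + t_setup)

# Hypothetical numbers: the path just passes with aligned clocks but
# fails when clk2 arrives 0.1 ns early.
assert setup_slack(period=1.0, clk_to_q=0.2, comb_max=0.65, t_setup=0.1, skew=0.0) > 0
assert setup_slack(period=1.0, clk_to_q=0.2, comb_max=0.65, t_setup=0.1, skew=-0.1) < 0
```

Unlike hold, the period appears in the formula, which is why slowing the clock down fixes a setup failure.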



Static Timing Analysis 101



Getting a solid grasp of static timing analysis in digital circuits can be a very big challenge. It usually takes quite a bit of experience. In my view it’s not covered very well in school, and because of that it’s more difficult for new engineers.
I'll attempt to cover this topic in depth here. We'll start with the simplest case, which is a basic flip-flop timing path.

In general, digital circuits have two types of elements:
1) combinational elements, which include logic gates like NAND gates, NOR gates, complex gates, etc.
2) storage or memory elements, which are either flip-flops or latches. Flip-flops, which will be abbreviated as flops, are edge-triggered memory elements, while latches are level-sensitive memory elements.

The following is a simple structure where the output of a flop goes through some stages of combinational logic, represented by the pink bubble, and is eventually sampled by the receiving flop. The receiving flop, which samples the FF2_in data, poses timing requirements on the input data signal.

The logic between FF1_out and FF2_in should be such that signal transitions can propagate through this logic fast enough to be captured by the receiving flop. For a flop to correctly capture input data, the input data has to arrive and become stable for some period of time before the capture clock edge at the flop.

This requirement is called the setup time of the flop. Usually you'll run into setup time issues when there is too much logic between two flops, or the clock cycle is too small for the combinational delay. Hence this is sometimes called a max delay or slow delay timing issue, and the constraint is called the max delay constraint.

In the figure there is a max delay constraint on the FF2_in input at the receiving flop. Now you can see that the max delay or slow delay constraint is frequency dependent. If you are failing setup to a flop and you slow down the clock frequency, your clock cycle time increases; hence you have more time for your slow signal transitions to propagate through, and you'll now meet the setup requirements.

Typically your digital circuit runs at a certain frequency, which sets your max delay constraints. The amount of time by which the signal falls short of meeting the setup time is called the setup (or max) slack or margin. The flip side of the max delay constraint is the min or fast delay constraint.

It is more interesting and difficult to understand, hence we will cover it in the next post. The following figure shows the data propagation from the input to the output of the flop and, through the combinational delay, to the input of the next flop.

Be aware that data FF1_out potentially changes (if 'In' changes) on the active/capture edge of the flop; in the following case we've assumed the flops to be rising-edge triggered.
