# 12. Dynamic CMOS Logic

#### Jacob Abraham

#### Department of Electrical and Computer Engineering The University of Texas at Austin

VLSI Design Fall 2020

October 8, 2020

- Measure all hold times with respect to the main clock
- Adjust the hold time if the flop is receiving a delayed clock
- Compute the shortest path delay from the rising edge of the clock
- Check to see if there are any hold time failures

### **Example:** Fixing Hold-Time Violations



- Shortest path delay from A $\rightarrow$ E = 100 + 40 + 40 + 20 = 200 ps
- Delay between CLK1 and CLK = 50 ps
- Adjusted hold time = 200 + 50 = 250 ps
- Hold slack = (Path Delay) - (Adjusted Hold Time) = 200 - 250 = -50 ps $\implies$ FAIL (hold slack should be $\geq$ 0)
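The hold check above can be reproduced with a few lines of Python. This is only a sketch: the function name `hold_slack` and its parameters are hypothetical, and the 200 ps base hold time is read off the slide's arithmetic (200 + 50) rather than from the figure.

```python
def hold_slack(path_delays_ps, hold_time_ps, clock_delay_ps):
    """Return (shortest path delay, adjusted hold time, hold slack), all in ps."""
    shortest_path = sum(path_delays_ps)
    # A flop that receives a delayed clock needs its hold time increased
    # by the clock delay when measured against the main clock.
    adjusted_hold = hold_time_ps + clock_delay_ps
    return shortest_path, adjusted_hold, shortest_path - adjusted_hold


path, hold, slack = hold_slack([100, 40, 40, 20], hold_time_ps=200, clock_delay_ps=50)
print(path, hold, slack)                 # 200 250 -50
print("PASS" if slack >= 0 else "FAIL")  # FAIL
```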

## Example: Fixing Hold-Time Violations, Cont'd



**Insert 4 inverters after D**, each adding 20 ps of delay (alternatively, insert one AND gate)

- Long path (4 invs.) = 100 + 50 + 50 + 20 + 80 + 10 = 310 ps
- Minimum cycle time at which the path can now operate = (Path Delay) - (CLK $\rightarrow$ CLK1 Delay) = 310 - 50 = 260 ps

If possible, add the additional delay to fix hold time violations in the short path (without affecting the long paths)
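The effect of the fix can be checked numerically. The sketch below repeats the slide's long-path arithmetic and also recomputes the hold slack. It assumes that the short path from A to E passes through the four inserted inverters, which the text does not state explicitly. All variable names are illustrative.

```python
INVERTER_DELAY_PS = 20
NUM_INVERTERS = 4
CLK_TO_CLK1_DELAY_PS = 50

added_delay_ps = NUM_INVERTERS * INVERTER_DELAY_PS        # 80 ps
long_path_ps = 100 + 50 + 50 + 20 + added_delay_ps + 10   # 310 ps
min_cycle_ps = long_path_ps - CLK_TO_CLK1_DELAY_PS        # 260 ps

# Assumption: the short path A -> E also goes through the inserted inverters.
short_path_ps = (100 + 40 + 40 + 20) + added_delay_ps     # 280 ps
adjusted_hold_ps = 200 + CLK_TO_CLK1_DELAY_PS             # 250 ps
hold_slack_ps = short_path_ps - adjusted_hold_ps          # 30 ps

print(long_path_ps, min_cycle_ps, hold_slack_ps)  # 310 260 30
```

Under that assumption the hold slack becomes positive, at the cost of 80 ps on the long path.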

ECE Department, University of Texas at Austin

Lecture 12. Dynamic CMOS Logic

## Dynamic Logic

- Dynamic gates use a clocked pMOS pullup
- Two modes of operation: precharge and evaluate




#### The "Foot" Transistor

- What if pulldown network is ON during precharge?
- Use series evaluation transistor to prevent fight between pMOS and nMOS transistors





- Dynamic gates require monotonically rising inputs during evaluation
  - $0 \rightarrow 0$
  - $0 \rightarrow 1$
  - $1 \rightarrow 1$
  - But not $1 \rightarrow 0$
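A small behavioral model makes the monotonicity requirement concrete. This is a minimal sketch, not a circuit simulation: the class `DynamicGate` is hypothetical, the gate is assumed to be footed, and the floating node is assumed to hold its value perfectly (no leakage or charge sharing).

```python
class DynamicGate:
    """Behavioral model of a footed dynamic gate (hypothetical helper)."""

    def __init__(self, pulldown):
        self.pulldown = pulldown  # returns truthy when the nMOS network conducts
        self.node = 1             # dynamic output node

    def step(self, clk, *inputs):
        if clk == 0:
            self.node = 1         # precharge: clocked pMOS pulls the node high
        elif self.pulldown(*inputs):
            self.node = 0         # evaluate: node discharges through the pulldown
        # Otherwise the node floats and keeps its previous value.
        return self.node


gate = DynamicGate(lambda a: a)   # dynamic inverter: pulldown conducts when A = 1
print(gate.step(0, 1))  # 1  (precharge)
print(gate.step(1, 1))  # 0  (evaluate with A = 1)
print(gate.step(1, 0))  # 0  (A fell 1 -> 0, but the node cannot recover)
```

The last step shows why a falling input is illegal during evaluation: once the node has discharged, nothing pulls it back up until the next precharge, so the output no longer matches the logic function.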





#### Monotonicity Woes

- But dynamic gates produce monotonically falling outputs during evaluation
- Illegal for one dynamic gate to drive another!



#### Domino Gates

- Follow dynamic stage with inverting static gate
  - Dynamic/static pair is called domino gate
  - Produces monotonic outputs
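The dynamic/static pairing can be sketched behaviorally as follows. The class `DominoGate` and the two-stage example function are illustrative assumptions, not circuits taken from the slides, and the floating node is treated as ideal.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (hypothetical model)."""

    def __init__(self, pulldown):
        self.pulldown = pulldown  # returns truthy when the nMOS network conducts
        self.node = 1             # internal dynamic node

    def step(self, clk, *inputs):
        if clk == 0:
            self.node = 1         # precharge
        elif self.pulldown(*inputs):
            self.node = 0         # evaluate
        return 1 - self.node      # static inverter drives the gate output


# Illustrative chain: Y1 = A AND B, Y2 = Y1 OR C
g1 = DominoGate(lambda a, b: a and b)
g2 = DominoGate(lambda y, c: y or c)


def run(clk, a, b, c):
    y1 = g1.step(clk, a, b)
    y2 = g2.step(clk, y1, c)
    return y1, y2


print(run(0, 1, 1, 0))  # (0, 0): all outputs low after precharge
print(run(1, 1, 1, 0))  # (1, 1): outputs can only rise during evaluation
```

Because every output is low after precharge and can only rise during evaluation, each gate presents a legal (monotonically rising) input to the next one.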



### **Domino Optimizations**

- Each domino gate triggers the next one, like a string of dominoes toppling over
- Gates evaluate sequentially, precharge in parallel
- Evaluation is more critical than precharge
- HI-skewed static stages can perform logic



### **Dual-Rail Domino**

- Domino only performs noninverting functions:
  - AND, OR but not NAND, NOR, or XOR
- Dual-rail domino solves this problem
  - Takes true and complementary inputs
  - Produces true and complementary outputs

| sig_h | sig_l | Meaning    |
|-------|-------|------------|
| 0     | 0     | Precharged |
| 0     | 1     | '0'        |
| 1     | 0     | '1'        |
| 1     | 1     | Invalid    |
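The encoding in the table can be written as a lookup. The helper names `encode` and `decode` are hypothetical and serve only to illustrate the table.

```python
# Keys are (sig_h, sig_l) pairs, values are the meanings from the table above.
DUAL_RAIL_MEANING = {
    (0, 0): "precharged",
    (0, 1): "0",
    (1, 0): "1",
    (1, 1): "invalid",
}


def encode(bit):
    """Map a logic value to its (sig_h, sig_l) pair."""
    return (1, 0) if bit else (0, 1)


def decode(sig_h, sig_l):
    """Map a (sig_h, sig_l) pair back to its meaning."""
    return DUAL_RAIL_MEANING[(sig_h, sig_l)]


print(encode(1), decode(*encode(1)))  # (1, 0) 1
print(decode(0, 0))                   # precharged
```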



#### Example: AND/NAND

- Given A\_h, A\_l, B\_h, B\_l
- Compute  $Y_h = A * B$ ,  $Y_l = \sim (A * B)$
- Pulldown networks are conduction complements
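At the logic level, the two rails of this gate can be sketched as below. The sketch assumes a series pulldown for the true rail and a parallel pulldown for the complementary rail, and returns the values seen after the domino inverters. The function name is hypothetical.

```python
def dual_rail_and(a_h, a_l, b_h, b_l):
    """Return (Y_h, Y_l) for a dual-rail AND/NAND gate after evaluation."""
    y_h = a_h & b_h  # series network: conducts only if A and B are both '1'
    y_l = a_l | b_l  # parallel network: conducts if A or B is '0'
    return y_h, y_l


for a in (0, 1):
    for b in (0, 1):
        a_h, a_l = (1, 0) if a else (0, 1)
        b_h, b_l = (1, 0) if b else (0, 1)
        print(a, b, dual_rail_and(a_h, a_l, b_h, b_l))

print(dual_rail_and(0, 0, 0, 0))  # (0, 0): precharged inputs give precharged outputs
```

For every valid input pair exactly one output rail rises, which is what "conduction complements" implies.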



#### Example: XOR/XNOR

- Sometimes possible to share transistors
  - Sharing works well in implementations of symmetric functions
  - See papers on "relay logic" published over 50 years ago
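The same logic-level view applies to XOR/XNOR. The sketch below models only the function computed on each rail, not the shared-transistor topology in the figure. The function name is hypothetical.

```python
def dual_rail_xor(a_h, a_l, b_h, b_l):
    """Return (Y_h, Y_l) where Y_h is XOR and Y_l is XNOR of the encoded inputs."""
    y_h = (a_h & b_l) | (a_l & b_h)  # inputs differ
    y_l = (a_h & b_h) | (a_l & b_l)  # inputs match
    return y_h, y_l


print(dual_rail_xor(1, 0, 0, 1))  # (1, 0): A = 1, B = 0 gives XOR = '1'
print(dual_rail_xor(1, 0, 1, 0))  # (0, 1): A = 1, B = 1 gives XOR = '0'
print(dual_rail_xor(0, 0, 0, 0))  # (0, 0): precharged
```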



## Leakage

- Dynamic node floats high during evaluation
  - Transistors are leaky  $(I_{off} \neq 0)$
  - Dynamic value will leak away over time
  - Formerly milliseconds, now nanoseconds!
- Use keeper to hold dynamic node
  - Must be weak enough not to fight evaluation
- Leakage Power!
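A rough estimate shows how quickly a floating node can droop. The sketch assumes a constant leakage current, so that $t = C \, \Delta V / I_{off}$. The capacitance, allowed droop, and leakage values are hypothetical and chosen only to illustrate the scaling. They are not data from the lecture.

```python
def droop_time_s(node_cap_f, allowed_droop_v, leakage_a):
    """Time for a constant leakage current to discharge the node by allowed_droop_v."""
    return node_cap_f * allowed_droop_v / leakage_a


NODE_CAP_F = 10e-15     # hypothetical 10 fF dynamic node
ALLOWED_DROOP_V = 0.1   # hypothetical tolerable voltage drop

for leakage_a in (1e-12, 1e-9, 1e-6):  # hypothetical I_off values
    t = droop_time_s(NODE_CAP_F, ALLOWED_DROOP_V, leakage_a)
    print(f"I_off = {leakage_a:.0e} A -> droop time = {t:.1e} s")
```

With these illustrative numbers, raising the leakage from picoamps to microamps shortens the droop time from about a millisecond to about a nanosecond.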



## Charge Sharing

- Dynamic gates suffer from charge sharing



$$V_x = V_y = \frac{C_y}{C_x + C_y} V_{DD}$$
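The formula above is easy to evaluate. In this sketch the function name and the capacitance and supply values are hypothetical. It assumes node Y starts at $V_{DD}$ and node X starts at 0 V.

```python
def charge_shared_voltage(c_x, c_y, vdd):
    """Final voltage on X and Y after Y (at VDD) shares charge with X (at 0 V).

    c_x and c_y may be in any unit as long as both use the same one.
    """
    return c_y / (c_x + c_y) * vdd


# Hypothetical values: C_x = 10 fF, C_y = 30 fF, VDD = 1.0 V
print(charge_shared_voltage(10.0, 30.0, 1.0))  # 0.75
```

In this example the dynamic node drops by a quarter of the supply. A larger $C_y$ relative to $C_x$ reduces the drop.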

## Secondary Precharge

- Solution: add secondary precharge transistors
  - Typically need to precharge every other node
- Big load capacitance on Y helps as well



## Noise Sensitivity

- Dynamic gates are very sensitive to noise
  - Inputs:  $V_{IH} \approx V_{tn}$
  - Outputs: floating output is susceptible to noise
- Noise sources
  - Capacitive crosstalk
  - Charge sharing
  - Power supply noise
  - Feedthrough noise
  - And more!

Chip power supply voltage map when executing a program






#### Alternating N & P Domino Logic



## Cascade Voltage Switch Logic (CVSL)



### Dynamic CVSL XOR Gate



### Dual-Rail Domino Full Adder Design

- Very fast, but large and power hungry
- Used in very fast multipliers



- Domino logic is attractive for high-speed circuits
  - 1.5 to 2x faster than static CMOS
- Many Challenges
  - Monotonicity
  - Leakage
  - Charge sharing
  - Noise
- Used in previous generation high-performance microprocessors and in some recent embedded processors

## Domino Logic in Current Designs

- Domino design from Intrinsity used in 1-GHz 0.75W ARM Cortex A8 from Samsung (Intrinsity later acquired by Apple)
- Fast Domino (called "Fast14 NDL") gates are inserted selectively into critical speed paths, with custom SRAMs and optimized synthesized logic elsewhere
- Standard power saving techniques are also used
- Domino gates are clocked by multiphase clocks
- A type of "super-pipeline" where the domino footers form the barrier for the pipeline operation



(Source: Electronic Design - Embedded, August 29, 2009)

# Intrinsity OR/NOR Implementation with "N-nary Logic"

2-bit function using 1-out-of-4 signals



Ref: U. S. Patent 6066965, Method and apparatus for a N-nary Logic Circuit Using 1 of 4 Signals
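A 1-out-of-4 signal carries two bits on four wires, with exactly one wire asserted after evaluation. The sketch below illustrates the encoding only. It assumes that wire `i` high means value `i` and that all wires low is the precharged state. The helper names are hypothetical and are not taken from the patent.

```python
def encode_1_of_4(value):
    """Encode a 2-bit value (0..3) as four wires with exactly one wire high."""
    if value not in range(4):
        raise ValueError("value must be in 0..3")
    return tuple(1 if i == value else 0 for i in range(4))


def decode_1_of_4(wires):
    """Return the encoded value, or None if all wires are low (precharged)."""
    if sum(wires) == 0:
        return None
    if sum(wires) != 1:
        raise ValueError("more than one wire asserted")
    return wires.index(1)


print(encode_1_of_4(2))             # (0, 0, 1, 0)
print(decode_1_of_4((0, 0, 1, 0)))  # 2
print(decode_1_of_4((0, 0, 0, 0)))  # None
```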


## Intrinsity XOR/Equivalence Implementation

Using 1-out-of-2 signals



Ref: U. S. Patent 6066965, Method and apparatus for a N-nary Logic Circuit Using 1 of 4 Signals
