Video 1 of 4 · ~12 minutes
Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
Tapeout hinges on timing closure. A modern SoC has hundreds of thousands of timing paths, and every one must meet setup and hold. Teams of timing engineers spend months closing timing at the target frequency. A chip that fails timing at the corners ships late, ships slow, or doesn't ship at all. When Intel's Pentium 4 line topped out at 3.8 GHz instead of the 10 GHz once projected, the limit came from power and timing together: leakage climbed with frequency, and the critical paths were too long to hide.
Your Go Board runs at 25 MHz (40 ns period). Every design so far has met that target comfortably, so we've never shown you a failing timing report. Today changes that. You'll learn to read nextpnr's report, identify critical paths, and apply the three-move playbook when timing fails.
“If I want my design to run at a higher clock rate, I just set a faster clock. The logic is the logic.”
The maximum clock rate is set by the longest combinational path between any two flip-flops. That path has a delay of t_clk-to-Q + Σ(t_LUT + t_net) + t_setup, where t_net is the routing delay between cells. Your clock period must be at least this delay. A deep combinational chain means a long delay and a slow maximum clock. To go faster, shorten the chains by inserting pipeline flops.
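The delay sum above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the illustrative delays from the first example report later in this video (0.8 ns clock-to-Q, 1.1 ns per LUT, 1.4 ns per net, 0.4 ns setup); the function and constant names are made up for the example and are not part of any tool.

```python
# Illustrative delays in ns, copied from the example report in this video.
T_CLK_TO_Q = 0.8          # launch flop: clock edge to Q output
T_SETUP = 0.4             # capture flop: setup time at D
LUT_DELAYS = [1.1, 1.1]   # one entry per LUT on the path
NET_DELAYS = [1.4, 1.4]   # one entry per routed net on the path


def path_delay_ns(t_clk_to_q, lut_delays, net_delays, t_setup):
    """Register-to-register delay: launch flop + logic + routing + setup."""
    return t_clk_to_q + sum(lut_delays) + sum(net_delays) + t_setup


def fmax_mhz(delay_ns):
    """Highest clock frequency (MHz) whose period still covers delay_ns."""
    return 1000.0 / delay_ns


delay = path_delay_ns(T_CLK_TO_Q, LUT_DELAYS, NET_DELAYS, T_SETUP)
print(f"path delay = {delay:.1f} ns, Fmax = {fmax_mhz(delay):.1f} MHz")
# path delay = 6.2 ns, Fmax = 161.3 MHz
```

Adding a third LUT and net to the lists shows how quickly each extra logic level eats into Fmax.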
clk  ────┐           ┌────────
         └───────────┘
D    ────────────────∎────────
               setup│ │hold
                    │ │
                    └──edge
Info: Critical path report for clock 'clk_25mhz' (posedge -> posedge):
Info: curr total
Info: 0.8 0.8 Source u_counter.r_count_reg[5]_DFFR_Q
Info: 1.4 2.2 Net count[5] (fanout = 3)
Info: 1.1 3.3 Source u_alu.u_add.SB_LUT4_I3_O
Info: 1.4 4.7 Net adder_inter[5] (fanout = 2)
Info: 1.1 5.8 Source u_alu.u_mux.SB_LUT4_I0_O
Info: 0.4 6.2 Sink u_out.r_out_reg[5]_DFFR_D (setup)
Info: 6.2 ns delay estimate, frequency 161.3 MHz
The path starts at the output of the r_count[5] flop, crosses a net (wire delay), passes through an adder LUT, another net, and a mux LUT, and ends at the D input of the r_out[5] flop, including its setup time. Total: 6.2 ns. Max frequency: 1 / 6.2 ns ≈ 161 MHz. At 25 MHz (40 ns period), that leaves 40 − 6.2 = 33.8 ns of slack, which is plenty.
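The same walk-through can be done mechanically. The sketch below parses the simplified report lines exactly as they appear on the slide and splits the total into logic and routing. It assumes the slide's three-column layout (increment, running total, kind); real nextpnr output has more columns and differs between versions, so the regular expression here is illustrative only, and `parse_path` and `summarize` are hypothetical helper names.

```python
import re

# Report lines copied from the slide (simplified format, not raw nextpnr output).
REPORT = """\
Info: 0.8 0.8 Source u_counter.r_count_reg[5]_DFFR_Q
Info: 1.4 2.2 Net count[5] (fanout = 3)
Info: 1.1 3.3 Source u_alu.u_add.SB_LUT4_I3_O
Info: 1.4 4.7 Net adder_inter[5] (fanout = 2)
Info: 1.1 5.8 Source u_alu.u_mux.SB_LUT4_I0_O
Info: 0.4 6.2 Sink u_out.r_out_reg[5]_DFFR_D (setup)
"""

# increment, running total, step kind, remainder of the line
LINE = re.compile(r"^Info:\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(Source|Net|Sink)\s+(.*)$")


def parse_path(text):
    """Return a list of (increment_ns, kind, name) tuples, one per report line."""
    steps = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            steps.append((float(match.group(1)), match.group(3), match.group(4)))
    return steps


def summarize(steps, period_ns):
    """Total delay, routing share, Fmax, and slack against the clock period."""
    total = sum(delay for delay, _, _ in steps)
    routing = sum(delay for delay, kind, _ in steps if kind == "Net")
    return {
        "total_ns": total,
        "routing_ns": routing,
        "logic_ns": total - routing,
        "fmax_mhz": 1000.0 / total,
        "slack_ns": period_ns - total,
    }


summary = summarize(parse_path(REPORT), period_ns=40.0)
for key, value in summary.items():
    print(f"{key:>10}: {value:.1f}")
```

On this path, routing accounts for 2.8 ns of the 6.2 ns total, which is why fanout and placement matter as much as LUT count.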
Info: Critical path report:
Info: 0.8 0.8 Source r_a_reg[0]_DFFR_Q
Info: 1.5 2.3 Net a[0] -> adder input
Info: 1.1 3.4 Source u_add.full_adder_0.SB_LUT4_I1_O
Info: 1.4 4.8 Net carry[1] -> full_adder_1.cin
Info: 1.1 5.9 Source u_add.full_adder_1.SB_LUT4_I1_O
Info: ... (30 more full_adder stages ...)
Info: 1.1 45.3 Source u_add.full_adder_31.SB_LUT4_I1_O
Info: 0.4 45.7 Sink r_sum_reg[31]_DFFR_D (setup)
Info: 45.7 ns delay, frequency 21.9 MHz
ERROR: max frequency for clock 'clk_25mhz' is 21.9 MHz, target 25 MHz
What's wrong, and which fix?
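The adder is a hand-built chain of 32 full_adder modules, so the carry ripples through 32 LUTs and the general-purpose routing between them. Fix: write assign sum = a + b and let Yosys map the addition onto the iCE40's dedicated carry logic (the SB_CARRY fast path).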
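To see why the failing report walks through full_adder_0 up to full_adder_31, it can help to model the ripple-carry structure bit by bit. This is a behavioral sketch in Python, not the lab's Verilog; `full_adder` and `ripple_add` are hypothetical names. It models function only, not delay, but it makes the serial dependency visible: each stage needs the carry from the stage before it.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ cin
    carry_out = (a & b) | (cin & (a ^ b))
    return sum_bit, carry_out


def ripple_add(a, b, width=32):
    """Add two unsigned integers one bit at a time.

    Returns (sum modulo 2**width, list of carry-out bits per stage).
    Stage i cannot finish until stage i-1 has produced its carry.
    """
    carry = 0
    total = 0
    carries = []
    for i in range(width):
        sum_bit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= sum_bit << i
        carries.append(carry)
    return total, carries


# Worst case for carry propagation: the carry from bit 0 travels through every stage.
total, carries = ripple_add(0xFFFFFFFF, 1)
print(hex(total), sum(carries))  # 0x0 32
```

Static timing analysis always assumes this worst case, so the reported critical path covers all 32 stages regardless of the data you actually feed in.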
~5 minutes — make a design fail, then fix it
▸ COMMANDS
cd labs/week3_day10/ex1_timing/
make timing # fails at 25 MHz
# edit adder.v — add pipeline reg
make timing # passes
diff adder_slow.v adder_fast.v
▸ EXPECTED STDOUT
BEFORE:
Fmax: 21.9 MHz — FAIL
AFTER (pipelined):
Fmax: 85.3 MHz — PASS
(1 cycle extra latency)
▸ KEY OBSERVATION
Same inputs, same outputs. One register added. Fmax went from 21.9 MHz to 85.3 MHz, at the cost of one extra cycle of latency. This is the core move: sacrifice latency for throughput when you need to hit a timing target.
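The latency-for-throughput trade can be simulated cycle by cycle. The sketch below is a Python model of one possible pipelining, assuming the split is placed between the low and high 16 bits; the lab's adder_fast.v may split the logic differently. The class and field names are hypothetical, and the input and output registers of the real design are left out to keep the model short.

```python
MASK16 = 0xFFFF


class PipelinedAdder32:
    """32-bit add split in two: low half before the register, high half after."""

    def __init__(self):
        # Contents of the pipeline register between the two halves.
        self.valid = False
        self.low_sum = 0
        self.carry = 0
        self.a_hi = 0
        self.b_hi = 0

    def clock(self, a=None, b=None):
        """Advance one clock edge; return the sum leaving stage 2, or None."""
        # Stage 2 uses what the pipeline register captured on the previous edge.
        result = None
        if self.valid:
            high = (self.a_hi + self.b_hi + self.carry) & MASK16
            result = (high << 16) | self.low_sum
        # Stage 1 adds the low halves and loads the pipeline register.
        self.valid = a is not None
        if self.valid:
            low = (a & MASK16) + (b & MASK16)
            self.low_sum = low & MASK16
            self.carry = low >> 16
            self.a_hi = (a >> 16) & MASK16
            self.b_hi = (b >> 16) & MASK16
        return result


adder = PipelinedAdder32()
inputs = [(1, 2), (0xFFFF, 1), (0xFFFFFFFF, 1)]
# One new operand pair enters per cycle; one extra cycle drains the pipeline.
outputs = [adder.clock(a, b) for a, b in inputs] + [adder.clock()]
print(outputs)  # [None, 3, 65536, 0]
```

Each result appears one cycle after its operands, yet a new pair is accepted every cycle, so throughput is unchanged while each carry chain is only 16 bits long.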
| Tool | Does What | Output You Care About |
|---|---|---|
| yosys | RTL → gate-level netlist | Cell count (from stat) |
| nextpnr-ice40 | Place & route, timing analysis | Fmax (critical path) |
| icetime | Static timing analysis of placed design | Path-by-path timing report |
Ask AI: “My 32-bit ripple adder has Fmax 22 MHz, I need 50 MHz. Show me a pipelined version with the minimum number of pipeline stages to meet timing.”
TASK
Ask AI for a pipelined adder.
BEFORE
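Predict: one pipeline register splits the path in two, so at best it doubles Fmax, to ~44 MHz. That falls short of 50 MHz, so plan on two registers (three segments, ~66 MHz at best) to clear 50 MHz with margin.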
AFTER
A strong AI answer picks balanced split points and justifies them with delay estimates. A weak one just says “add a register” with no breakdown.
TAKEAWAY
Pipelining is a quantitative decision — estimate before implementing.
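The estimate in this exercise can be made slightly more careful by charging every pipeline segment its own register overhead. The sketch below is a back-of-the-envelope model, assuming the logic splits perfectly evenly and that each segment pays the 0.8 ns clock-to-Q and 0.4 ns setup seen in the example reports. The function names are hypothetical, and measured Fmax after re-running place and route can differ substantially in either direction.

```python
import math


def min_pipeline_registers(path_ns, target_mhz, t_clk_to_q=0.8, t_setup=0.4):
    """Fewest added registers so each segment fits in the target period.

    Idealized: assumes the combinational logic can be cut into equal parts.
    """
    overhead = t_clk_to_q + t_setup      # paid once per segment
    period = 1000.0 / target_mhz
    budget = period - overhead           # time left for logic in each segment
    if budget <= 0:
        raise ValueError("target period is shorter than the register overhead")
    logic = path_ns - overhead           # logic + routing in the original path
    segments = max(1, math.ceil(logic / budget))
    return segments - 1


def best_case_fmax_mhz(path_ns, registers, t_clk_to_q=0.8, t_setup=0.4):
    """Fmax if the logic is split evenly across registers + 1 segments."""
    overhead = t_clk_to_q + t_setup
    logic = path_ns - overhead
    return 1000.0 / (logic / (registers + 1) + overhead)


# Numbers from the failing 32-bit adder report: 45.7 ns critical path.
for target in (25.0, 50.0):
    n = min_pipeline_registers(45.7, target)
    print(f"{target:.0f} MHz needs {n} register(s), "
          f"best case {best_case_fmax_mhz(45.7, n):.1f} MHz")
```

Under these assumptions one register is enough for 25 MHz but tops out near 43 MHz, so the 50 MHz target needs two, which matches the prediction in the exercise.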
① Fmax = 1 / (delay of the longest combinational path between flops).
② Read nextpnr's critical path report — it names your bottleneck.
③ Three fixes: pipeline, reduce width, reduce fanout.
④ Pipelining trades latency for throughput. Usually a worthy trade.
🔗 Transfer
Video 2 of 4 · ~15 minutes
▸ WHY THIS MATTERS NEXT
Timing tells you “how fast.” Video 2 asks “why so slow?” and answers with adder architectures, multiplier explosions, fixed-point arithmetic, and the tricks that turn a 32-cycle math operation into a 1-cycle one. By the end you'll know which adder the synthesizer built for your “+” and what you could have built instead.