Day 4 · Sequential Logic Fundamentals

Nonblocking Assignment

Video 2 of 4 · ~12 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

Nonblocking assignment underpins the synchronous pipelines described in Verilog and SystemVerilog RTL: CPUs, networking switches, GPUs. When Intel simulates a Xeon CPU, the convention is that every flop in the design uses <=. When TSMC verifies their standard-cell library, the same convention applies to every registered output modeled in RTL.

In This Course

Day 5 shift registers live or die by <=. Day 7 FSM state transitions demand <=. Your Day 11 UART framing pipeline is 6 <= deep. The rule is absolute — today's job is to build deep intuition via waveforms.

The simulator scheduling semantics that make <= work are standardized in IEEE 1364 / 1800 — it's not a quirk, it's the core of synchronous RTL semantics.

⚠️ The Assignments Don't Actually Happen Yet

❌ Wrong Model

“<= is just assignment. Maybe with a slight delay? The variables get their new values as the lines execute.”

✓ Right Model

<= is not immediate assignment; it's scheduling. Each <= line says “at the end of this timestep, please update the LHS to this value.” All the RHS expressions are evaluated using pre-edge values. Only after every block triggered by the edge has finished evaluating do the updates apply, all at once, atomically.

The receipt: This is exactly how real flip-flops behave. Every flop in your design captures its D input at the same physical moment (the clock edge). No flop sees another flop's new value until after the edge. Nonblocking assignment models this precisely.

The key insight: <= is scheduling semantics, not assignment. The simulator has a two-phase update model for each timestep — evaluate everything, then commit. This is the digital equivalent of database transaction isolation.

👁️ I Do — Nonblocking: How It Works

Step 1: On the clock edge, evaluate all RHS expressions using current (pre-edge) values.

Step 2: Schedule all updates. Nothing has changed yet.

Step 3: Apply all scheduled updates simultaneously at the end of the timestep.

This is exactly what real hardware does: all flip-flops capture their D inputs at the same physical instant. No flop “sees” another flop's new value until after the edge.
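The three steps above can be checked with a register swap, which only works if both right-hand sides are read before either left-hand side changes. This is a minimal self-checking sketch; the module name nba_swap_tb and the values 3 and 9 are illustrative assumptions, not part of the course labs.

```verilog
// Hypothetical testbench: swap two registers with nonblocking assignment.
module nba_swap_tb;
    reg       clk = 0;
    reg [3:0] x   = 4'd3;
    reg [3:0] y   = 4'd9;

    // Step 1: both RHS values (y and x) are read with pre-edge values.
    // Step 2: both updates are scheduled; nothing has changed yet.
    // Step 3: both updates apply together, so the registers trade contents.
    always @(posedge clk) begin
        x <= y;
        y <= x;
    end

    initial begin
        #5 clk = 1;   // a single rising edge at t=5
        #1;           // step past the edge so the updates have been applied
        if (x === 4'd9 && y === 4'd3)
            $display("PASS: x=%0d y=%0d (swapped)", x, y);
        else
            $display("FAIL: x=%0d y=%0d", x, y);
        $finish;
    end
endmodule
```

Replacing both <= with = in this sketch would leave x and y holding the same value, because the second statement would read the already-updated x.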

🤝 We Do — The Two-Stage Pipeline

// WRONG: blocking
always @(posedge clk) begin
    b = a;    // b gets a immediately
    c = b;    // c gets the NEW b (= a!)
end
// Result: b=a, c=a — no pipeline at all

// CORRECT: nonblocking
always @(posedge clk) begin
    b <= a;   // scheduled: b ← a(current)
    c <= b;   // scheduled: c ← b(current)
end
// Result: b=a(old), c=b(old) — proper 2-stage pipeline
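One consequence of the scheduling model is that statement order inside a nonblocking block does not change the result. The sketch below runs the pipeline with its two statements in both orders and prints a flag showing whether the outputs agree. The module name nba_order_tb, the signal names, and the timing values are assumptions chosen for illustration.

```verilog
// Hypothetical testbench: nonblocking results do not depend on statement order.
module nba_order_tb;
    reg clk = 0;
    reg a   = 1;
    reg b1 = 0, c1 = 0;   // statements in the slide's order
    reg b2 = 0, c2 = 0;   // same statements, reversed

    always #5  clk = ~clk;   // rising edges at t=5, 15, 25, 35
    always #10 a   = ~a;     // a changes between rising edges, never on one

    always @(posedge clk) begin
        b1 <= a;
        c1 <= b1;
    end

    always @(posedge clk) begin
        c2 <= b2;            // still reads the pre-edge b2
        b2 <= a;
    end

    // $strobe prints at the end of the timestep, after the updates commit.
    // same=1 means both orderings produced identical b and c.
    always @(posedge clk)
        $strobe("t=%0d b1=%b c1=%b b2=%b c2=%b same=%b",
                $time, b1, c1, b2, c2, (b1 === b2) && (c1 === c2));

    initial #40 $finish;
endmodule
```

With blocking assignment the two orderings would differ, which is why the blocking version's behavior depends on how the lines happen to be arranged.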

🧪 You Do — Trace Four Cycles

Given a toggles 1,0,1,0,1,… each cycle. Starting values: b=c=0. Trace b, c for 4 edges using nonblocking:

always @(posedge clk) begin
    b <= a;
    c <= b;
end

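Answer:
cycle 1: a=1 → b=1, c=0
cycle 2: a=0 → b=0, c=1
cycle 3: a=1 → b=1, c=0
cycle 4: a=0 → b=0, c=1
Notice: after each edge, c shows what b held one edge earlier. A value of a needs two edges to reach c (one to enter b, one to move on to c). That's the pipeline signature.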

With blocking: b=a, then c=b=a every cycle — c matches a with zero delay. Pipeline collapsed.

A value taking two edges to travel from a to c is the hallmark of a correctly operating 2-stage pipeline. If your waveform shows c changing in lockstep with b, you've hit the blocking bug.
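The hand trace can be reproduced in simulation. The following is a minimal sketch, not the course's lab testbench: the module name trace_tb, the _nb and _bl signal suffixes, and the timing values are assumptions. It runs the nonblocking and blocking versions side by side and prints the values after each of the four edges.

```verilog
// Hypothetical testbench: four-edge trace, nonblocking vs. blocking.
module trace_tb;
    reg clk  = 0;
    reg a    = 1;            // a is 1 at the first rising edge
    reg b_nb = 0, c_nb = 0;  // nonblocking pair, starting at 0
    reg b_bl = 0, c_bl = 0;  // blocking pair, starting at 0

    always #5  clk = ~clk;   // rising edges at t=5, 15, 25, 35
    always #10 a   = ~a;     // a toggles mid-cycle, so it never races the flops

    always @(posedge clk) begin
        b_nb <= a;
        c_nb <= b_nb;        // reads the pre-edge b_nb
    end

    always @(posedge clk) begin
        b_bl = a;
        c_bl = b_bl;         // reads the value just written to b_bl
    end

    // $strobe samples at the end of the timestep, after all updates.
    always @(posedge clk)
        $strobe("t=%0d a=%b | nonblocking b=%b c=%b | blocking b=%b c=%b",
                $time, a, b_nb, c_nb, b_bl, c_bl);

    initial #40 $finish;
endmodule
```

If the sketch behaves as intended, the nonblocking columns match the hand trace (b = 1,0,1,0 and c = 0,1,0,1), while the blocking columns show c equal to b on every line.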

▶ LIVE DEMO

Blocking vs Nonblocking Side-by-Side

~5 minutes

▸ COMMANDS

cd labs/week1_day04/ex2_pipeline_demo/
# Two modules: pipe_blocking, pipe_nonblocking
make sim
make wave   # loads saved .gtkw

▸ EXPECTED STDOUT

BLOCKING:
  t=10 a=1 b=1 c=1
  t=20 a=0 b=0 c=0
  (b and c always match a)

NONBLOCKING:
  t=10 a=1 b=1 c=0
  t=20 a=0 b=0 c=1
  (c lags b by 1 cycle)

▸ GTKWAVE

Two traces stacked. Blocking: a, b, c all change together on each edge (pipeline collapsed). Nonblocking: staircase pattern — a leads, b follows by 1, c follows by 2. This is the pipeline visible.

🔧 What Did the Tool Build?

Blocking

$ yosys ... blocking.v
SB_DFF:      1    ← ONE flop
SB_LUT4:     0

(synthesizer proves b==c
 and merges the two flops.
 Result: input → 1 flop → output)

Nonblocking

$ yosys ... nonblocking.v
SB_DFF:      2    ← TWO flops
SB_LUT4:     0

(proper 2-stage pipeline)

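Hardware doesn't lie. The blocking version synthesizes to fewer flops because the synthesizer correctly sees that b and c always hold the same value, so one stage is redundant. Your 2-stage pipeline became a single register stage, one cycle short of the latency the rest of the design expects. In a deeper pipeline whose stages were meant to split long logic paths, that is also a timing-closure nightmare downstream.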

🤖 Check the Machine

Ask AI: “Explain, using the term 'active event queue', why nonblocking assignments prevent race conditions in sequential Verilog.”

TASK

Ask AI about simulator event queue semantics.

BEFORE

Predict: NBA events go to an NBA region that fires after the active region, ensuring atomic commit.

AFTER

Strong AI explains the active → inactive → NBA region ordering. Weak AI just says “it's delayed.”

TAKEAWAY

This is IEEE 1364-2001 §5 material (Clause 11 in IEEE 1364-2005, Clause 4 in IEEE 1800). It is the reference for anyone who wants deep scheduling knowledge.
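The region ordering can be observed directly, since $display executes in place with the other active events while $strobe waits until the end of the timestep. This is a small sketch under that assumption; the module name nba_region_tb and the signal q are hypothetical.

```verilog
// Hypothetical testbench: observe a value before and after the NBA update.
module nba_region_tb;
    reg clk = 0;
    reg q   = 0;

    always @(posedge clk) begin
        q <= 1'b1;                         // update is scheduled, not applied
        $display("display sees q=%b", q);  // active region: still the old q
        $strobe ("strobe  sees q=%b", q);  // end of timestep: the new q
    end

    initial begin
        #5 clk = 1;   // one rising edge
        #5 $finish;
    end
endmodule
```

The two lines come from the same simulation time, yet $display should report q=0 and $strobe q=1, which is the gap between evaluation and commit that the prompt asks the AI to explain.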

The Golden Rules

= in always @(*) — combinational

<= in always @(posedge clk) — sequential

Never mix. Never break this rule.
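As one illustration of the rules in a single module, the sketch below keeps all combinational logic in an always @(*) block using = and all state in an always @(posedge clk) block using <=. The module name sum_reg and its ports are hypothetical, chosen only for this example.

```verilog
// Hypothetical module: a registered adder that follows both rules.
module sum_reg (
    input  wire       clk,
    input  wire [3:0] x,
    input  wire [3:0] y,
    output reg  [4:0] sum_q
);
    reg [4:0] sum_d;   // combinational next-state value

    // Combinational logic: blocking (=) inside always @(*)
    always @(*) begin
        sum_d = x + y;   // 5-bit result keeps the carry
    end

    // Sequential logic: nonblocking (<=) inside always @(posedge clk)
    always @(posedge clk) begin
        sum_q <= sum_d;
    end
endmodule
```

Splitting the logic this way means each always block uses exactly one assignment operator, which makes a violation of the rule easy to spot in review.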

Key Takeaways

① <= evaluates all RHS first, then updates simultaneously.

② This models real flip-flop behavior — simultaneous capture.

③ = in sequential blocks destroys pipeline behavior.

④ The rule is absolute: = for @(*), <= for @(posedge).

Nonblocking is not delayed assignment. It's simulated simultaneity.

🔗 Transfer

Flip-Flop Variants

Video 3 of 4 · ~10 minutes

▸ WHY THIS MATTERS NEXT

The bare D-flop is rare in practice. Real designs have reset (to initialize state), enable (to conditionally update), and choices about synchronous vs asynchronous reset. Video 3 covers the patterns you'll see in 99% of production RTL.