Day 5 · Counters, Shifters & Sync

Metastability & Synchronizers

Video 3 of 4 · ~12 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Counter Variations · Shift Registers · Metastability · Debouncing

🌍 Where This Lives

In Industry

Mars Pathfinder's 1997 resets came from priority inversion, and the Therac-25 accidents (1985 to 1987) from software race conditions: timing bugs that testing missed. Metastability is the hardware member of that family. Multiple Xilinx errata trace back to it, and real silicon has failed in the field from inadequate synchronization. Every FPGA/ASIC design guide from Intel, Xilinx, and Lattice devotes a chapter to this. Every code review at a senior RTL shop checks for missing synchronizers on asynchronous inputs.

In This Course

Day 5 button debouncer needs synchronizers (next video). Day 11 UART RX needs synchronizers on the incoming data line. Day 12 SPI CS needs one. Every external signal into every module for the rest of the course needs this pattern.

Career tag: “Explain metastability” is the most-asked FPGA/ASIC interview question. An engineer who can't answer it doesn't get the job.

⚠️ Real Silicon Doesn't Care About Your Simulator

❌ Wrong Model

“Flip-flops output 0 or 1 on every clock edge. My testbench always shows clean transitions. Asynchronous inputs work fine in simulation.”

✓ Right Model

Real flip-flops have setup and hold time windows around the clock edge. If an input changes during that window, the flop's output goes into a metastable state — neither 0 nor 1 — and may oscillate, decay slowly, or resolve randomly. Simulators don't model this. Your waveform will be clean while your real chip glitches once a week.

The receipt: Metastability causes physical failures that simulation never catches. Silicon datasheets quote MTBF (mean time between failures) in the 10⁻⁹ to 10⁻¹² range for unsynchronized asynchronous inputs. That's ~1 failure per week at 25 MHz without mitigation.

The Setup/Hold Window

           setup         hold
            ←──►         ←──►
  clk  ─────────┐       ┌─────────
                └───────┘
  D    ─────────∎─(must be stable in windows)─∎─────
                ∎ METASTABLE ZONE if D changes here
    

If D changes during the setup/hold window, the flop's output becomes metastable — stuck between 0 and 1 — until it resolves (probabilistically).

iCE40 specs: t_setup ≈ 0.4 ns, t_hold ≈ 0.3 ns. That is a 0.7 ns window around every clock edge. With a 25 MHz clock, an asynchronous input will eventually change inside it.

👁️ I Do — The 2-FF Synchronizer

module sync_2ff (
    input  wire i_clk,
    input  wire i_async,     // from external world, any timing
    output wire o_synced     // clean, synchronous to i_clk
);
    reg r_meta, r_stable;
    always @(posedge i_clk) begin
        r_meta   <= i_async;   // first flop: may go metastable
        r_stable <= r_meta;     // second flop: has a full clock period to resolve
    end
    assign o_synced = r_stable;
endmodule
My thinking: The first flop may go metastable, but metastability resolves exponentially (probabilistic decay). Given a full clock period (~40 ns at 25 MHz) to settle, the probability it's still metastable at the second flop's capture edge is astronomically small. At 25 MHz with iCE40's τ, MTBF is measured in centuries.

🤝 We Do — When To Synchronize

Does this input need a synchronizer?

  1. A push button wired to an FPGA pin
  2. The output of a Day 4 counter feeding a comparator
  3. UART RX line from a USB-serial chip
  4. A register's output feeding an ALU
  5. A signal from another FPGA running on a different crystal
Answers: (1) YES — human input, no clock relationship. (2) NO — already synchronous to your clock. (3) YES — external chip, different clock domain. (4) NO — internal signal. (5) YES — different crystal = different clock domain = async by definition.
The rule is absolute: Every input that crosses from one clock domain into another (including “from outside the chip”) needs a synchronizer. No exceptions.
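In practice the rule means one sync_2ff instance per asynchronous pin, placed at the top level before any other logic sees the signal. A minimal sketch, assuming the sync_2ff module from the previous slide is compiled alongside it; the module name top_inputs and its port names are hypothetical, chosen only for illustration:

```verilog
// Hypothetical top level: everything except sync_2ff is an illustrative name.
module top_inputs (
    input  wire i_clk,
    input  wire i_btn,       // push button pin: asynchronous
    input  wire i_uart_rx,   // UART RX from a USB-serial chip: asynchronous
    output wire o_btn_sync,  // safe to use in i_clk-domain logic
    output wire o_rx_sync    // safe to use in i_clk-domain logic
);
    // One synchronizer per external signal, nothing shared between them.
    sync_2ff u_sync_btn (
        .i_clk    (i_clk),
        .i_async  (i_btn),
        .o_synced (o_btn_sync)
    );

    sync_2ff u_sync_rx (
        .i_clk    (i_clk),
        .i_async  (i_uart_rx),
        .o_synced (o_rx_sync)
    );
endmodule
```

Downstream logic should read only o_btn_sync and o_rx_sync, never the raw pins.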

🧪 You Do — Spot the Bug

module reader (
    input wire clk, async_ready, async_data,
    output reg captured
);
    always @(posedge clk)
        if (async_ready) captured <= async_data;
endmodule

Find the metastability bugs.

Answer: Both inputs (async_ready and async_data) are asynchronous and feed a flop directly, so each needs a synchronizer. Worse: async_ready gates the capture, so even if async_data alone were synchronized, an unsynchronized enable could still violate setup/hold on captured or trigger the capture at the wrong moment. Fix: pass both signals through 2-FF synchronizers first, then use only the synchronized versions.
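One possible shape for the fix, as a sketch rather than the lab's reference solution. It assumes sync_2ff from the earlier slide is available; the name reader_synced and the internal wires ready_s and data_s are hypothetical. It also assumes the sender holds async_data stable for at least a clock period before raising async_ready, because two independent synchronizers can differ by one cycle of latency:

```verilog
// Sketch of the reader with both asynchronous inputs synchronized first.
// reader_synced, ready_s and data_s are illustrative names.
module reader_synced (
    input  wire clk,
    input  wire async_ready,
    input  wire async_data,
    output reg  captured
);
    wire ready_s;  // async_ready after two flops in the clk domain
    wire data_s;   // async_data after two flops in the clk domain

    sync_2ff u_sync_ready (.i_clk(clk), .i_async(async_ready), .o_synced(ready_s));
    sync_2ff u_sync_data  (.i_clk(clk), .i_async(async_data),  .o_synced(data_s));

    // Only synchronized signals reach the capture flop.
    always @(posedge clk)
        if (ready_s) captured <= data_s;
endmodule
```

This costs four extra flops and two cycles of latency on each input.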
▶ LIVE DEMO

Synchronizer on an Asynchronous Button

~4 minutes

▸ COMMANDS

cd labs/week2_day05/ex3_sync_2ff/
make sim
make wave
make stat

▸ EXPECTED STDOUT

PASS: synced follows async
      with 2-cycle latency
PASS: glitches on async do
      not propagate
=== 10 passed, 0 failed ===

  SB_DFF: 2

▸ GTKWAVE

Signals: i_async · r_meta · r_stable. Note: r_meta may glitch in simulation (if the TB drives pulses shorter than the clock period); r_stable is always clean. Latency is 2 clock cycles. That's the cost of safety.
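The lab's testbench is not shown on the slide. A minimal self-checking sketch of the 2-cycle latency check, assuming sync_2ff is compiled with it, could look like the following; the module name tb_sync_2ff_sketch and the message strings are hypothetical:

```verilog
`timescale 1ns/1ps
// Illustrative testbench sketch, not the lab's actual testbench.
module tb_sync_2ff_sketch;
    reg  clk      = 1'b0;
    reg  async_in = 1'b0;
    wire synced;

    sync_2ff dut (.i_clk(clk), .i_async(async_in), .o_synced(synced));

    always #20 clk = ~clk;          // 40 ns period = 25 MHz

    initial begin
        // Let the unknown power-up values flush through both flops.
        repeat (3) @(posedge clk);

        #7 async_in = 1'b1;         // change between clock edges

        @(posedge clk); #1;         // edge 1: only r_meta has captured the 1
        if (synced !== 1'b0)
            $display("FAIL: output changed after one edge");

        @(posedge clk); #1;         // edge 2: r_stable has captured the 1
        if (synced === 1'b1)
            $display("PASS: synced follows async after two edges");
        else
            $display("FAIL: synced = %b", synced);

        $finish;
    end
endmodule
```

The #1 delays after each edge let the nonblocking assignments settle before the check reads the output.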

🔧 What Did the Tool Build?

$ yosys -p "read_verilog sync_2ff.v; synth_ice40 -top sync_2ff; stat" -q

=== sync_2ff ===
   Number of wires:                  4
   Number of cells:                  2
     SB_DFF                          2    ← exactly 2 flops, no frills
     SB_LUT4                         0
Cost of safety: 2 flops per synchronized signal. On iCE40 HX1K: 0.16% of the chip per synchronizer. You can afford to synchronize every external signal — there is no reason not to.
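Advanced note: Some FPGA flows add ASYNC_REG or similar synthesis attributes to ensure these two flops are kept close in silicon. Yosys only synthesizes; in the iCE40 open-source flow, placement is done by the place-and-route tool (e.g. nextpnr), and no special attribute is needed. For Xilinx, you'd add (* ASYNC_REG = "TRUE" *) attributes to both flops.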
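For a Xilinx target, the attribute goes on the register declarations. A sketch of the same synchronizer with the attribute applied; the module name sync_2ff_xilinx is hypothetical, and the logic is otherwise identical to sync_2ff:

```verilog
// Same 2-FF synchronizer, annotated for Xilinx Vivado.
// sync_2ff_xilinx is an illustrative name.
module sync_2ff_xilinx (
    input  wire i_clk,
    input  wire i_async,
    output wire o_synced
);
    // ASYNC_REG marks both flops as a synchronizer chain so the
    // tools keep them together and do not optimize across them.
    (* ASYNC_REG = "TRUE" *) reg r_meta;
    (* ASYNC_REG = "TRUE" *) reg r_stable;

    always @(posedge i_clk) begin
        r_meta   <= i_async;   // first flop: may go metastable
        r_stable <= r_meta;    // second flop: samples the resolved value
    end

    assign o_synced = r_stable;
endmodule
```

Tools that do not recognize the attribute should ignore it, so the annotated version is usually safe to keep in portable code.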

🤖 Check the Machine

Ask AI: “I'm reading a button press directly into my Verilog state machine. Do I need a synchronizer? Calculate MTBF with and without one for a 25 MHz clock.”

TASK

Ask AI about sync + MTBF.

BEFORE

Predict: without sync, MTBF ~hours-days. With 2FF sync, MTBF ~centuries.

AFTER

Strong AI shows the MTBF formula. Weak AI handwaves “it's fine” — dangerous advice.

TAKEAWAY

Any AI that says “buttons don't need synchronizers” is wrong. Don't trust that model for RTL work.

MTBF formula: MTBF = exp(T_resolve / τ) / (T_window · F_clock · F_async). For iCE40 (τ ≈ 200 ps), T_resolve = 40 ns, F_clock = 25 MHz, F_async = 10 Hz button: MTBF ≈ 10¹⁵ years with sync, ~1 day without.

Key Takeaways

 Asynchronous signals + setup/hold violations = metastability.

 The 2-FF synchronizer gives metastability time to resolve.

 Every external input and clock-domain crossing needs one.

 Cost: 2 flops, 2 cycles latency. Cheap insurance.

Every external input gets 2 flops. Every time. No exceptions.

🔗 Transfer

Button Debouncing

Video 4 of 4 · ~10 minutes

▸ WHY THIS MATTERS NEXT

Synchronization handles metastability, but buttons have a second problem: they bounce mechanically for up to 20 ms. At 25 MHz that's 500,000 clock cycles in which the input can toggle, and every toggle looks like a press. Video 4 combines the synchronizer you just saw with a counter-based debouncer to build the complete input pipeline you'll use everywhere.