Day 12 · UART RX · SPI · Integration

RX Oversampling

Video 1 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

16× oversampling has been the de facto standard for UART receivers since the early UART chips of the 1970s. National Semiconductor's 8250 (the UART in the original IBM PC), the 16550A, and their descendants all run from a 16× baud clock. FTDI USB-UART chips use 16×. ARM's PL011 UART uses 16×. Your FPGA vendor IP typically uses 16×, and some microcontroller UARTs add an optional 8× mode. Once you learn this technique, most UART RX designs you open will look familiar within minutes. Also: sampling a line faster than its bit rate is one of the basic approaches to clock recovery in modems, SerDes, and other serial links.

In This Course

Day 11 TX was straightforward — you controlled the timing. Today's RX is the hard part, and the “16× oversampling” trick is what makes it work. Once you have RX working (Video 2), you have full-duplex UART on your Go Board. Day 12 Video 3 introduces SPI; Video 4 integrates everything.

⚠️ Receive Timing Is Not Your Timing

❌ Wrong Model

“For RX, I'll just reverse the TX: count CLKS_PER_BIT cycles, sample the line, move to the next bit.”

✓ Right Model

You don't know when the byte arrives, and the transmitter's clock and yours can differ by up to about ±2%. If you sample at CLKS_PER_BIT intervals starting from the detected start edge, every sample lands on a bit boundary, where the line may be mid-transition and any drift pushes the sample into the neighboring bit. You need to sample in the middle of each bit, not at the edges, to tolerate drift. 16× oversampling gives you the machinery to do this.

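The receipt: A ±2% clock mismatch moves your sample point by ~2% of a bit time per bit. Over a 10-bit frame (start + 8 data + stop) that adds up to ~20% of a bit time. A sample point that starts on a bit edge has no margin, so 20% of drift lands it in the neighboring bit and the byte decodes wrong. A sample point that starts at mid-bit has half a bit time of margin and survives.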
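Worked numbers, as a sketch: this assumes a 10-bit frame (start + 8 data + stop), a constant mismatch of 2%, and a last sample taken in the middle of the stop bit, 9.5 bit times after the start edge. The symbols ε and d are introduced here for illustration only.

```latex
\begin{align*}
d_{\text{stop}} &= 9.5 \times \varepsilon = 9.5 \times 0.02 = 0.19 \text{ bit times} \\
\text{edge-aligned sampling:} \quad \text{margin} &= 0 < 0.19 \;\Rightarrow\; \text{wrong bit} \\
\text{mid-bit sampling:} \quad \text{margin} &= 0.5 > 0.19 \;\Rightarrow\; \text{correct bit}
\end{align*}
```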

👁️ I Do — The 16× Oversampling Scheme

   One bit time (CLKS_PER_BIT cycles)
   ├─────────────────────────────────────┤
    │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
    0 1 2 3 4 5 6 7 8 9 A B C D E F        ← 16 oversample slots

    ▲               ▲
    │               │
  start            mid-bit sample point
  edge             (slot 8)   → this is the value

My thinking: Divide each bit time into 16 oversample slots. Trigger on the start-bit falling edge. From that edge, count 8 slots forward (middle of start bit) to verify it's still LOW. Then count 16 slots to reach the middle of bit 0. 16 more to middle of bit 1. And so on. Each bit is sampled at its midpoint, where drift is least likely to corrupt it.
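Sketch of that schedule: a 4-bit slot counter that is zeroed on the start edge and then free-runs gives "8 slots, then 16, then 16..." for free, because slot 8 comes around once per bit. The module name, port names, and the i_os_tick and i_start inputs are assumptions made for illustration, not the course's RTL. It also assumes the surrounding FSM only pulses i_start while idle.

```verilog
// Hypothetical sketch: names and ports are illustrative.
module os_slot_counter (
    input  wire i_clk,
    input  wire i_os_tick,   // assumed: 1-cycle pulse, 16 per bit time
    input  wire i_start,     // assumed: 1-cycle pulse on the start-bit falling edge
    output wire o_sample     // 1-cycle pulse at the middle of each bit
);
    reg [3:0] r_slot = 4'd0;

    always @(posedge i_clk) begin
        if (i_start)
            r_slot <= 4'd0;            // realign slot 0 to the start edge
        else if (i_os_tick)
            r_slot <= r_slot + 4'd1;   // wraps 15 -> 0 at each bit boundary
    end

    // Fires on the tick that moves the counter from slot 7 to slot 8:
    // 8 ticks after the start edge, then every 16 ticks after that.
    assign o_sample = i_os_tick & ~i_start & (r_slot == 4'd7);
endmodule
```

The first o_sample pulse is the mid-start-bit check; the following pulses land mid-bit on bit 0, bit 1, and so on. If i_os_tick is not realigned to the start edge, the sample point can sit up to one slot late.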

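Math: CLKS_PER_OSX = CLKS_PER_BIT / 16. At 25 MHz and 115200 baud, CLKS_PER_BIT = 217, so 217/16 ≈ 13.6 cycles per oversample slot. That is not an integer, so the FPGA counter runs 13 or 14 cycles per slot. A fixed 13 gives 208 clocks per bit (about 4% short) and a fixed 14 gives 224 (about 3% long), so mixing 13- and 14-cycle slots is what keeps the average at 217 clocks per bit.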
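One common way to get a mix of 13- and 14-cycle slots is a fractional accumulator: add 16 every clock and emit a tick each time the running sum passes CLKS_PER_BIT. This is a sketch of that general technique, not necessarily what the course RTL does. The module name os_tick_gen and its ports are hypothetical, and the register widths are sized for CLKS_PER_BIT = 217 only.

```verilog
// Hypothetical sketch: fractional-accumulator oversample tick generator.
module os_tick_gen #(
    parameter CLKS_PER_BIT = 217   // 25 MHz / 115200 baud
) (
    input  wire i_clk,
    output wire o_os_tick          // 1-cycle pulse, 16 per CLKS_PER_BIT clocks
);
    localparam OVERSAMPLE = 16;

    // Running remainder, always below CLKS_PER_BIT (8 bits covers 0..216).
    reg [7:0] r_acc = 8'd0;

    // Adding 16 per clock crosses CLKS_PER_BIT exactly 16 times per 217 clocks.
    wire [8:0] w_sum = r_acc + OVERSAMPLE;
    assign o_os_tick = (w_sum >= CLKS_PER_BIT);

    always @(posedge i_clk)
        r_acc <= o_os_tick ? (w_sum - CLKS_PER_BIT) : w_sum;
endmodule
```

With these numbers each bit time contains nine 14-cycle slots and seven 13-cycle slots (9 × 14 + 7 × 13 = 217), so the slot boundaries never drift away from the bit boundaries.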

🤝 We Do — Start-Bit Detection

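// 1. Synchronize the async RX line (2-FF sync, the Day 5 lesson returns!)
//    Flops start at 1 (UART idle level) so no false start edge appears at power-up.
//    This assumes the toolchain honors register initial values.
reg r_rx_sync1 = 1'b1, r_rx_sync2 = 1'b1, r_rx_sync3 = 1'b1;
always @(posedge i_clk) begin
    r_rx_sync1 <= i_rx;       // sync flop 1: may go metastable
    r_rx_sync2 <= r_rx_sync1; // sync flop 2: stable, use this as the RX level
    r_rx_sync3 <= r_rx_sync2; // previous value of r_rx_sync2, for edge detect
end
wire falling_edge = r_rx_sync3 & ~r_rx_sync2;   // was 1, now 0: 1 → 0 transition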

Together: First step: synchronize i_rx (external, async) through 2 flops. The third flop holds the previous synchronized value for edge detection. falling_edge pulses for 1 cycle when the line goes from idle (1) to start bit (0). The RX FSM uses this pulse to leave the IDLE state and begin framing. The oversampling logic then counts to slot 8 and re-checks that the line is still LOW, which rejects glitches that end before mid-bit.
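Sketch of the IDLE-to-validation handoff described above. It covers only the start-bit check; the data and stop states belong to the full RX FSM. The module name start_bit_check, its ports, and the i_os_tick input (16 pulses per bit time) are assumptions for illustration.

```verilog
// Hypothetical sketch: module and port names are illustrative.
module start_bit_check (
    input  wire i_clk,
    input  wire i_rx_level,      // r_rx_sync2 from the synchronizer
    input  wire i_falling_edge,  // falling_edge from the edge detector
    input  wire i_os_tick,       // assumed: 1-cycle pulse, 16 per bit time
    output reg  o_start_ok,      // 1-cycle pulse: line still LOW at slot 8
    output reg  o_glitch         // 1-cycle pulse: line back HIGH at slot 8
);
    localparam S_IDLE  = 1'b0;
    localparam S_CHECK = 1'b1;

    reg       r_state = S_IDLE;
    reg [2:0] r_slot  = 3'd0;

    initial begin
        o_start_ok = 1'b0;
        o_glitch   = 1'b0;
    end

    always @(posedge i_clk) begin
        o_start_ok <= 1'b0;
        o_glitch   <= 1'b0;
        case (r_state)
            S_IDLE: begin
                if (i_falling_edge) begin
                    r_slot  <= 3'd0;
                    r_state <= S_CHECK;
                end
            end
            S_CHECK: begin
                if (i_os_tick) begin
                    if (r_slot == 3'd7) begin
                        // 8th tick after the edge: middle of the start bit.
                        o_start_ok <= ~i_rx_level;
                        o_glitch   <=  i_rx_level;
                        r_state    <= S_IDLE;  // a full RX FSM would move on to the data bits here
                    end else begin
                        r_slot <= r_slot + 3'd1;
                    end
                end
            end
        endcase
    end
endmodule
```

On o_glitch the receiver simply keeps waiting in IDLE; on o_start_ok it has a confirmed start bit and a sample phase aligned to mid-bit.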

🧪 You Do — Why 16 And Not 4 Or 64?

Why does UART RX use 16× oversampling? Why not 4× (cheaper) or 64× (more precise)?

Tradeoff analysis:

4×: Middle of bit = slot 2. Only 1 slot on each side of the sample point (25% of bit time) → poor tolerance to clock drift; a ±2% mismatch alone uses up most of that margin.
8×: Better (±3 slots, ~38% of bit time), but still sensitive to start-bit detection jitter.
16×: Middle at slot 8. ±7 slots (~44% of bit time) of drift tolerance. Standard since 1970s.
64×: Excellent tolerance, but 2 more counter bits and 4× the oversample rate of 16×. Diminishing returns for UART.
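The slot counting in the list above generalizes, as a rough model: with N slots per bit the sample sits at slot N/2, and the start edge is only located to within one slot, leaving N/2 − 1 whole slots of margin on either side. The symbol m(N) is introduced here for illustration. The 19% figure assumes a constant 2% mismatch accumulated over 9.5 bit times.

```latex
\begin{align*}
m(N) &= \frac{N/2 - 1}{N} \\
m(4) &= \tfrac{1}{4} = 25\% \\
m(8) &= \tfrac{3}{8} = 37.5\% \\
m(16) &= \tfrac{7}{16} = 43.75\% \\
m(64) &= \tfrac{31}{64} \approx 48.4\% \\
\text{drift to cover:} \quad 9.5 \times 2\% &= 19\%
\end{align*}
```

Going from 16× to 64× buys under 5 percentage points of margin for 4× the tick rate, which is the diminishing return noted in the list.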

Answer: 16× is the sweet spot for UART's ±2% tolerance requirement. Modems and SerDes choose their own oversampling ratios because their data rates and error budgets are different.

▶ LIVE DEMO

Oversampling Decision Visualization

~4 minutes

▸ COMMANDS

cd labs/week3_day12/ex1_oversample/
python3 plot_sampling.py
# Simulates ±2% drift
# Compares 4× vs 16× decision

▸ EXPECTED OUTPUT

At +2% clock drift:
  4× sample: fails bit 6
  16× sample: correct
     through bit 9
  (7-slot margin still)

At ±2% tolerance:
  16× PASS (by design)

🤖 Check the Machine

Ask AI: “Design a UART RX module with 16× oversampling. Describe the counter hierarchy, start-bit validation logic, and sample-point calculation.”

TASK

AI describes 16× oversampling RX.

BEFORE

Predict: 2 counters (oversample + bit), 2-FF sync, start-bit revalidation at slot 8.

AFTER

Strong AI mentions glitch rejection via mid-start resample. Weak AI skips this.

TAKEAWAY

The mid-start-bit resample is what distinguishes robust RX designs from fragile ones.

Key Takeaways

① RX is harder than TX — you don't control the timing.

② 16× oversampling is the standard UART RX trick.

③ Sample each bit at its midpoint (slot 8 of 16).

④ 2-FF sync + edge detect + mid-start-bit revalidation = robust design.

Oversampling makes asynchronous communication reliable. It's the trick that makes UART work.

🔗 Transfer

RX Implementation

Video 2 of 4 · ~12 minutes

▸ WHY THIS MATTERS NEXT

You have the theory. Video 2 is the build: FSM states, oversample counter, sample-and-shift logic. Full working Verilog. End of Video 2: your Go Board echoes characters you type in your terminal. Full-duplex. Your RTL has become a conversation partner.