Video 3 of 4 · ~12 minutes
Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
Nearly every embedded engineer you'll work with has written a UART TX. It's a rite of passage, the “hello world” of serial communication. On your first FPGA job, you will likely be asked to either (a) write one or (b) debug one someone else wrote. Textbook UART transmitters run on billions of chips worldwide. An elegant UART TX signals a competent RTL engineer; a messy one is a red flag.
Today we write the whole thing, about 60 lines total. You'll compile it, simulate it, synthesize it, and inspect the waveform. Video 4 connects it to your PC. Your capstone protocol layer uses this exact pattern, so if you internalize the idioms now, they become automatic for every protocol block that follows.
“I'll code everything in one go: FSM + datapath + baud gen + valid/busy + FIFO + parity. Then debug the whole mess together.”
Iterative integration: (1) FSM alone with fake outputs, (2) add baud counter, (3) add shift register, (4) add valid/busy handshake, (5) add the stop bit and its check. Test each addition before continuing. When a bug appears, you know which component is wrong, because the previous ones already passed.
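Stage (1) can be sketched like this, assuming a hypothetical module uart_tx_fsm_only that is not part of the lab files: the same four states, but every state lasts a single clock (no baud counter yet), and o_tx carries placeholder values. A testbench only has to confirm the sequence IDLE, START, eight DATA clocks, STOP, IDLE before you move on to stage (2).

```verilog
// Stage 1 of iterative integration (hypothetical module name).
// No baud counter and no shift register yet: every state lasts one clock,
// and o_tx just shows a fixed placeholder per state.
module uart_tx_fsm_only (
  input  wire       i_clk,
  input  wire       i_reset,
  input  wire       i_valid,
  output reg  [1:0] o_state,   // exposed so a testbench can watch the sequence
  output reg        o_tx
);
  localparam [1:0] S_IDLE = 2'd0, S_START = 2'd1, S_DATA = 2'd2, S_STOP = 2'd3;
  reg [2:0] r_bit;

  always @(posedge i_clk) begin
    if (i_reset) begin
      o_state <= S_IDLE; r_bit <= 3'd0; o_tx <= 1'b1;
    end else case (o_state)
      S_IDLE:  begin o_tx <= 1'b1; if (i_valid) o_state <= S_START; end
      S_START: begin o_tx <= 1'b0; o_state <= S_DATA; r_bit <= 3'd0; end
      S_DATA:  begin
                 o_tx <= 1'b0;               // placeholder; real data arrives in stage 3
                 if (r_bit == 3'd7) o_state <= S_STOP;
                 else               r_bit   <= r_bit + 1'b1;
               end
      S_STOP:  begin o_tx <= 1'b1; o_state <= S_IDLE; end
    endcase
  end
endmodule
```

Stage (2) then replaces each one-clock transition with the r_baud == CLKS_PER_BIT-1 test used in the full module.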
module uart_tx #(
  parameter CLKS_PER_BIT = 217  // 25 MHz / 115200 baud
) (
  input  wire       i_clk, i_reset,
  input  wire       i_valid,
  input  wire [7:0] i_data,
  output reg        o_busy,
  output reg        o_tx
);
  localparam [1:0] S_IDLE = 2'd0, S_START = 2'd1, S_DATA = 2'd2, S_STOP = 2'd3;
  localparam CNT_W = $clog2(CLKS_PER_BIT);

  reg [1:0]       r_state;
  reg [CNT_W-1:0] r_baud;   // 0..CLKS_PER_BIT-1 per bit
  reg [2:0]       r_bit;    // 0..7, which data bit
  reg [7:0]       r_shift;  // the byte being shifted out

  always @(posedge i_clk) begin
    if (i_reset) begin
      r_state <= S_IDLE; r_baud <= 0; r_bit <= 0;
      o_tx <= 1'b1; o_busy <= 1'b0;
    end else case (r_state)
      S_IDLE: begin
        o_tx   <= 1'b1;
        o_busy <= 1'b0;
        if (i_valid) begin
          r_shift <= i_data;
          r_state <= S_START; r_baud <= 0; o_busy <= 1'b1;
        end
      end
      S_START: begin
        o_tx <= 1'b0;  // start bit
        if (r_baud == CLKS_PER_BIT-1) begin
          r_state <= S_DATA; r_baud <= 0; r_bit <= 0;
        end else r_baud <= r_baud + 1'b1;
      end
      S_DATA: begin
        o_tx <= r_shift[0];  // LSB first
        if (r_baud == CLKS_PER_BIT-1) begin
          r_baud  <= 0;
          r_shift <= {1'b1, r_shift[7:1]};  // shift right, fill with idle
          if (r_bit == 3'd7) r_state <= S_STOP;
          else               r_bit   <= r_bit + 1'b1;
        end else r_baud <= r_baud + 1'b1;
      end
      S_STOP: begin
        o_tx <= 1'b1;  // stop bit
        if (r_baud == CLKS_PER_BIT-1) begin
          r_state <= S_IDLE; o_busy <= 1'b0;
        end else r_baud <= r_baud + 1'b1;
      end
    endcase
  end
endmodule
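Idioms to notice: $clog2, localparam, named state constants, and a single clocked always block that updates state, datapath, and registered outputs together.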
r_shift <= {1'b1, r_shift[7:1]}; // shift right, LSB drops off
o_tx <= r_shift[0]; // output the current LSB
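Each bit period, bit 0 of r_shift drives o_tx, the rest of the byte shifts right by 1 (old bit 1 moves to position 0, old bit 2 to position 1, and so on), and a 1 (the idle level) fills in at bit 7. After 8 bit periods the byte has been fully transmitted and the shift register holds all 1's (8'hFF), the idle value.
o_tx reads r_shift[0] before the shift takes effect. With nonblocking assignments, both right-hand sides are evaluated using the register values from before the clock edge, and both updates land at that same edge. So on the last baud cycle of a bit, o_tx still gets the current LSB while r_shift advances to the next one. Because o_tx is itself a register, the line lags the FSM by one clock; start, data, and stop bits are all delayed by the same single cycle, so each bit still lasts CLKS_PER_BIT clocks.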
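To see the two S_DATA assignments interact in isolation, here is a minimal sketch with a hypothetical shift_demo module. It is a simplification: it shifts on every clock rather than once per CLKS_PER_BIT clocks, and i_load stands in for the IDLE-state capture. Loading 8'h53 and clocking eight times should put 1 1 0 0 1 0 1 0 on o_tx (bit 0 first) and leave the shift register at 8'hFF.

```verilog
// Hypothetical demo module: the two S_DATA assignments in isolation,
// shifting on every clock (no baud counter) to keep the waveform short.
module shift_demo (
  input  wire       i_clk,
  input  wire       i_load,   // stands in for the IDLE-state capture
  input  wire [7:0] i_data,
  output reg  [7:0] o_shift,
  output reg        o_tx
);
  always @(posedge i_clk) begin
    if (i_load) begin
      o_shift <= i_data;
      o_tx    <= 1'b1;                  // idle level while loading
    end else begin
      // Both right-hand sides use the values from BEFORE this edge:
      o_tx    <= o_shift[0];            // so o_tx gets the OLD LSB...
      o_shift <= {1'b1, o_shift[7:1]};  // ...while the register moves on
    end
  end
endmodule
```

Writing the two lines in the opposite order makes no difference with nonblocking assignments. With blocking assignments (=) and the shift written first, o_tx would read the already-shifted value and bit 0 would never reach the line.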
Reset, then assert i_valid=1 with i_data=8'b01010011 (0x53 = 'S'), using CLKS_PER_BIT = 4 (simplified for simulation). Sketch the first 50 cycles of o_tx, counting cycle 0 as the first clock of the start bit.
Cycles 0-3: o_tx = 0 (start bit)
Cycles 4-7: o_tx = 1 (D0 = 0b01010011[0] = 1)
Cycles 8-11: o_tx = 1 (D1 = bit 1 of data = 1)
Cycles 12-15: o_tx = 0 (D2 = bit 2 = 0)
Cycles 16-19: o_tx = 0 (D3 = 0)
Cycles 20-23: o_tx = 1 (D4 = 1)
Cycles 24-27: o_tx = 0 (D5 = 0)
Cycles 28-31: o_tx = 1 (D6 = 1)
Cycles 32-35: o_tx = 0 (D7 = 0)
Cycles 36-39: o_tx = 1 (stop bit)
Cycles 40+: o_tx = 1 (idle)
Total frame: 40 cycles = 10 bit-times × 4 cycles/bit. ✓
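The answer can be cross-checked with a small reference model, written here as a hypothetical frame_model module (not part of the lab). The function packs {stop, data, start} so that bit 0 of the result is the first bit on the wire; the loop prints one line per bit period with CLKS_PER_BIT = 4, in the same cycle ranges as the answer above.

```verilog
// Reference model for the exercise (hypothetical names): build the 10-bit frame
// {stop, data[7:0], start} so that frame[0] is the first bit on the wire.
module frame_model;
  localparam CLKS_PER_BIT = 4;

  function [9:0] uart_frame(input [7:0] data);
    uart_frame = {1'b1, data, 1'b0};   // stop=1 | D7..D0 | start=0
  endfunction

  reg [9:0] frame;
  integer   k;
  initial begin
    frame = uart_frame(8'h53);          // 'S'
    for (k = 0; k < 10; k = k + 1)
      $display("Cycles %0d-%0d: o_tx = %b",
               k*CLKS_PER_BIT, k*CLKS_PER_BIT + CLKS_PER_BIT - 1, frame[k]);
  end
endmodule
```

The same function applied to 8'h41 gives the 'A' frame that make sim checks: 0, then 1 0 0 0 0 0 1 0, then 1.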
~7 minutes — live coding
▸ COMMANDS
cd labs/week3_day11/ex3_impl/
# Start from empty uart_tx.v skeleton
# Add FSM → test → add datapath → test
make sim # self-checks byte = 'A'
make wave
make stat # ~30 cells
gtkwave tb.vcd &
▸ EXPECTED STDOUT
PASS: IDLE after reset
PASS: o_busy after valid
PASS: start bit = 0
PASS: data bits LSB first
PASS: stop bit = 1
PASS: IDLE after 10 ticks
=== 32 passed, 0 failed ===
SB_DFFE: 20
SB_LUT4: 12
▸ GTKWAVE
Signals: r_state · r_baud · r_bit · r_shift · o_tx · o_busy. Zoom in to see the byte 'A' (0x41 = 01000001) shifted out LSB-first: 1 0 0 0 0 0 1 0, preceded by the start bit (0) and followed by the stop bit (1).
$ yosys -p "read_verilog uart_tx.v; synth_ice40 -top uart_tx; stat"
=== uart_tx === (CLKS_PER_BIT=217 at 115200 baud / 25 MHz clk)
Number of wires: 49
Number of cells: 32
SB_CARRY 8 ← counter carry chain
SB_DFFE 20 ← state + baud + bit + shift + outputs
SB_LUT4 12 ← FSM + comparator + mux
Ask AI: “Write a complete UART TX in Verilog, parameterized for baud rate and clock frequency, with a valid/busy handshake. Include a self-checking testbench.”
TASK
AI writes a complete UART TX + TB.
BEFORE
Predict: 50-80 lines RTL, FSM-driven, LSB-first, parameterized.
AFTER
A strong model gets LSB-first and idle = 1. A weak model gets MSB-first or idle = 0 (both wrong).
TAKEAWAY
UART is common enough in AI training data that most models get it right; still, always verify protocol details such as bit order and idle level.
① Complete UART TX = ~60 lines, ~32 cells. Tiny.
② Iterative integration beats monolithic build every time.
③ LSB-first shift-out. Idle = 1. Start = 0. Stop = 1.
④ Parameterize CLKS_PER_BIT = round(f_clk / baud): the same RTL works for any clock/baud combination whose rounding error is small.
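One way to act on point ④ is to derive CLKS_PER_BIT instead of hard-coding it. The sketch below uses a hypothetical uart_baud_calc module with assumed parameter names CLK_HZ and BAUD; it rounds to the nearest integer rather than truncating and reports the resulting baud-rate error at elaboration time. For 25 MHz at 115200 baud it yields 217, the value used in uart_tx.

```verilog
// Sketch (hypothetical module and parameter names): derive CLKS_PER_BIT from
// the clock frequency and baud rate, and report the resulting baud error.
module uart_baud_calc #(
  parameter CLK_HZ = 25_000_000,
  parameter BAUD   = 115_200
) (
  output wire [31:0] o_clks_per_bit
);
  // Round to nearest instead of truncating: add BAUD/2 before dividing.
  localparam CLKS_PER_BIT = (CLK_HZ + BAUD/2) / BAUD;
  assign o_clks_per_bit = CLKS_PER_BIT;

  real actual_baud;
  initial begin
    actual_baud = 1.0 * CLK_HZ / CLKS_PER_BIT;   // baud rate the RTL really produces
    $display("CLK_HZ=%0d BAUD=%0d -> CLKS_PER_BIT=%0d, baud error %f %%",
             CLK_HZ, BAUD, CLKS_PER_BIT, 100.0 * (actual_baud - BAUD) / BAUD);
  end
endmodule
```

Rounding keeps the per-bit timing error to at most half a clock, whereas truncation can approach a full clock. The same constant expression can also be passed straight into the transmitter, e.g. uart_tx #(.CLKS_PER_BIT((25_000_000 + 115_200/2) / 115_200)) u_tx (...).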
🔗 Transfer
Video 4 of 4 · ~8 minutes
▸ WHY THIS MATTERS NEXT
Simulation is great, but chips that only talk to simulators don't do real work. Video 4 hooks your UART TX to a USB-serial adapter and your laptop. By the end of the video, your Go Board is transmitting “HELLO” that shows up in your terminal. First time your Verilog talks to the outside world.