Video 1 of 4 · ~10 minutes
Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
Every CPU has a boot ROM containing the first instructions executed after reset; Apple's M-series chips, for example, start from an on-chip boot ROM before handing off to later boot stages such as iBoot. Every video device has a character ROM. Every DSP has coefficient tables, sine tables, windowing functions in ROM. The GPU in your phone has texture ROMs. “Pre-compute and look up” is often faster and lower-power than compute-at-runtime, so ROMs are everywhere silicon meets math.
Today's ROM patterns appear in Day 9.4 (LED pattern sequencer), Day 10.2 (lookup-based multipliers), Day 11.3 (UART character ROM for HELLO demo), and every capstone design. ROM isn't a nice-to-have — it's the first memory you'll reach for.
“To use block RAM, I need a special SB_RAM40_4K primitive. I should instantiate it directly like any other module.”
You write a standard Verilog memory pattern (reg [7:0] mem [0:255]; ... data <= mem[addr];) and the synthesizer infers the resource. Small ROMs (16 entries) become LUTs. Medium ROMs (256 entries) become distributed LUT-RAM or block RAM. Large ROMs become block RAM. Idiomatic code → tool-chosen target.
Vendor primitives (SB_RAM40_4K for iCE40, RAMB36E2 for Xilinx) work but lock you to a chip. Idiomatic patterns are portable across vendors and are inferred correctly by mainstream tools.
case-Based ROM
module rom_case (
    input  wire [2:0] i_addr,
    output reg  [7:0] o_data
);
    always @(*) begin
        case (i_addr)
            3'd0:    o_data = 8'h48; // 'H'
            3'd1:    o_data = 8'h45; // 'E'
            3'd2:    o_data = 8'h4C; // 'L'
            3'd3:    o_data = 8'h4C; // 'L'
            3'd4:    o_data = 8'h4F; // 'O'
            default: o_data = 8'h00;
        endcase
    end
endmodule
Combinational read (always @(*)), no clock needed: the ROM's contents are truly fixed at synthesis time. For 5 entries this is perfectly readable. At 256 entries, it becomes a maintenance nightmare. The default case covers the unused addresses and prevents latch inference.
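To see the combinational behavior and the default branch in action, here is a minimal self-checking testbench sketch. It assumes the rom_case module above is compiled alongside it. The name tb_rom_case_sketch and the 1 ns settle delay are illustrative choices, not part of the lab files.

```verilog
`timescale 1ns/1ps
// Sketch: sweep all 8 addresses of rom_case and check each output.
// Assumes rom_case (defined above) is compiled in the same run.
module tb_rom_case_sketch;
    reg  [2:0]  addr;
    wire [7:0]  data;
    reg  [39:0] msg;     // "HELLO" packed with 'H' in the top byte
    reg  [7:0]  want;
    integer     i;
    integer     errors;

    rom_case dut (.i_addr(addr), .o_data(data));

    initial begin
        msg    = "HELLO";
        errors = 0;
        for (i = 0; i < 8; i = i + 1) begin
            addr = i[2:0];
            #1;                          // no clock: output settles combinationally
            if (i < 5)
                want = msg[8*(4-i) +: 8]; // i = 0 selects 'H', i = 4 selects 'O'
            else
                want = 8'h00;            // unused addresses hit the default branch
            if (data !== want) begin
                errors = errors + 1;
                $display("FAIL addr=%0d got=%h want=%h", i, data, want);
            end
        end
        if (errors == 0) $display("PASS: all 8 addresses match");
        $finish;
    end
endmodule
```

Note that the check uses a plain delay rather than a clock edge, which is exactly what distinguishes this ROM from the synchronous array version.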
$readmemh-Based ROM
module rom_array #(
    parameter ADDR_W    = 8,
    parameter DATA_W    = 8,
    parameter INIT_FILE = "rom_contents.hex"
) (
    input  wire              i_clk,
    input  wire [ADDR_W-1:0] i_addr,
    output reg  [DATA_W-1:0] o_data
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];
    initial $readmemh(INIT_FILE, mem);              // synth-time content load
    always @(posedge i_clk) o_data <= mem[i_addr];  // synchronous read
endmodule
This is the same $readmemh you saw in Day 6 testbenches, but here it is evaluated at synthesis time to set the memory's initial contents. The read is synchronous: o_data appears one cycle after i_addr. That synchronous read is the magic word: it's what triggers block RAM inference.
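The one-cycle latency is easy to confirm in simulation. The sketch below instantiates rom_array from above with ADDR_W = 3. It assumes a file named hello.hex in the working directory containing the five bytes 48 45 4C 4C 4F, one per line. The testbench name and clock period are illustrative. Because the file fills only 5 of the 8 locations, some simulators print a warning about the short file.

```verilog
`timescale 1ns/1ps
// Sketch: show that o_data lags i_addr by one clock edge.
// Assumes rom_array (defined above) and a hello.hex holding 48 45 4C 4C 4F.
module tb_rom_latency_sketch;
    reg        clk  = 1'b0;
    reg  [2:0] addr = 3'd0;
    wire [7:0] data;

    rom_array #(
        .ADDR_W(3),
        .DATA_W(8),
        .INIT_FILE("hello.hex")
    ) dut (
        .i_clk(clk),
        .i_addr(addr),
        .o_data(data)
    );

    always #5 clk = ~clk;   // 10 ns period

    initial begin
        @(negedge clk);
        addr = 3'd0;                    // set up address 0 away from the edge
        @(posedge clk); #1;             // ROM samples address 0 on this edge
        if (data !== 8'h48) $display("FAIL: expected 'H', got %h", data);

        addr = 3'd1;                    // change the address between edges
        #1;
        if (data !== 8'h48)             // output must NOT follow yet
            $display("FAIL: output changed without a clock edge");

        @(posedge clk); #1;             // the next edge delivers mem[1]
        if (data !== 8'h45) $display("FAIL: expected 'E', got %h", data);
        else                $display("PASS: data follows addr one edge later");
        $finish;
    end
endmodule
```

The middle check is the important one: after the address changes, the output holds its old value until the next rising edge, which is the behavior a block RAM read port provides.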
// Intended: a 1024-entry ROM that should infer block RAM
module rom_bad (input wire [9:0] addr, output wire [7:0] data);
    reg [7:0] mem [0:1023];
    initial $readmemh("rom.hex", mem);
    assign data = mem[addr];  // ← combinational read
endmodule
Why will the tool not infer block RAM here?
The read is combinational (assign, not a clocked always block). Block RAM reads require a clock edge, so the synthesizer is forced to map this 1024-entry array as LUT RAM or even scattered combinational logic, which can be orders of magnitude more expensive. Fix: add a clk input and change the read to always @(posedge clk) data <= mem[addr]; with data declared as reg.
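Applied to the module above, the fix might look like the following sketch. The module name rom_fixed and the clk port are additions for illustration. The rom.hex file name is carried over from rom_bad.

```verilog
// Sketch: rom_bad rewritten with a synchronous read so block RAM can be inferred.
// rom_fixed and the clk port are illustrative names, not from the lab files.
module rom_fixed (
    input  wire       clk,
    input  wire [9:0] addr,
    output reg  [7:0] data    // reg, because it is assigned in a clocked block
);
    reg [7:0] mem [0:1023];

    initial $readmemh("rom.hex", mem);

    // Registered read: data is valid one cycle after addr is presented.
    always @(posedge clk) data <= mem[addr];
endmodule
```

Any logic consuming data has to account for the added cycle of latency. Running a stat report on the synthesized design is the way to confirm that block RAM was actually inferred.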
~5 minutes
▸ COMMANDS
cd labs/week3_day09/ex1_rom/
cat hello.hex # 'H','E','L','L','O'
make stat_case # tiny case-ROM
make stat_array # synchronous array ROM
make sim
gtkwave tb_rom.vcd &
▸ EXPECTED STDOUT
=== rom_case ===
SB_LUT4: 5 SB_DFF: 0
=== rom_array ===
SB_LUT4: 0 SB_DFF: 8
SB_RAM40_4K: 1 ← block RAM!
5 × 8 = 40 bits stored in a 4Kbit EBR.
▸ KEY OBSERVATION
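The 5-entry case ROM costs 5 LUTs. The 5-entry array ROM costs 1 block RAM. For this size, case is cheaper. But the array code scales differently: one 4 Kbit block RAM already holds up to 512 × 8 bits, and larger tables consume additional block RAMs (4096 × 8 = 32 Kbit, or eight of them) while LUT usage stays near zero. The case version's logic keeps growing with every entry. The array pattern wins at scale.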
$ yosys -p "read_verilog rom_array.v; synth_ice40 -top rom_array; stat" -q
=== rom_array === # ADDR_W=10, 1024 × 8-bit ROM (8 Kbit)
Number of wires: 15
Number of cells: 3
SB_DFFE 8 ← output register
SB_RAM40_4K 2 ← 2 block RAMs @ 4Kbit each
= 8 Kbit total ✓
Ask AI: “Write a synchronous-read ROM in Verilog with 256 entries of 16 bits, loaded from a hex file. Include a testbench.”
TASK
Ask AI for a parameterized BRAM-inferred ROM.
BEFORE
Predict: array, $readmemh, clocked read with <=, parameters.
AFTER
Strong AI uses synchronous read. Weak AI uses assign — won't infer BRAM.
TAKEAWAY
Verify with make stat that SB_RAM40_4K appears.
① Write idiomatic RTL; let the tool pick the resource.
② Case ROM: readable for small tables, doesn't scale.
③ Array + $readmemh: scales to any size, external content.
④ Synchronous read is the trigger for block-RAM inference.
🔗 Transfer
Video 2 of 4 · ~10 minutes
▸ WHY THIS MATTERS NEXT
ROM is read-only — great for fixed content. Real designs also need read-write memory: FIFOs for buffering, register files for CPUs, frame buffers for video. Video 2 shows the RAM pattern that infers the same block-RAM silicon, plus the read-before-write vs. write-first choice that shapes your next-cycle behavior.