Day 9 · Memory Architecture

RAM in Verilog

Video 2 of 4 · ~10 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

Every CPU has an L1 cache (SRAM). Every network switch has packet buffer memory. Every GPU has a texture cache and a command FIFO. Every UART has a TX and RX FIFO. Every GoPro has a frame buffer. On-chip RAM is the fastest memory in the system, and block RAM is the FPGA's version of it. Designs live or die by how well they manage it.

In This Course

Your Day 8 FIFO stored data internally — you'll re-write it with proper block RAM inference. Your Day 11 UART TX FIFO uses this pattern. Your Day 12 SPI buffer does too. Your capstone frame buffer (if you do video) is a block RAM. This is the memory you'll actually use.

⚠️ RAM Inference Rules Are Strict

❌ Wrong Model

“Any reg [N-1:0] mem [0:M-1] becomes a block RAM. The tool figures it out.”

✓ Right Model

Block RAM inference has strict rules. The memory must have: (1) one clock per port (in this course, a single shared clock), (2) synchronous writes, (3) synchronous reads with a registered output, (4) no reset on the memory array itself, (5) no operations the primitive cannot express (for example, an unsupported partial-word write). Break any rule and the array falls back to flip-flops and LUT multiplexers; the iCE40 has no LUT RAM to catch it. Check the tool output every time.

The receipt: A 1024×8 RAM that infers correctly = 2 EBRs. The same 1024×8 RAM with a combinational read needs 8,192 flip-flops plus read multiplexers, several times the 1,280 logic cells in an iCE40 HX1K. Inference rules aren't suggestions.
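To make rule (3) concrete, here is a minimal sketch of the pattern that defeats inference. The module name ram_async_bad is hypothetical; the only difference from a well-formed RAM is that the read is a continuous assignment instead of a clocked one.

```verilog
// Counter-example (hypothetical module name): do NOT use this for large arrays.
module ram_async_bad #(
    parameter ADDR_W = 10,           // 1024 entries
    parameter DATA_W = 8
) (
    input  wire              i_clk,
    input  wire              i_we,
    input  wire [ADDR_W-1:0] i_addr,
    input  wire [DATA_W-1:0] i_din,
    output wire [DATA_W-1:0] o_dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge i_clk)
        if (i_we) mem[i_addr] <= i_din;   // the write is still synchronous

    // Combinational read: data appears in the same cycle as the address.
    // The EBR read port is clocked, so this array cannot map onto block RAM.
    assign o_dout = mem[i_addr];
endmodule
```

Synthesizing a module like this for iCE40 should report no SB_RAM40_4K cells, which is the symptom to look for when inference has failed.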

👁️ I Do — Single-Port Synchronous RAM

module ram_1p #(
    parameter ADDR_W = 10,           // 1024 entries
    parameter DATA_W = 8
) (
    input  wire              i_clk,
    input  wire              i_we,   // write enable
    input  wire [ADDR_W-1:0] i_addr,
    input  wire [DATA_W-1:0] i_din,
    output reg  [DATA_W-1:0] o_dout
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    always @(posedge i_clk) begin
        if (i_we)   mem[i_addr] <= i_din;   // synchronous write
        o_dout <= mem[i_addr];               // synchronous read: returns the OLD contents ("read-before-write")
    end
endmodule

My thinking: Single port means one shared address bus for read and write. Writes go in when i_we=1. The read happens every cycle, and the stored value appears on o_dout one cycle later; if you write and read the same address in one cycle, you get the old value. This is the textbook block-RAM inference pattern, and it's the pattern you'll reuse for every RAM in every design.
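One consequence of the one-cycle latency: whatever consumes o_dout needs to know when the data answers its request. A minimal sketch of one way to handle that, assuming a read-request input i_re and a valid output o_valid (both hypothetical, not part of ram_1p):

```verilog
// Sketch: the ram_1p pattern plus a valid flag that travels with the data.
module ram_1p_valid #(
    parameter ADDR_W = 10,
    parameter DATA_W = 8
) (
    input  wire              i_clk,
    input  wire              i_we,
    input  wire              i_re,     // read request (assumed port)
    input  wire [ADDR_W-1:0] i_addr,
    input  wire [DATA_W-1:0] i_din,
    output reg  [DATA_W-1:0] o_dout,
    output reg               o_valid   // high when o_dout answers an i_re
);
    reg [DATA_W-1:0] mem [0:(2**ADDR_W)-1];

    initial o_valid = 1'b0;

    always @(posedge i_clk) begin
        if (i_we) mem[i_addr] <= i_din;
        o_dout  <= mem[i_addr];   // same read-before-write pattern as ram_1p
        o_valid <= i_re;          // delayed one cycle, exactly like the data
    end
endmodule
```

The memory lines are unchanged, so inference should be unaffected; o_valid is an ordinary flip-flop that sits beside the RAM.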

🤝 We Do — Read-Before-Write vs. Write-First

When i_we=1 and you're reading the same address you're writing, what comes out of o_dout?

Read-Before-Write

always @(posedge clk) begin
    if (we) mem[addr] <= din;
    dout <= mem[addr];    // old value
end

Reads the old value while writing the new. This is the natural EBR behavior on iCE40. Cheapest.

Write-First (Bypass)

always @(posedge clk) begin
    if (we) begin
        mem[addr] <= din;
        dout <= din;       // new value
    end else dout <= mem[addr];
end

Reads the new value just written. On iCE40 this costs extra bypass logic (registers and a mux) around the EBR.

Rule of thumb: Use read-before-write unless you have a specific reason. It's cheaper, maps cleanly to the hardware, and doesn't create subtle bypass bugs.
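The difference is easiest to see in simulation. The following self-contained testbench is a sketch (the name tb_collision and the 16-entry depth are arbitrary choices): it copies the two always blocks above, writes the same address twice in a row, and prints what each style returns on the collision.

```verilog
`timescale 1ns/1ps
module tb_collision;
    reg       clk  = 0;
    reg       we   = 0;
    reg [3:0] addr = 0;
    reg [7:0] din  = 0;

    reg [7:0] mem_rbw [0:15];  reg [7:0] dout_rbw;
    reg [7:0] mem_wf  [0:15];  reg [7:0] dout_wf;

    always @(posedge clk) begin            // read-before-write
        if (we) mem_rbw[addr] <= din;
        dout_rbw <= mem_rbw[addr];
    end

    always @(posedge clk) begin            // write-first (bypass)
        if (we) begin
            mem_wf[addr] <= din;
            dout_wf      <= din;
        end else dout_wf <= mem_wf[addr];
    end

    always #5 clk = ~clk;

    initial begin
        // First write: store 8'h11 at address 3 in both memories.
        @(negedge clk); we = 1; addr = 4'd3; din = 8'h11;
        // Second write, same address: this is the collision cycle.
        @(negedge clk); din = 8'h22;
        @(negedge clk); we = 0;
        $display("read-before-write dout = %h", dout_rbw);  // prints 11 (old value)
        $display("write-first       dout = %h", dout_wf);   // prints 22 (new value)
        $finish;
    end
endmodule
```

Both memories end up holding 8'h22 at address 3; only the value seen on the output during the collision differs.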

🧪 You Do — Dual-Port Sketch

What does a dual-port (one read, one write, independent addresses) RAM look like? Sketch the ports and the always block.

Sketch:

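module ram_dp #(
    parameter A = 9,            // address width (example default: 512 entries)
    parameter D = 16            // data width (example default)
) (
    input  wire         clk,
    input  wire         we,
    input  wire [A-1:0] waddr,
    input  wire [D-1:0] din,
    input  wire [A-1:0] raddr,
    output reg  [D-1:0] dout
);
    reg [D-1:0] mem [0:(2**A)-1];
    always @(posedge clk) begin
        if (we) mem[waddr] <= din;   // write port
        dout <= mem[raddr];          // read port: independent address
    end
endmodule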

Two address ports (waddr, raddr), one write data input, and one read data output. Infers a block RAM with separate read and write ports, which the iCE40 EBR supports natively. Perfect for FIFOs (write at tail, read at head).
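To show how the FIFO use case sits on top of this pattern, here is a minimal sketch of a synchronous FIFO. It is not the course's Day 8 FIFO; the module name, the synchronous rst input, and the extra-pointer-bit full/empty scheme are all assumptions made for illustration.

```verilog
// Sketch: synchronous FIFO built around the ram_dp memory pattern.
module fifo_bram_sketch #(
    parameter A = 9,               // 512 entries
    parameter D = 8
) (
    input  wire         clk,
    input  wire         rst,       // resets the pointers only, never the array
    input  wire         wr_en,
    input  wire [D-1:0] din,
    input  wire         rd_en,
    output reg  [D-1:0] dout,      // valid one cycle after an accepted rd_en
    output wire         full,
    output wire         empty
);
    reg [D-1:0] mem [0:(2**A)-1];
    reg [A:0]   wptr, rptr;        // one extra bit tells full from empty

    assign empty = (wptr == rptr);
    assign full  = (wptr[A-1:0] == rptr[A-1:0]) && (wptr[A] != rptr[A]);

    wire do_wr = wr_en && !full;
    wire do_rd = rd_en && !empty;

    // Memory: write at the tail, read at the head, no reset on the array.
    always @(posedge clk) begin
        if (do_wr) mem[wptr[A-1:0]] <= din;
        if (do_rd) dout <= mem[rptr[A-1:0]];
    end

    // Pointers are ordinary flip-flops, so resetting them is fine.
    always @(posedge clk) begin
        if (rst) begin
            wptr <= 0;
            rptr <= 0;
        end else begin
            if (do_wr) wptr <= wptr + 1'b1;
            if (do_rd) rptr <= rptr + 1'b1;
        end
    end
endmodule
```

A side effect of the full and empty guards: the read and write addresses are equal only when the FIFO is empty (no read accepted) or full (no write accepted), so this sketch never reads and writes the same location in one cycle.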

▶ LIVE DEMO

Write, Read, Confirm Block RAM Inference

~5 minutes

▸ COMMANDS

cd labs/week3_day09/ex2_ram/
cat ram_1p.v
make sim        # self-check write/read
make wave
make stat       # look for SB_RAM40_4K

▸ EXPECTED STDOUT

PASS: write 42 @ addr 0
PASS: read 42 @ addr 0 (1 cycle later)
PASS: write cycles don't affect other addresses
=== 64 passed, 0 failed ===

  SB_RAM40_4K: 2  ← 2 EBRs used

▸ GTKWAVE

Signals: i_addr · i_din · i_we · o_dout. Note the 1-cycle read latency — address in at cycle N, data out at cycle N+1. That delay is the price of block-RAM-grade density.

🔧 What Did the Tool Build?

$ yosys -p "read_verilog ram_1p.v; synth_ice40 -top ram_1p; stat" -q

=== ram_1p ===    # 1024 × 8 = 8 Kbit
   Number of wires:                 21
   Number of cells:                  2
     SB_RAM40_4K                     2    ← 2 EBRs; o_dout uses the EBR's built-in read register

What to notice: The whole design collapses into two SB_RAM40_4K cells. The EBR read port is registered inside the primitive, so your o_dout register is absorbed into it instead of costing eight fabric flip-flops. (The SB_RAM40_4KNR and SB_RAM40_4KNW variants are the same block with a negative-edge read or write clock.) Tools are smart; idiomatic code lets them be smart.

Checkpoint: 1024 × 8 RAM = 2 EBRs (12.5% of the iCE40 HX1K's block RAM budget). The same array built from flip-flops would need 8,192 of them plus multiplexers, far more than the chip's 1,280 logic cells. Block RAM inference is the difference between “fits” and “doesn't fit.”
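The EBR count in the checkpoint can be reproduced with a little arithmetic. The sketch below is a rough estimator, not what the tool actually runs: it assumes the four EBR aspect ratios 256×16, 512×8, 1024×4, and 2048×2, tiles the requested array with each one, and keeps the cheapest result. The module and function names are hypothetical.

```verilog
// Simulation-only sketch: estimate how many 4 Kbit EBRs an array needs.
module ebr_estimate;
    // EBRs needed when tiling depth x width with blocks of shape d x w.
    function integer tiles;
        input integer depth, width, d, w;
        begin
            tiles = ((depth + d - 1) / d) * ((width + w - 1) / w);
        end
    endfunction

    // Try every assumed aspect ratio and keep the smallest count.
    function integer ebr_count;
        input integer depth, width;
        integer best, t;
        begin
            best = tiles(depth, width, 256, 16);
            t = tiles(depth, width, 512, 8);   if (t < best) best = t;
            t = tiles(depth, width, 1024, 4);  if (t < best) best = t;
            t = tiles(depth, width, 2048, 2);  if (t < best) best = t;
            ebr_count = best;
        end
    endfunction

    initial begin
        $display("1024 x 8  -> %0d EBRs", ebr_count(1024, 8));   // prints 2
        $display("512  x 16 -> %0d EBRs", ebr_count(512, 16));   // prints 2
        $finish;
    end
endmodule
```

With 16 EBRs on the HX1K, a result of 2 corresponds to the 12.5% figure above. Treat the estimate as a planning aid and confirm the real number with make stat.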

🤖 Check the Machine

Ask AI: “Write a dual-port 512×16 block RAM in Verilog. It should infer block RAM on iCE40 (check by running yosys synth_ice40).”

TASK

AI writes dual-port RAM with BRAM inference.

BEFORE

Predict: two address ports, single clock, sync read+write.

AFTER

Strong AI writes one always block, single clock. Weak AI may try two clocks or a combinational read. Don't trust either without checking the synthesis report.

TAKEAWAY

Verify with make stat. SB_RAM40_4K must appear.

Key Takeaways

① Block RAM inference = one clock + sync write + sync read + registered output.

② Read has 1-cycle latency. Plan pipelines around it.

③ Read-before-write is cheap. Write-first costs extra muxing.

④ Dual-port RAMs are the FIFO primitive. iCE40 EBRs natively provide one write port and one read port.

Always check make stat for SB_RAM40_4K. If it's missing, inference failed.

🔗 Transfer

iCE40 Memory Resources

Video 3 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You now know the patterns. Video 3 shows the resource budget for your actual chip: 16 Embedded Block RAMs (EBRs) at 4 Kbit each, configurable in 4 aspect ratios (256×16, 512×8, 1024×4, 2048×2). You'll learn to plan memory usage for real designs (UART FIFO, character ROM, frame buffer, sine table) and see what fits.