Day 2 · Combinational Building Blocks

Operators

Video 2 of 4 · ~14 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

Data Types · Operators · Sized Literals · 7-Seg Display

🌍 Where This Lives

In Industry

The first performance review on any RTL codebase is an operator audit: where are the multiplies? The dividers? The wide comparators? These drive area, timing, and power. Senior designers read code and see gates.

In This Course

Your Day 3 ALU uses +, -, &, |. Day 9 memory addressing uses comparators. Day 11 UART uses reduction operators for parity. Every lab after today uses this vocabulary.

Industry alignment: “Write area-efficient RTL” appears on almost every FPGA/ASIC job posting. That literally means: know which operators are cheap and which are expensive. This video is step 1.

⚠️ Syntax Does Not Equal Cost

❌ Wrong Model

“One operator = one operation. a + b and a & b look the same, so they cost the same.”

✓ Right Model

Each operator has a hardware footprint. & on 32 bits = 32 LUTs. + on 32 bits = 32 LUTs + 32 carry cells (a ripple chain with real delay). * on 32 bits = roughly 1,000 LUTs by the ~N² estimate, or a dedicated DSP block on devices that have one (the HX1K does not).

The receipt: An iCE40 HX1K has 1280 LUTs total. One 8-bit multiply consumes about 6% of the chip. By the ~N² estimate, one 32-bit multiply would need roughly 1,000 LUTs, leaving almost nothing for your actual design.

The Operator Cost Table

Category             Operators    Hardware Cost         On iCE40 HX1K
Bitwise              & | ^ ~      1 LUT per bit         Cheap
Logical              && || !      Reduction + 1 LUT     Cheap
Reduction (unary)    & | ^        Tree of LUTs          Cheap (log N)
Arithmetic +/-       + -          Adder chain (carry)   Moderate (N LUTs + N SB_CARRY)
Arithmetic *         *            Multiplier tree       Expensive (~N² LUTs)
Relational           == < >       Comparator            Moderate
Shift (constant)     << 3         Rewiring only         Free
Shift (variable)     << n         Barrel shifter        Expensive
Conditional          ? :          2:1 mux               Cheap (1 LUT)
Mental shortcut: Bitwise/logical/reduction/conditional = cheap. Arithmetic +/- = moderate. Multiply / variable-shift / wide comparators = expensive. Constant shifts = free.
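
The table lists the unary reduction operators without showing one in use. The sketch below is a minimal illustration, assuming an 8-bit input; the module and signal names (reduce_demo, all_ones, any_one, parity) are made up for this example and are not from the course labs.

```verilog
// Unary reduction operators collapse a whole vector down to a single bit.
module reduce_demo (
    input  wire [7:0] data,
    output wire       all_ones,  // &data : 1 only if every bit is 1
    output wire       any_one,   // |data : 1 if at least one bit is 1
    output wire       parity     // ^data : 1 if an odd number of bits are 1
);
    assign all_ones = &data;
    assign any_one  = |data;
    assign parity   = ^data;
endmodule

// Small testbench (hypothetical) that prints the three results.
module reduce_demo_tb;
    reg  [7:0] data;
    wire       all_ones, any_one, parity;

    reduce_demo dut (
        .data(data), .all_ones(all_ones), .any_one(any_one), .parity(parity)
    );

    initial begin
        data = 8'b1011_0000; #1;  // three 1s: all=0 any=1 parity=1
        $display("data=%b all=%b any=%b parity=%b", data, all_ones, any_one, parity);
        data = 8'hFF; #1;         // eight 1s: all=1 any=1 parity=0
        $display("data=%b all=%b any=%b parity=%b", data, all_ones, any_one, parity);
        $finish;
    end
endmodule
```

The same symbol (&, |, ^) is bitwise when it sits between two operands and a reduction when it prefixes a single operand.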

👁️ I Do — Bitwise vs Logical

wire [3:0] a = 4'b1010;
wire [3:0] b = 4'b0101;

wire [3:0] w_bitwise = a & b;    // = 4'b0000 (per-bit AND)
wire       w_logical = a && b;   // = 1'b1 (both nonzero → true)
My thinking: Single & operates on every bit independently, result same width as operands. Double && treats each operand as a boolean (any bit set = true), result always 1 bit. Mixing them up is a classic bug — especially with C/Java muscle memory.
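
To see why the mix-up matters, here is a small simulation-only sketch that uses the same a and b values inside an if condition. The testbench name is an assumption for illustration.

```verilog
// Same operands, different operators, opposite branch taken.
module and_vs_andand_tb;
    reg [3:0] a, b;

    initial begin
        a = 4'b1010;
        b = 4'b0101;

        // Bitwise AND: no bit position is set in both operands,
        // so the 4-bit result is 0000 and the condition is false.
        if (a & b)  $display("a &  b is true");
        else        $display("a &  b is false (4'b%b)", a & b);

        // Logical AND: each operand is nonzero, so each counts as true.
        if (a && b) $display("a && b is true");
        else        $display("a && b is false");

        $finish;
    end
endmodule
```

With these values the bitwise condition takes the else branch and the logical condition takes the if branch, even though the two lines differ by a single character.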

🤝 We Do — The Conditional Mux

// 2:1 mux
assign y = sel ? a : b;

// 4:1 mux — fill in:
assign y = sel[1] ? ( sel[0] ? /* ? */ : /* ? */ )
                  : ( sel[0] ? /* ? */ : /* ? */ );
Answer: sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a). Selects: 00→a, 01→b, 10→c, 11→d. Structurally this is three 2:1 muxes in a tree: 2 at the first level, 1 at the root. Yosys may pack them into fewer LUT4s per bit.
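
The completed expression can be wrapped in a module as sketched below. The module names (mux4, mux4_case) and the WIDTH parameter are assumptions for illustration, not the lab's actual files. The second module shows the same selection written as a case statement, which should synthesize to equivalent logic.

```verilog
// 4:1 mux built from nested conditional operators.
module mux4 #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a, b, c, d,
    input  wire [1:0]       sel,
    output wire [WIDTH-1:0] y
);
    // sel: 00 -> a, 01 -> b, 10 -> c, 11 -> d
    assign y = sel[1] ? (sel[0] ? d : c)
                      : (sel[0] ? b : a);
endmodule

// Same behavior expressed with a case statement.
module mux4_case #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] a, b, c, d,
    input  wire [1:0]       sel,
    output reg  [WIDTH-1:0] y
);
    always @* begin
        case (sel)
            2'b00:   y = a;
            2'b01:   y = b;
            2'b10:   y = c;
            default: y = d;  // 2'b11
        endcase
    end
endmodule
```

The nested form keeps everything in one continuous assignment; the case form tends to read better once the mux grows past four inputs.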

🧪 You Do — Predict Operator Costs

For each expression on 8-bit buses, rank by iCE40 LUT cost (low/med/high):

  1. assign x = a & b;
  2. assign x = a + b;
  3. assign x = a * b;
  4. assign x = a << 3;
  5. assign x = a << n; (where n is a 3-bit wire)
  6. assign x = (a > 8'd100);
Ranking (cheap→expensive): (4) FREE · (6) ~4 LUTs · (1) ~8 LUTs · (2) ~8 LUTs + 8 carry · (5) ~24 LUTs barrel · (3) ~50-80 LUTs multiply.
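
The gap between (4) and (5) is easier to see when both shifts are written out by hand. The sketch below is one possible structure, not necessarily what Yosys emits: the constant shift is only a concatenation, while the variable shift is three stages of 8-bit 2:1 muxes (3 × 8 = 24 mux bits, which is where an estimate near 24 LUTs comes from). All module names are hypothetical.

```verilog
// Constant shift: a << 3 on 8 bits is just rewiring.
module shift_const (input wire [7:0] a, output wire [7:0] x);
    assign x = {a[4:0], 3'b000};  // drop the top 3 bits, append 3 zeros
endmodule

// Variable shift: a barrel shifter made of three mux stages (shift by 1, 2, 4).
module shift_var (input wire [7:0] a, input wire [2:0] n, output wire [7:0] x);
    wire [7:0] s0 = n[0] ? {a[6:0],  1'b0}    : a;
    wire [7:0] s1 = n[1] ? {s0[5:0], 2'b00}   : s0;
    assign     x  = n[2] ? {s1[3:0], 4'b0000} : s1;
endmodule

// Exhaustive check against the built-in shift operator.
module shift_tb;
    reg  [7:0] a;
    reg  [2:0] n;
    wire [7:0] xc, xv;
    integer    i, errors;

    shift_const u_c (.a(a), .x(xc));
    shift_var   u_v (.a(a), .n(n), .x(xv));

    initial begin
        errors = 0;
        for (i = 0; i < 2048; i = i + 1) begin
            a = i[7:0];
            n = i[10:8];
            #1;
            if (xc !== (a << 3)) errors = errors + 1;
            if (xv !== (a << n)) errors = errors + 1;
        end
        $display("mismatches: %0d", errors);
        $finish;
    end
endmodule
```

Running each module through synthesis separately is a quick way to compare the prediction with what the tool reports.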
▶ LIVE DEMO

Building a 4:1 Mux + Cost Comparison

~5 minutes

▸ COMMANDS

cd labs/week1_day02/ex2_mux_hierarchy/
make sim                    # iverilog + vvp
make wave                   # GTKWave
make stat                   # yosys synth_ice40

▸ EXPECTED STDOUT

PASS: 2:1 mux sel=0 → b
PASS: 2:1 mux sel=1 → a
PASS: 4:1 mux sel=00 → a
PASS: 4:1 mux sel=11 → d
=== 16 passed, 0 failed ===
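
The lab's testbench is not reproduced here. As a rough idea of how PASS lines and a summary like the ones above can be produced, here is a minimal self-checking sketch. The testbench name, the check task, the test values, and the inline stand-in for the mux are all assumptions, and its message format and test count differ from the lab output.

```verilog
`timescale 1ns/1ps
module mux4_check_tb;
    reg  [7:0] a, b, c, d;
    reg  [1:0] sel;
    wire [7:0] y;
    integer    passed, failed;

    // Stand-in for the design under test (assumed; the lab module is not shown here).
    assign y = sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a);

    // Compare y against the expected value and keep a running tally.
    task check(input [7:0] expected);
        begin
            #1;  // let the combinational logic settle
            if (y === expected) begin
                passed = passed + 1;
                $display("PASS: sel=%b y=%h", sel, y);
            end else begin
                failed = failed + 1;
                $display("FAIL: sel=%b y=%h expected=%h", sel, y, expected);
            end
        end
    endtask

    initial begin
        passed = 0;
        failed = 0;
        a = 8'hA0; b = 8'hB1; c = 8'hC2; d = 8'hD3;
        sel = 2'b00; check(a);
        sel = 2'b01; check(b);
        sel = 2'b10; check(c);
        sel = 2'b11; check(d);
        $display("=== %0d passed, %0d failed ===", passed, failed);
        $finish;
    end
endmodule
```

Using === in the comparison means an x or z on y is reported as a failure instead of silently passing.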

▸ GTKWAVE — WHAT TO LOOK FOR

Signals: sel · a · b · c · d · y. Set all to hex. Watch y react instantly (delta cycle) to any change on sel or the data inputs — that's the signature of pure combinational logic.

🔧 What Did the Tool Build?

Three 8-bit modules, side by side:

Module        Body          SB_LUT4   SB_CARRY   Verdict
bitwise_and   y = a & b;    8         0          Cheap
adder         y = a + b;    8         8          Moderate
multiplier    y = a * b;    ~80       ~24        Expensive
Generate yourself: yosys -p "read_verilog op_compare.v; synth_ice40; stat". The stat report prints at the end of the log. Edit the module, rerun, and watch the numbers change with the code.
Mental math: The iCE40 HX1K has 1280 LUTs. A single 8×8 multiply eats ~6% of the chip. A 32×32 multiply eats... not possible on an HX1K without tricks.
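
The contents of op_compare.v are not shown in the slides. A plausible sketch, assuming 8-bit inputs and using the three module names from the table, might look like the following. The port names and the 16-bit product width are assumptions, so your cell counts may differ from the table.

```verilog
// op_compare.v (assumed contents): three 8-bit modules to compare with stat.

module bitwise_and (input wire [7:0] a, b, output wire [7:0] y);
    assign y = a & b;   // one independent AND per bit
endmodule

module adder (input wire [7:0] a, b, output wire [7:0] y);
    assign y = a + b;   // 8-bit sum; the carry out is discarded
endmodule

module multiplier (input wire [7:0] a, b, output wire [15:0] y);
    assign y = a * b;   // full 16-bit product (assumed output width)
endmodule
```

With several modules in one file, passing -top to synth_ice40 (for example synth_ice40 -top adder) selects which module the report covers.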

🤖 Check the Machine

Ask AI: “Rank these on iCE40 LUT cost: a+b, a*b, a<<3, a<<n, a==b, all 16-bit.”

TASK

Ask for LUT cost ranking on 16-bit operands.

BEFORE

Predict: constant shift (free) → == (cheap) → + (moderate) → barrel shift (expensive) → * (most expensive).

AFTER

AI ordering usually correct. Absolute counts often 2× off — verify with Yosys.

TAKEAWAY

AI gives good ordinal rankings. Trust the ranking. Verify absolute numbers with stat.

Rule: For area-sensitive designs, Yosys stat is ground truth. AI estimates are useful for early rough sizing.

Key Takeaways

 Bitwise (&) = per-bit. Logical (&&) = 1-bit true/false.

? : is the mux. Nest for wider muxes.

 Constant shifts are free. Variable shifts are expensive.

 Multiply costs ~N² LUTs. Always check utilization.

Every operator has a hardware price tag. Read the receipts.

🔗 Transfer

Sized Literals & Width Matching

Video 3 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You just saw that 8+8 can cost 8 LUTs + 8 carry cells. But what's the bit-width of the result? 8? 9? Here's a puzzle: 4'd15 + 4'd1 gives 0, not 16 — unless you size the result correctly. Video 3 shows you why, and how to stop silent overflow bugs.
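
The puzzle can be reproduced in simulation with a few lines. This is a minimal sketch; the testbench and signal names are made up for illustration.

```verilog
// The same addition stored in a 4-bit and a 5-bit destination.
module width_demo_tb;
    reg [3:0] a, b;
    reg [3:0] sum4;  // same width as the operands: the carry is lost
    reg [4:0] sum5;  // one extra bit: the carry is kept

    initial begin
        a = 4'd15;
        b = 4'd1;
        sum4 = a + b;  // 16 does not fit in 4 bits, wraps to 0
        sum5 = a + b;  // operands are extended to 5 bits, result is 16
        $display("sum4=%0d sum5=%0d", sum4, sum5);  // prints sum4=0 sum5=16
        $finish;
    end
endmodule
```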