Day 2 · Combinational Building Blocks

Operators

Video 2 of 4 · ~14 minutes

Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF

🌍 Where This Lives

In Industry

The first performance review on any RTL codebase is an operator audit: where are the multiplies? The dividers? The wide comparators? These drive area, timing, and power. Senior designers read code and see gates.

In This Course

Your Day 3 ALU uses +, -, &, |. Day 9 memory addressing uses comparators. Day 11 UART uses reduction operators for parity. Every lab after today uses this vocabulary.

Industry alignment: “Write area-efficient RTL” appears on almost every FPGA/ASIC job posting. That literally means: know which operators are cheap and which are expensive. This video is step 1.

⚠️ Syntax Does Not Equal Cost

❌ Wrong Model

“One operator = one operation. a + b and a & b look the same, so they cost the same.”

✓ Right Model

Each operator has a hardware footprint. & on 32 bits = 32 LUTs. + on 32 bits = 32 LUTs + 32 carry cells (a ripple chain with real delay). * on 32 bits = hundreds of LUTs or a dedicated DSP block.

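The receipt: An iCE40 HX1K has 1280 LUTs total. At roughly N² LUTs, one 32-bit multiply needs on the order of 1,000 LUTs, which is most of the chip. Even a single 8-bit multiply (~80 LUTs) takes about 6%. A few of those and you've spent more area on arithmetic than on your actual design.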

The Operator Cost Table

Category	Operators	Hardware Cost	On iCE40 HX1K
Bitwise	`&` `|` `^` `~`	1 LUT per bit	Cheap
Logical	`&&` `||` `!`	Reduction + 1 LUT	Cheap
Reduction (unary)	`&` `|` `^`	Tree of LUTs	Cheap (~N/3 LUTs, log N depth)
Arithmetic +/-	`+` `-`	Adder chain (carry)	Moderate (N LUTs + N SB_CARRY)
Arithmetic *	`*`	Multiplier tree	Expensive (~N² LUTs)
Relational	`==` `<` `>`	Comparator	Moderate
Shift (constant)	`<< 3`	Rewiring only	Free
Shift (variable)	`<< n`	Barrel shifter	Expensive
Conditional	`? :`	2:1 mux	Cheap (1 LUT per bit)

Mental shortcut: Bitwise/logical/reduction/conditional = cheap. Arithmetic +/- = moderate. Multiply / variable-shift / wide comparators = expensive. Constant shifts = free.

Print this table. Put it on your wall. “Cheap” means it fits easily. “Expensive” means you should check your utilization. Multiply is particularly nasty without DSP blocks — iCE40 HX1K has no hard multipliers, so * gets built out of LUTs.
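Two rows of the table are easy to confirm in simulation: a constant shift is only rewiring, and a reduction operator is a tree over the bits of one operand. The testbench below is a minimal sketch; the module name `tb_shift_reduce` and its signal names are illustrative and not taken from the course labs.

```verilog
// Sketch: constant shift == rewiring, reduction XOR == parity of all bits.
// Names here are hypothetical, chosen for illustration only.
module tb_shift_reduce;
    reg  [7:0] a;

    wire [7:0] shifted = a << 3;               // constant shift
    wire [7:0] rewired = {a[4:0], 3'b000};     // the same result written as pure wiring
    wire       parity  = ^a;                   // unary reduction XOR
    wire       xor_all = a[7] ^ a[6] ^ a[5] ^ a[4] ^
                         a[3] ^ a[2] ^ a[1] ^ a[0];

    integer i;
    integer errors;

    initial begin
        errors = 0;
        // Exhaustively check all 256 input values.
        for (i = 0; i < 256; i = i + 1) begin
            a = i;
            #1;
            if (shifted !== rewired) errors = errors + 1;
            if (parity  !== xor_all) errors = errors + 1;
        end
        if (errors == 0) $display("PASS: shift and reduction checks agree");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Because `a << 3` and the concatenation describe the same wires, the synthesizer needs no logic for it, which is why the table lists constant shifts as free.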

👁️ I Do — Bitwise vs Logical

wire [3:0] a = 4'b1010;
wire [3:0] b = 4'b0101;

wire [3:0] w_bitwise = a & b;    // = 4'b0000 (per-bit AND)
wire       w_logical = a && b;   // = 1'b1 (both nonzero → true)

My thinking: Single & operates on every bit independently, result same width as operands. Double && treats each operand as a boolean (any bit set = true), result always 1 bit. Mixing them up is a classic bug — especially with C/Java muscle memory.
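To see the difference in a simulator, the same four wires can be dropped into a small testbench. This is a sketch; the module name `tb_bitwise_vs_logical` is made up for illustration. The two `if` statements show how the mix-up turns into a bug: with these values the bitwise form is false while the logical form is true.

```verilog
// Sketch: bitwise & versus logical && on the values from the slide.
// Module name is hypothetical.
module tb_bitwise_vs_logical;
    wire [3:0] a = 4'b1010;
    wire [3:0] b = 4'b0101;

    wire [3:0] w_bitwise = a & b;    // per-bit AND, 4 bits wide
    wire       w_logical = a && b;   // boolean AND, always 1 bit

    initial begin
        #1;  // let the continuous assignments settle
        $display("a &  b = %b", w_bitwise);   // 0000
        $display("a && b = %b", w_logical);   // 1

        // Same operands, different branch taken:
        if (a & b)  $display("if (a & b)  : taken");
        else        $display("if (a & b)  : not taken");   // this one prints
        if (a && b) $display("if (a && b) : taken");       // this one prints
        else        $display("if (a && b) : not taken");
        $finish;
    end
endmodule
```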

🤝 We Do — The Conditional Mux

// 2:1 mux
assign y = sel ? a : b;

// 4:1 mux — fill in:
assign y = sel[1] ? ( sel[0] ? /* ? */ : /* ? */ )
                  : ( sel[0] ? /* ? */ : /* ? */ );

Answer: sel[1] ? (sel[0] ? d : c) : (sel[0] ? b : a). Selects: 00→a, 01→b, 10→c, 11→d. Yosys will build 3 muxes in a tree — 2 at the first level, 1 at the root.

[Leave blanks for 30 seconds] The 4:1 nested ternary is a pattern you'll use constantly. Each nested ?: is one 2:1 mux. Yosys sees the tree and maps it onto iCE40 LUTs (SB_LUT4); the iCE40 logic cell has no dedicated mux primitive.
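Written out as a complete module, the nested ternary might look like the sketch below. The 8-bit data width, the module name `mux4`, and the testbench are assumptions made for illustration; the lab's own files may differ.

```verilog
// Sketch: 4:1 mux from nested ?: operators, plus a tiny self-checking testbench.
// Width (8 bits) and all names are assumptions, not the lab's actual code.
module mux4 (
    input  wire [7:0] a, b, c, d,
    input  wire [1:0] sel,
    output wire [7:0] y
);
    // sel: 00 -> a, 01 -> b, 10 -> c, 11 -> d
    assign y = sel[1] ? (sel[0] ? d : c)
                      : (sel[0] ? b : a);
endmodule

module tb_mux4;
    reg  [7:0] a, b, c, d;
    reg  [1:0] sel;
    wire [7:0] y;
    integer    errors;

    mux4 dut (.a(a), .b(b), .c(c), .d(d), .sel(sel), .y(y));

    initial begin
        errors = 0;
        a = 8'hA0; b = 8'hB1; c = 8'hC2; d = 8'hD3;   // distinct values per input
        sel = 2'b00; #1; if (y !== a) errors = errors + 1;
        sel = 2'b01; #1; if (y !== b) errors = errors + 1;
        sel = 2'b10; #1; if (y !== c) errors = errors + 1;
        sel = 2'b11; #1; if (y !== d) errors = errors + 1;
        if (errors == 0) $display("PASS: all four select values route correctly");
        else             $display("FAIL: %0d mismatches", errors);
        $finish;
    end
endmodule
```

Saved as, say, `mux4.v` (a hypothetical file name), this can be run with `iverilog -o mux4_tb mux4.v` followed by `vvp mux4_tb`.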

🧪 You Do — Predict Operator Costs

For each expression on 8-bit buses, rank by iCE40 LUT cost (low/med/high):

(1) assign x = a & b;
(2) assign x = a + b;
(3) assign x = a * b;
(4) assign x = a << 3;
(5) assign x = a << n;   // n is a 3-bit wire
(6) assign x = (a > 8'd100);

Ranking (cheap→expensive): (4) free · (6) ~4 LUTs · (1) ~8 LUTs · (2) ~8 LUTs + 8 carry · (5) ~24 LUTs barrel · (3) ~50-80 LUTs multiply.
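To check the predictions for the constant shift, the variable shift, and the comparison, each expression can be wrapped in its own small module and synthesized. The sketch below assumes a file named `predict_costs.v`; the file name and the module names are made up for this example.

```verilog
// Sketch: one module per expression so that stat can report each cost separately.
// File and module names are hypothetical.

// Constant shift: expected to need no logic, only wiring.
module shift_const (
    input  wire [7:0] a,
    output wire [7:0] x
);
    assign x = a << 3;
endmodule

// Variable shift: a barrel shifter, one mux layer per bit of n.
module shift_var (
    input  wire [7:0] a,
    input  wire [2:0] n,
    output wire [7:0] x
);
    assign x = a << n;
endmodule

// Comparison against a constant: 8 inputs reduced to a 1-bit result.
module cmp_const (
    input  wire [7:0] a,
    output wire       x
);
    assign x = (a > 8'd100);
endmodule
```

Running `yosys -p "read_verilog predict_costs.v; synth_ice40; stat"` should report cell counts per module. If the modules are not listed separately, synthesizing one at a time with `synth_ice40 -top <module>` is an alternative. Exact counts depend on the Yosys version, so treat the numbers in the ranking as approximate.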

▶ LIVE DEMO

Building a 4:1 Mux + Cost Comparison

~5 minutes

▸ COMMANDS

cd labs/week1_day02/ex2_mux_hierarchy/
make sim                    # iverilog + vvp
make wave                   # GTKWave
make stat                   # yosys synth_ice40

▸ EXPECTED STDOUT

PASS: 2:1 mux sel=0 → b
PASS: 2:1 mux sel=1 → a
PASS: 4:1 mux sel=00 → a
PASS: 4:1 mux sel=11 → d
=== 16 passed, 0 failed ===

▸ GTKWAVE — WHAT TO LOOK FOR

Signals: sel · a · b · c · d · y. Set all to hex. Watch y react instantly (delta cycle) to any change on sel or the data inputs — that's the signature of pure combinational logic.

🔧 What Did the Tool Build?

Three 8-bit modules, side by side:

Module	Body	SB_LUT4	SB_CARRY	Verdict
`bitwise_and`	`y = a & b;`	8	0	Cheap
`adder`	`y = a + b;`	8	8	Moderate
`multiplier`	`y = a * b;`	~80	~24	Expensive

Generate yourself: yosys -p "read_verilog op_compare.v; synth_ice40; stat". The utilization report is printed at the end of the log. Edit the module, rerun stat. Watch the numbers change with code.
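The contents of `op_compare.v` are not shown in these slides. A minimal sketch consistent with the table could look like this; the port lists are assumptions, and in particular the 16-bit product width on `multiplier` is a guess that affects the LUT count.

```verilog
// Sketch of what op_compare.v might contain. Module names and bodies follow
// the comparison table; port names and widths are assumptions.

module bitwise_and (
    input  wire [7:0] a, b,
    output wire [7:0] y
);
    assign y = a & b;        // one LUT per output bit
endmodule

module adder (
    input  wire [7:0] a, b,
    output wire [7:0] y
);
    assign y = a + b;        // LUTs plus a carry chain
endmodule

module multiplier (
    input  wire [7:0]  a, b,
    output wire [15:0] y     // assumed: full 16-bit product
);
    assign y = a * b;        // built from LUTs on the HX1K (no hard multipliers)
endmodule
```

Narrowing `y` in `multiplier` to 8 bits and rerunning `stat` is a quick way to watch the cell count change with the code.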

Mental math: The iCE40 HX1K has 1280 LUTs. A single 8×8 multiply eats ~6% of the chip. A 32×32 multiply eats... not possible on an HX1K without tricks.

🤖 Check the Machine

Ask AI: “Rank these on iCE40 LUT cost: a+b, a*b, a<<3, a<<n, a==b, all 16-bit.”

TASK

Ask for LUT cost ranking on 16-bit operands.

BEFORE

Predict: constant shift (free) → == (cheap) → + (moderate) → barrel shift (expensive) → * (most expensive).

AFTER

AI ordering usually correct. Absolute counts often 2× off — verify with Yosys.

TAKEAWAY

AI gives good ordinal rankings. Trust the ranking. Verify absolute numbers with stat.

Rule: For area-sensitive designs, Yosys stat is ground truth. AI estimates are useful for early rough sizing.

Key Takeaways

① Bitwise (&) = per-bit. Logical (&&) = 1-bit true/false.

② ? : is the mux. Nest for wider muxes.

③ Constant shifts are free. Variable shifts are expensive.

④ Multiply costs ~N² LUTs. Always check utilization.

Every operator has a hardware price tag. Read the receipts.

🔗 Transfer

Sized Literals & Width Matching

Video 3 of 4 · ~8 minutes

▸ WHY THIS MATTERS NEXT

You just saw that an 8-bit add costs about 8 LUTs + 8 carry cells. But what's the bit-width of the result? 8? 9? Here's a puzzle: 4'd15 + 4'd1 gives 0, not 16, unless the result is sized to hold the carry. Video 3 shows you why, and how to stop silent overflow bugs.
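The puzzle can be reproduced in a few lines before watching Video 3. This is a sketch with a made-up module name; the only difference between the two wires is the declared width of the result.

```verilog
// Sketch: the same sum assigned to a 4-bit and a 5-bit result.
// Module name is hypothetical.
module tb_width_puzzle;
    wire [3:0] sum4 = 4'd15 + 4'd1;   // carry out does not fit in 4 bits
    wire [4:0] sum5 = 4'd15 + 4'd1;   // 5-bit result keeps the carry

    initial begin
        #1;  // let the continuous assignments settle
        $display("4-bit result: %0d", sum4);   // 0
        $display("5-bit result: %0d", sum5);   // 16
        $finish;
    end
endmodule
```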