Dr. Mike Borowczak · Electrical & Computer Engineering · CECS · UCF
🌍 Where This Lives
In Industry
Every chip tapeout is a PPA negotiation. Apple's A-series wants maximum performance-per-watt. ARM Cortex-M0+ wants minimum area. NVIDIA H100 wants maximum compute in a given power envelope. Product managers set targets; designers iterate architectures; physical designers tune implementation; a spreadsheet tracks every variant against PPA, and only one ships. Engineering careers ARE PPA optimization.
In This Course
Every design you've built can be measured on these three axes. Today's lab has you produce a PPA report for three FSM variants. Your capstone will be judged on PPA, and the Day 14 capstone retrospective reviews each design against its PPA targets.
⚠️ There Is No Free Lunch — PPA Is a Triangle
❌ Wrong Model
“Good engineering produces a chip that's fast, low-power, and small. Better engineers optimize all three.”
✓ Right Model
PPA is a triangle: improving one axis usually costs another. Pipelining = more performance, more area (extra flops), more power (extra flops switch). Clock gating = less power, more control logic. Lower voltage = less power, lower Fmax. Real design picks the operating point. “Optimizing all three” means picking the best tradeoff, not dominating each individually.
The receipt: Apple's M3 is 15% faster than M2 but consumes 20% more power. NVIDIA's H100 is 3× the performance of A100 but consumes 2× the power. No free lunches; measurable tradeoffs.
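The voltage lever in the right model comes from the standard first-order CMOS power relation, with activity factor α, switched capacitance C, supply voltage V_DD, and clock frequency f. Because V_DD enters squared, lowering it cuts dynamic power sharply, at the cost of slower gates and a lower Fmax:

```latex
P_{\text{total}} = P_{\text{static}} + P_{\text{dyn}},
\qquad
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

At fixed V_DD, and treating C as roughly proportional to the number of cells, this reduces to the FPGA proxy of cell count × Fmax × activity.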
👁️ I Do — FPGA PPA Proxies
| Axis | ASIC Metric | FPGA Proxy |
| --- | --- | --- |
| Performance | Fmax (MHz), cycles/operation | nextpnr Fmax, RTL cycle count |
| Power | Static (leakage) + dynamic (activity) | Cell count × Fmax × activity (rough) |
| Area | mm² of silicon, gates | LUTs + FFs + EBRs (from yosys `stat`) |
My thinking: On FPGAs, exact power numbers require vendor tools (Lattice Diamond, Intel Quartus). For education, cell count × clock × activity factor is a reasonable proxy: doubling the cell count with everything else equal gives roughly 2× the dynamic power. Absolute accuracy isn't the point; relative comparisons across your own design variants are.
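A minimal sketch of that proxy in Python, assuming a fixed supply voltage and ignoring static power. The `Variant` fields and the numbers in the example are hypothetical, chosen only to illustrate the doubling rule; they are not output from yosys or nextpnr:

```python
# Relative dynamic-power proxy for comparing your own FPGA design variants.
# Assumptions: fixed supply voltage, static power ignored, and dynamic power
# taken as proportional to cell count x clock frequency x activity factor.
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    cells: int         # e.g. LUTs + FFs from yosys `stat`
    fmax_mhz: float    # clock the variant actually runs at
    activity: float    # estimated fraction of cells toggling per cycle (0..1)


def power_proxy(v: Variant) -> float:
    """Unitless number; only meaningful relative to other variants."""
    return v.cells * v.fmax_mhz * v.activity


def relative_power(v: Variant, baseline: Variant) -> float:
    """How many times the baseline's dynamic power `v` is expected to draw."""
    return power_proxy(v) / power_proxy(baseline)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the doubling rule.
    base = Variant("baseline", cells=200, fmax_mhz=50.0, activity=0.125)
    doubled = Variant("2x cells", cells=400, fmax_mhz=50.0, activity=0.125)
    print(f"{doubled.name}: {relative_power(doubled, base):.2f}x")  # 2.00x
```

The absolute value of `power_proxy` means nothing on its own; only the ratio between two of your variants is worth reporting.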
🤝 We Do — The Tradeoff in Action
Same 16-bit FIR filter, 4 variants:
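| Variant | LUTs | EBRs | Fmax | Cycles/sample | Throughput |
| --- | --- | --- | --- | --- | --- |
| Fully-serial | 90 | 0 | 180 MHz | 16 | 11 Msps |
| Fully-parallel | 820 | 0 | 105 MHz | 1 | 105 Msps |
| Pipelined parallel | 860 | 0 | 185 MHz | 1 (+5 cycles latency) | 185 Msps |
| BRAM-stored coeffs | 300 | 1 | 150 MHz | 4 | 37 Msps |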
Together: Four points in PPA space for the same filter. Best throughput: pipelined parallel (185 Msps). Smallest: fully-serial (90 LUTs). Best throughput-per-LUT: pipelined parallel (185 Msps / 860 LUTs ≈ 215 ksps/LUT, versus ≈ 122 ksps/LUT for fully-serial). Best use of iCE40: the BRAM variant, because it uses otherwise-idle EBRs. “Best” depends on the requirements.
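The derived numbers can be recomputed straight from the table, which is a quick way to catch a wrong “best” claim. This sketch copies the LUT, Fmax, and cycles/sample values from the table and assumes throughput = Fmax ÷ cycles/sample; it keeps unrounded values (11.25 and 37.5 Msps) where the table rounds:

```python
# Derived metrics for the four FIR variants. LUTs, Fmax, and cycles/sample are
# copied from the table; throughput is assumed to be Fmax / cycles_per_sample.
from typing import Callable

FIR_VARIANTS = {
    # name: (LUTs, Fmax in MHz, cycles per sample)
    "Fully-serial": (90, 180.0, 16),
    "Fully-parallel": (820, 105.0, 1),
    "Pipelined parallel": (860, 185.0, 1),
    "BRAM-stored coeffs": (300, 150.0, 4),
}


def throughput_msps(fmax_mhz: float, cycles: int) -> float:
    """One sample completes every `cycles` clocks."""
    return fmax_mhz / cycles


def ksps_per_lut(luts: int, fmax_mhz: float, cycles: int) -> float:
    """Throughput per unit of area: Msps -> ksps, divided by LUT count."""
    return throughput_msps(fmax_mhz, cycles) * 1000.0 / luts


def best(metric: Callable[[int, float, int], float]) -> str:
    """Name of the variant that maximizes metric(luts, fmax_mhz, cycles)."""
    return max(FIR_VARIANTS, key=lambda name: metric(*FIR_VARIANTS[name]))


if __name__ == "__main__":
    for name, (luts, fmax, cyc) in FIR_VARIANTS.items():
        print(f"{name:20s} {throughput_msps(fmax, cyc):7.2f} Msps "
              f"{ksps_per_lut(luts, fmax, cyc):6.1f} ksps/LUT")
    print("Best throughput:", best(lambda luts, fmax, cyc: throughput_msps(fmax, cyc)))
    print("Best throughput per LUT:", best(ksps_per_lut))
```

Run as a script, it prints about 215 ksps/LUT for pipelined parallel, 128 for fully-parallel, and 125 for both fully-serial and the BRAM variant.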
🧪 You Do — Pick the Winner
Given the four variants above, pick the right one for:
Audio processing: 48 ksps in, 100 mW power budget
Video pre-processing: 100 Msps in, area is no object
Teaching lab on Go Board: must fit alongside other logic
Latency-critical control loop: 10 ns from input to output
Answers:
Audio: fully-serial. 48 ksps << 11 Msps, and it minimizes area and power.
A PPA report should include:
Power proxy: cell count × Fmax × activity estimate
Variants considered: at least 2, with side-by-side PPA numbers
Recommendation: which variant you'd ship and why
Pro tip: Include a “requirements vs. measured” table at the top. If measured meets the requirement (≥ required for throughput or Fmax, ≤ required for power, area, or latency), the design ships. If it beats the requirement by a wide margin, you've over-engineered (wasted area/power). If measured equals required exactly, you're a genius. Include the delta in the report.
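One way to keep that table honest is to record, for each metric, whether higher or lower is better, then report pass/fail and a signed margin. A minimal sketch; the `Requirement` class and the example numbers are hypothetical, for illustration only:

```python
# "Requirements vs. measured" rows for a PPA report (a sketch).
# Each row records which direction is better, so pass/fail and the signed
# margin point the right way for throughput as well as for power or area.
from dataclasses import dataclass


@dataclass
class Requirement:
    metric: str
    required: float
    measured: float
    higher_is_better: bool  # True for throughput/Fmax; False for power/area/latency

    def meets(self) -> bool:
        if self.higher_is_better:
            return self.measured >= self.required
        return self.measured <= self.required

    def margin_pct(self) -> float:
        """Positive = headroom beyond the requirement; negative = shortfall."""
        delta = self.measured - self.required
        if not self.higher_is_better:
            delta = -delta
        return 100.0 * delta / self.required


def report(rows: list[Requirement]) -> str:
    lines = [f"{'metric':<16}{'required':>10}{'measured':>10}{'margin':>11}  status"]
    for r in rows:
        status = "OK" if r.meets() else "FAIL"
        lines.append(f"{r.metric:<16}{r.required:>10g}{r.measured:>10g}"
                     f"{r.margin_pct():>10.1f}%  {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical audio-filter numbers, for illustration only.
    print(report([
        Requirement("throughput_msps", 0.048, 11.0, higher_is_better=True),
        Requirement("luts", 1000.0, 90.0, higher_is_better=False),
    ]))
```

A very large positive margin (like 11 Msps measured against a 48 ksps requirement) is the over-engineering signal; a negative margin is a shortfall that blocks shipping.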
▶ LIVE DEMO
PPA of Three FSM Variants
~6 minutes: binary vs. one-hot vs. Gray encoding
▸ COMMANDS
cd labs/week3_day10/ex3_ppa/
make ppa_report # produces CSV
cat report.csv
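Before running the demo, it helps to predict two of the numbers: how many state flops each encoding needs and how many of them toggle per transition (a stand-in for the activity term in the power proxy). A sketch, assuming a hypothetical FSM that simply steps through its N states in order; real FSMs branch, and next-state and output decode logic (often simpler for one-hot) is not modeled:

```python
# Flop count and average state-bit toggles per transition for three encodings.
# Assumption (illustration only): the FSM steps through its N states in order,
# i -> (i + 1) mod N. Next-state and output decode logic are not modeled.
from math import ceil, log2


def encode(state: int, style: str) -> int:
    if style == "binary":
        return state
    if style == "gray":
        return state ^ (state >> 1)   # reflected binary Gray code
    if style == "one-hot":
        return 1 << state
    raise ValueError(f"unknown encoding: {style}")


def flops(n_states: int, style: str) -> int:
    if style == "one-hot":
        return n_states
    return max(1, ceil(log2(n_states)))


def avg_toggles(n_states: int, style: str) -> float:
    """Average number of state flops that change value on each transition."""
    total = 0
    for s in range(n_states):
        diff = encode(s, style) ^ encode((s + 1) % n_states, style)
        total += bin(diff).count("1")
    return total / n_states


if __name__ == "__main__":
    for style in ("binary", "one-hot", "gray"):
        print(f"{style:8s} flops={flops(8, style)} "
              f"toggles/transition={avg_toggles(8, style):.2f}")
```

For 8 states it prints 3, 8, and 3 flops, with 1.75, 2.00, and 1.00 toggles per transition for binary, one-hot, and Gray. Gray's one-bit-per-step property holds around the whole cycle only when N is a power of two.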
Pareto frontier: The outer edge, made up of variants that no other variant dominates. An inner (dominated) point is matched or beaten by some frontier point on every axis and strictly beaten on at least one. Only ship from the Pareto frontier. Publish both your measurements and the frontier you chose from.
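Finding the frontier is a dominance check over every pair of variants. A minimal sketch, using the FIR table's LUTs (minimize) and throughput (maximize) plus one hypothetical dominated variant added only for illustration:

```python
# Pareto frontier: keep only the variants that no other variant dominates.
# `a` dominates `b` if it is at least as good on every axis and strictly
# better on at least one; `directions` says whether to "min" or "max" each axis.
from typing import Dict, List

Point = Dict[str, float]


def dominates(a: Point, b: Point, directions: Dict[str, str]) -> bool:
    """True if `a` is at least as good as `b` on every axis and better on one."""
    strictly_better = False
    for axis, sense in directions.items():
        if sense not in ("min", "max"):
            raise ValueError(f"direction for {axis!r} must be 'min' or 'max'")
        # Flip sign on minimized axes so that "larger is better" everywhere.
        x, y = (a[axis], b[axis]) if sense == "max" else (-a[axis], -b[axis])
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_frontier(variants: Dict[str, Point], directions: Dict[str, str]) -> List[str]:
    return [
        name for name, p in variants.items()
        if not any(dominates(q, p, directions)
                   for other, q in variants.items() if other != name)
    ]


if __name__ == "__main__":
    axes = {"luts": "min", "msps": "max"}
    fir = {
        "Fully-serial": {"luts": 90, "msps": 11.25},
        "Fully-parallel": {"luts": 820, "msps": 105.0},
        "Pipelined parallel": {"luts": 860, "msps": 185.0},
        "BRAM-stored coeffs": {"luts": 300, "msps": 37.5},
        "Hypothetical bloated": {"luts": 900, "msps": 90.0},  # made-up point
    }
    print(pareto_frontier(fir, axes))
```

On these two axes all four real FIR variants sit on the frontier; only the made-up 900-LUT, 90 Msps point is dominated (by fully-parallel, among others).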
🤖 Check the Machine
Ask AI: “Compare the PPA of a 16-bit parallel multiplier versus a 16-bit sequential shift-and-add multiplier for iCE40 HX1K. Give me a tradeoff table.”
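One way to check the “cycles/operation” column an AI gives you is a cycle-level reference model of the sequential design. This sketch assumes the simplest shift-and-add datapath, one multiplier bit per clock with no early exit, so a 16-bit multiply always takes 16 cycles; a parallel array multiplier instead produces the product in a single, longer cycle. Keep in mind that the iCE40 HX1K has no hard DSP multiplier blocks, so a parallel multiplier there is built entirely from LUTs:

```python
# Cycle-level model of an unsigned sequential shift-and-add multiplier.
# Assumption: one multiplier bit is processed per clock, with no early exit.


def shift_add_multiply(a: int, b: int, width: int = 16) -> tuple[int, int]:
    """Return (product, cycles) for unsigned `width`-bit operands a and b."""
    limit = 1 << width
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError(f"operands must be unsigned {width}-bit values")
    product, multiplicand, multiplier, cycles = 0, a, b, 0
    for _ in range(width):          # each iteration models one clock cycle
        if multiplier & 1:          # current multiplier bit set: add partial product
            product += multiplicand
        multiplicand <<= 1          # next partial product is worth twice as much
        multiplier >>= 1            # bring the next multiplier bit to position 0
        cycles += 1
    return product, cycles


if __name__ == "__main__":
    p, n = shift_add_multiply(1234, 5678)
    print(p, n, p == 1234 * 5678)
```

If an AI-generated table lists a 16-bit sequential multiplier at far fewer than 16 cycles, ask which datapath it assumed (for example, several bits per cycle) before trusting the rest of its numbers.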
Your PPA work so far targets FPGA. But the same RTL can target real silicon through open-source ASIC flows (OpenROAD, OpenLane). Video 4 ends the PPA day with a sneak peek: what happens when your Verilog becomes a chip? You'll see how FPGA PPA does (and doesn't) translate to ASIC PPA, and why both matter.