ZPU Evo Developer's Guide
Introduction
Welcome to the ZPU Evolution Developer’s Guide. This guide is written for students, hobbyists, and engineers who want to understand, build, modify, and extend the ZPU Evolution processor. Whether you are taking a microprocessor course, learning FPGA development, or exploring soft-core CPU design, this guide will walk you through everything from fundamental concepts to advanced customisation.
The ZPU is a 32-bit stack-based microprocessor originally designed by Øyvind Harboe of Zylin AS. The ZPU Evolution (Evo) is an enhanced version created by Philip Smart that adds significant performance improvements, caching, extended instructions, and a rich System-on-Chip (SoC) framework. The entire design is open source and implemented in VHDL for synthesis on Intel/Altera FPGAs.
What You Will Learn
By working through this guide, you will:
- Understand stack-based CPU architecture and how it differs from register-based designs
- Learn how the ZPU instruction set works at the hardware level
- Be able to configure, build, and program FPGA bitstreams for multiple development boards
- Know how to add new hardware instructions to the CPU
- Understand the SoC architecture including memory controllers, UARTs, timers, and interrupt handling
- Be able to write, compile, and deploy C programs that run on the ZPU
- Set up automated CI/CD builds using Jenkins and Docker
Prerequisites
To get the most from this guide, you should have:
- Basic digital logic knowledge - Understanding of flip-flops, multiplexers, state machines, and bus structures
- Some VHDL experience - Ability to read VHDL entity declarations, signal assignments, and process blocks (this guide explains key patterns as they appear)
- C programming - Familiarity with C syntax, pointers, and compilation
- An FPGA development board - One of the supported boards (DE10 Nano, E115, CYC1000, QMV, or DE0 Nano)
- Intel Quartus Prime - Version 17.1.1 Standard Edition (or the Docker containerised version)
Part 1: Understanding the ZPU Architecture
Stack-Based vs Register-Based Processors
Most processors you encounter (ARM, x86, RISC-V) are register-based: they have a fixed set of named registers (R0-R15, EAX, etc.) and instructions specify which registers to operate on. For example, ADD R0, R1, R2 means “add R1 and R2, store in R0”.
The ZPU is a stack-based processor. Instead of named registers, it uses a Last-In-First-Out (LIFO) stack. Operations implicitly work on the top elements of the stack:
Stack before ADD: Stack after ADD:
┌─────┐ ┌─────┐
│ 3 │ ← TOS │ 7 │ ← TOS (3+4)
├─────┤ ├─────┤
│ 4 │ ← NOS │ ... │
├─────┤ └─────┘
│ ... │
└─────┘
TOS = Top of Stack, NOS = Next on Stack.
Why use a stack architecture?
- Minimal instruction encoding - No register fields needed. ADD is just one byte (opcode 00000101), while a register-based ADD needs source and destination register fields, making it 2-4 bytes.
- Very small FPGA footprint - The decoder is trivial, requiring minimal logic elements.
- Simple compiler target - Expression evaluation maps naturally to stack operations (Reverse Polish Notation).
- Trade-off - Stack machines are typically slower per-operation than register machines because of the extra stack manipulation, but the ZPU Evo’s caching system significantly mitigates this.
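The way ADD works implicitly on TOS and NOS can be modelled in a few lines of C. This is only a sketch of the idea, not the ZPU's implementation: the demo_stack type, its depth, and all function names are hypothetical and exist purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature stack, for illustration only. */
#define DEMO_STACK_DEPTH 16

typedef struct {
    uint32_t slot[DEMO_STACK_DEPTH];
    int depth;                      /* number of values currently stacked */
} demo_stack;

static void demo_push(demo_stack *s, uint32_t v) {
    assert(s->depth < DEMO_STACK_DEPTH);
    s->slot[s->depth++] = v;
}

static uint32_t demo_pop(demo_stack *s) {
    assert(s->depth > 0);
    return s->slot[--s->depth];
}

/* ADD names no operands: it pops TOS and NOS and pushes their sum. */
static void demo_add(demo_stack *s) {
    uint32_t tos = demo_pop(s);
    uint32_t nos = demo_pop(s);
    demo_push(s, nos + tos);
}

/* Evaluate "4 3 +" (Reverse Polish Notation) and return the new TOS. */
static uint32_t demo_eval_4_3_add(void) {
    demo_stack s = { {0}, 0 };
    demo_push(&s, 4);               /* ends up as NOS */
    demo_push(&s, 3);               /* ends up as TOS */
    demo_add(&s);
    return demo_pop(&s);
}
```

Calling demo_eval_4_3_add() reproduces the diagram above: the two operands are replaced by a single value, 7, on top of the stack.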
The ZPU Memory Model
The ZPU has a flat, byte-addressable, 32-bit memory space. All memory and I/O devices share this space:
Address Space (24-bit example, configurable):
┌──────────────────────────────────────────────┐
│ 0x000000 - 0x01FFFF : Boot BRAM (128KB) │ ← Program + Stack
│ Reset vector at 0x0000 │
│ Emulation vectors │
│ 0x0000-0x0400 │
├──────────────────────────────────────────────┤
│ 0x020000 - 0x03FFFF : Application RAM (opt) │ ← Secondary BRAM
├──────────────────────────────────────────────┤
│ 0x040000 - 0xFEFFFF : SDRAM (optional) │ ← External memory
├──────────────────────────────────────────────┤
│ 0xFF0000 - 0xFFFFFF : Memory-Mapped I/O │ ← Peripherals
├──────────────────────────────────────────────┤
│ 0x1000000 - 0x1FFFFFF: Wishbone bus (optional) │ ← Extended region
└──────────────────────────────────────────────┘
Key points:
- The stack lives at the top of BRAM, growing downward
- The program starts at address 0x0000 (configurable)
- Addresses 0x0000-0x0400 are reserved for emulation vectors (explained later)
- I/O devices are memory-mapped at the top of the address space
- The Wishbone bus, if enabled, doubles the address space
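As a rough illustration of the example map above, the following C sketch classifies an address into its region. The boundaries are taken from the 24-bit example layout only; real builds are configurable, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical region names for the example memory map. */
typedef enum {
    REGION_BOOT_BRAM,
    REGION_APP_RAM,
    REGION_SDRAM,
    REGION_IO,
    REGION_WISHBONE,
    REGION_UNMAPPED
} zpu_region;

/* Map an address to a region using the example boundaries. */
static zpu_region zpu_decode_region(uint32_t addr) {
    if (addr <= 0x01FFFFu)  return REGION_BOOT_BRAM;  /* 0x000000 - 0x01FFFF   */
    if (addr <= 0x03FFFFu)  return REGION_APP_RAM;    /* 0x020000 - 0x03FFFF   */
    if (addr <= 0xFEFFFFu)  return REGION_SDRAM;      /* 0x040000 - 0xFEFFFF   */
    if (addr <= 0xFFFFFFu)  return REGION_IO;         /* 0xFF0000 - 0xFFFFFF   */
    if (addr <= 0x1FFFFFFu) return REGION_WISHBONE;   /* 0x1000000 - 0x1FFFFFF */
    return REGION_UNMAPPED;
}
```

In the SoC itself this decision is made in VHDL by the address decoder; the C version is only a convenient way to check which device a given address would select.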
The Five CPU Models
The ZPU Evolution SoC supports five interchangeable CPU models, selectable at build time. Only one can be active:
| Model | Logic Elements | Performance | Wishbone | Best For |
|---|---|---|---|---|
| Small | ~400 LEs | Baseline | No | Minimum footprint applications |
| Medium | ~600 LEs | ~1.5x Small | No | Better performance, still small |
| Flex | ~800 LEs | ~1.8x Small | No | Good balance of size and speed |
| Evo | ~2500 LEs | ~3x Small | Yes | Maximum performance, full features |
| Evo Minimal | ~1200 LEs | ~2x Small | Yes | Evo with reduced instruction set |
The Evo model adds:
- L1 instruction cache (register-based, configurable 8-256 entries)
- L2 instruction cache (BRAM-based, configurable 256-4096 bytes)
- Memory Transaction Processor (queued memory operations)
- Dual memory bus (system bus + optional Wishbone bus)
- Optional instruction bus (separate BRAM port for instruction fetch)
- Hardware byte/word write (avoiding read-modify-write cycles)
- Extended instruction support (multi-byte instructions)
Instruction Encoding
ZPU instructions are 8 bits wide. The encoding is remarkably simple:
Bit 7 = 1: IM instruction (7-bit immediate value in bits 6:0)
Bit 7 = 0:
Bits 6:5 = 00: Core instructions (opcode in bits 4:0)
Bits 6:5 = 01: EMULATE (emulation vector index in bits 4:0)
Bits 6:5 = 10: STORESP (store to stack offset in bits 4:0)
Bits 6:5 = 11: LOADSP (load from stack offset in bits 4:0)
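The top-level decode can be sketched in C as below. The groups follow the opcode ranges used in this guide's instruction tables (IM at 0x80+, LOADSP at 0x60+, STORESP at 0x40+, emulated instructions at 0x20-0x3F, ADDSP at 0x10+). The type and function names are hypothetical; the real decoder is written in VHDL.

```c
#include <stdint.h>

/* Hypothetical instruction classes for illustration. */
typedef enum {
    ZPU_IM, ZPU_LOADSP, ZPU_STORESP, ZPU_EMULATE, ZPU_ADDSP, ZPU_CORE
} zpu_class;

/* Classify one 8-bit opcode by its upper bits. */
static zpu_class zpu_classify(uint8_t op) {
    if (op & 0x80u) return ZPU_IM;           /* 1xxx xxxx */
    switch (op & 0x60u) {
        case 0x60u: return ZPU_LOADSP;       /* 011x xxxx */
        case 0x40u: return ZPU_STORESP;      /* 010x xxxx */
        case 0x20u: return ZPU_EMULATE;      /* 001x xxxx */
        default:    break;                   /* 000x xxxx */
    }
    if ((op & 0xF0u) == 0x10u) return ZPU_ADDSP;  /* 0001 xxxx */
    return ZPU_CORE;                              /* 0000 xxxx */
}
```

For example, 0x05 (ADD) falls in the core group, while 0x2D (CALL) falls in the emulatable group.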
Loading a 32-bit constant: Since each IM instruction can only carry 7 bits, loading a full 32-bit value requires up to 5 consecutive IM instructions:
IM 0x12 → push 0x00000012 (sign-extended)
IM 0x34 → shift TOS left 7, OR in 0x34 → 0x00000934
IM 0x56 → shift left 7, OR → 0x0004_9A56
IM 0x78 → shift left 7, OR → 0x024D_2B78
IM 0x1A → shift left 7, OR → 0x2695_BC1A (0x1_2695_BC1A truncated to 32 bits)
The L1 cache in the Evo allows up to 5 IM instructions to execute in a single cycle, making constant loading very efficient.
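The accumulation rule can be checked with a short C model. It assumes the behaviour described above: the first IM pushes its 7-bit value sign-extended, and each following IM shifts TOS left by 7 and ORs in its own 7 bits. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply a run of consecutive IM operands and return the resulting TOS. */
static uint32_t zpu_im_sequence(const uint8_t *imm7, size_t count) {
    uint32_t tos = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v = imm7[i] & 0x7Fu;        /* only 7 bits are encoded */
        if (i == 0) {
            /* First IM: push, sign-extending from bit 6. */
            tos = (v & 0x40u) ? (v | 0xFFFFFF80u) : v;
        } else {
            /* Following IMs: shift left 7 and OR; the upper bits
             * fall off the end of the 32-bit word. */
            tos = (tos << 7) | v;
        }
    }
    return tos;
}

/* The five-instruction sequence 0x12, 0x34, 0x56, 0x78, 0x1A. */
static uint32_t zpu_im_demo(void) {
    static const uint8_t seq[5] = { 0x12, 0x34, 0x56, 0x78, 0x1A };
    return zpu_im_sequence(seq, 5);
}
```

zpu_im_demo() yields 0x2695BC1A: the fifth shift produces a 33-bit intermediate value, and the top bit is discarded.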
Emulation Vectors
A key ZPU feature is instruction emulation. Instructions marked with EMULATE (opcode bits 6:5 = 01, i.e. opcodes 0x20-0x3F) branch to a vector in the range 0x0000-0x0400. Each vector is 32 bytes:
Vector address = instruction[4:0] × 32
Example: EMULATE 12 → branches to address 0x0180 (12 × 32)
If a hardware instruction is not implemented (disabled in configuration), it triggers the EMULATE path, where software microcode implements the operation. This allows the CPU to run the same software regardless of which instructions are in hardware - just at different speeds.
This is what makes the ZPU uniquely flexible: you can trade FPGA resources for performance by selectively enabling/disabling hardware instructions.
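The vector calculation is simple enough to express as a one-line C helper. This sketch assumes the emulated opcodes occupy 0x20-0x3F, as in the instruction tables later in this guide; the function name is hypothetical.

```c
#include <stdint.h>

/* Vector address = instruction[4:0] x 32 */
static uint32_t zpu_emulate_vector(uint8_t opcode) {
    return (uint32_t)(opcode & 0x1Fu) * 32u;
}
```

EMULATE 12 is opcode 0x2C, so zpu_emulate_vector(0x2C) gives 0x0180, matching the example above. The highest vector, for opcode 0x3F, starts at 0x03E0 and ends just below 0x0400.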
Part 2: Project Structure
Directory Layout
ZPU/
├── cpu/ # CPU core implementations
│ ├── zpu_core_evo.vhd # ZPU Evolution CPU (3688 lines)
│ ├── zpu_core_small.vhd # ZPU Small CPU
│ ├── zpu_core_medium.vhd # ZPU Medium CPU
│ ├── zpu_core_flex.vhd # ZPU Flex CPU
│ ├── zpu_pkg.vhd # CPU package (opcodes, config)
│ └── zpu_uart_debug.vhd # Debug serialiser
│
├── devices/ # Peripheral IP cores
│ ├── sysbus/ # System bus peripherals
│ │ ├── BRAM/ # Boot ROM and RAM templates
│ │ ├── SDRAM/ # SDRAM controller
│ │ ├── uart/ # UART controllers
│ │ ├── timer/ # Timer/counter
│ │ ├── intr/ # Interrupt controller
│ │ ├── ps2/ # PS2 keyboard/mouse
│ │ ├── spi/ # SPI interface
│ │ ├── SDMMC/ # SD card controller
│ │ └── ioctl/ # MiSTer IOCTL bus
│ └── WishBone/ # Wishbone bus peripherals
│ ├── I2C/ # I2C master controller
│ ├── SRAM/ # Wishbone SRAM
│ └── SDRAM/ # Wishbone SDRAM controller
│
├── build/ # FPGA board build files
│ ├── Makefile # Build targets for all boards/CPUs
│ ├── DE10_nano_zpu.qpf # Quartus project files
│ ├── DE10_nano_zpu.qsf # Pin assignments & file lists
│ ├── DE10_nano_zpu_Toplevel.vhd # Board wrapper
│ ├── E115_zpu.qpf/qsf # E115 board files
│ ├── CYC1000_zpu.qpf/qsf # CYC1000 board files
│ ├── QMV_zpu.qpf/qsf # QMTECH Cyclone V files
│ └── DE0_nano_zpu.qpf/qsf # DE0 Nano board files
│
├── zpu_soc.vhd # SoC top level (2346 lines)
├── zpu_soc_pkg.vhd # SoC configuration package
├── zpu_soc_pkg.tmpl.vhd # Template for build system
├── VERSION # Version number for CI/CD
├── docs/ # Documentation and images
└── README.md # Project documentation
Key Source Files Explained
zpu_pkg.vhd - The CPU-level configuration package. Contains:
- Opcode definitions (OpCode_Add, OpCode_Load, etc.)
- Address bus width configuration (maxAddrBit)
- Component declarations for the CPU cores
- Debug configuration (levels, UART baud rate)
- Instruction bus enable flags (EVO_USE_INSN_BUS, EVO_USE_WB_BUS)
zpu_soc_pkg.vhd - The SoC-level configuration. Contains:
- CPU model selection (only one of SMALL/MEDIUM/FLEX/EVO/EVO_MINIMAL)
- Board clock frequencies
- Memory geometry (BRAM size, SDRAM parameters)
- Peripheral enable/disable flags
- Cache size parameters
- Memory map constants (start/end addresses)
zpu_soc_pkg.tmpl.vhd - A template copy of the SoC package with all CPU model flags set to 0. The build system uses sed to enable the desired CPU model, generating zpu_soc_pkg.vhd from this template.
zpu_soc.vhd - The main SoC module. This is where everything comes together:
- Instantiates the selected CPU core
- Connects BRAM, SDRAM, UART, timers, SD card, etc.
- Manages the interrupt controller
- Routes memory/IO requests to the appropriate device
- Implements the Wishbone bus bridge (if enabled)
cpu/zpu_core_evo.vhd - The Evo CPU implementation (3688 lines). Contains:
- Instruction decoder and execution state machine
- L1 cache (register file)
- L2 cache (BRAM with address mapping)
- Memory Transaction Processor (MXP)
- Stack management
- Hardware instruction implementations
Part 3: Configuring the SoC
Choosing a CPU Model
Open zpu_soc_pkg.vhd (or edit the template zpu_soc_pkg.tmpl.vhd for builds). Find the CPU selection constants:
-- Choose which CPU to instantiate. Only enable ONE at a time.
constant ZPU_SMALL : integer := 0;
constant ZPU_MEDIUM : integer := 0;
constant ZPU_FLEX : integer := 0;
constant ZPU_EVO : integer := 1; -- ← Currently active
constant ZPU_EVO_MINIMAL : integer := 0;
Set exactly one to 1 and all others to 0. The build system handles this automatically when using the Makefile.
Configuring Memory
Boot BRAM size:
constant SOC_MAX_ADDR_BRAM_BIT : integer := 17;
-- 17 bits = 2^17 = 128KB of BRAM
-- 16 bits = 64KB, 15 bits = 32KB
Increasing BRAM size gives more space for firmware and stack but consumes more of the FPGA’s block RAM. Smaller FPGAs (CYC1000) may need 15 or 16 bits.
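When choosing a value for SOC_MAX_ADDR_BRAM_BIT it can help to work backwards from the space you need. The helpers below are a hypothetical sizing aid, not part of the project: one converts an address-bit count to bytes, the other finds the smallest bit count that covers a given number of bytes (firmware plus stack, for instance).

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a memory addressed by 'bits' address bits. */
static uint32_t bram_size_bytes(unsigned bits) {
    assert(bits < 32);
    return (uint32_t)1 << bits;
}

/* Smallest number of address bits whose range covers 'bytes'. */
static unsigned bram_bits_needed(uint32_t bytes) {
    unsigned bits = 0;
    while (bits < 32 && ((uint64_t)1 << bits) < bytes) {
        bits++;
    }
    return bits;
}
```

For example, a 40,000-byte image needs 16 bits (64KB), because 15 bits only covers 32,768 bytes.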
SDRAM (optional):
constant SOC_IMPL_SDRAM : boolean := false; -- Enable SDRAM
constant SOC_SDRAM_ROWS : integer := 4096;
constant SOC_SDRAM_COLUMNS : integer := 256;
constant SOC_SDRAM_BANKS : integer := 4;
constant SOC_SDRAM_DATAWIDTH : integer := 16; -- 16-bit data bus
Set SOC_IMPL_SDRAM to true to enable the system bus SDRAM controller. Adjust the geometry parameters to match your SDRAM chip’s datasheet.
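A quick sanity check on the geometry is to compute the capacity it implies and compare it with the datasheet. The sketch below assumes the usual relationship (rows × columns × banks × bytes per word) and that the row and column constants are counts rather than address-bit widths, as the default values suggest. The function name is hypothetical.

```c
#include <stdint.h>

/* Capacity in bytes implied by an SDRAM geometry. */
static uint64_t sdram_capacity_bytes(uint32_t rows, uint32_t columns,
                                     uint32_t banks, uint32_t width_bits) {
    return (uint64_t)rows * columns * banks * (width_bits / 8u);
}
```

With the defaults shown above (4096 rows, 256 columns, 4 banks, 16-bit data) this works out to 8,388,608 bytes, i.e. 8 MB.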
Enabling Peripherals
Each peripheral is individually selectable:
constant SOC_IMPL_TIMER1 : boolean := true; -- Timer block
constant SOC_IMPL_PS2 : boolean := false; -- PS2 keyboard
constant SOC_IMPL_SPI : boolean := false; -- SPI interface
constant SOC_IMPL_SD : boolean := true; -- SD card
constant SOC_IMPL_INTRCTL : boolean := true; -- Interrupt controller
constant SOC_IMPL_SOCCFG : boolean := true; -- SoC config registers
constant SOC_IMPL_WB_I2C : boolean := false; -- I2C (Wishbone)
Disabling unused peripherals saves FPGA resources. For minimal designs, you only need the BRAM, one UART, and possibly the timer.
Tuning the Evo Cache
The Evo CPU’s performance depends heavily on cache configuration:
-- L1 Cache: register-based, fast but uses fabric
constant MAX_EVO_L1CACHE_BITS : integer := 5;
-- 5 bits = 32 instruction cache entries (uses 32 registers)
-- Increase for better IM optimisation, decrease to save fabric
-- L2 Cache: BRAM-based, larger but uses block RAM
constant MAX_EVO_L2CACHE_BITS : integer := 12;
-- 12 bits = 4096 byte L2 cache
-- Increase for fewer SDRAM stalls, decrease to save BRAM
-- Memory Transaction queue depth
constant MAX_EVO_MXCACHE_BITS : integer := 3;
-- 3 bits = 8 pending transactions
Guideline for students: Start with the defaults. If your design doesn’t fit in the FPGA, reduce L2 cache first (it uses BRAM). If you need more performance, try increasing L1 cache (costs fabric logic elements).
Part 4: Building FPGA Bitstreams
Using the Makefile
The simplest way to build is using the Makefile in the build/ directory:
cd ZPU/build
# Build a specific board/CPU combination
make DE10_nano_EVO # DE10-Nano with Evo CPU
make E115_SMALL # E115 with Small CPU
make CYC1000_MEDIUM # CYC1000 with Medium CPU
# Build all variants for a board
make DE10_nano_SMALL DE10_nano_MEDIUM DE10_nano_FLEX DE10_nano_EVO DE10_nano_EVO_MINIMAL
# Build all variants for all boards
make all
What happens during a build:
- The Makefile copies zpu_soc_pkg.tmpl.vhd and uses sed to set the chosen CPU model to 1
- The generated zpu_soc_pkg.vhd replaces the existing one
- Quartus runs synthesis (Analysis & Synthesis) - converts VHDL to a netlist
- Quartus runs the Fitter (Place & Route) - maps to actual FPGA resources
- Quartus runs the Assembler - generates the .sof programming file
- quartus_cpf converts the .sof to .rbf (Raw Binary Format)
Using Quartus Directly
If you prefer the Quartus GUI:
- Open the .qpf project file for your board (e.g., DE10_nano_zpu.qpf)
- Edit zpu_soc_pkg.vhd to select your CPU model
- Click Processing → Start Compilation (or press Ctrl+L)
- After compilation, find the .sof in the build directory
- Program the FPGA: Tools → Programmer, select your .sof file, click Start
Using Docker (Headless Builds)
For automated or server-based builds without a Quartus installation:
# Build using the Docker container
docker run --rm \
--mac-address "02:50:dd:72:03:01" \
-e "LM_LICENSE_FILE=/srv2/license2.dat" \
-v "/run/udev:/run/udev:ro" \
-v "/sys:/sys:ro" \
-v "/path/to/ZPU:/workspace" \
-w "/workspace/build" \
quartus-ii-17.1.1 \
/opt/altera/quartus/bin/quartus_sh --flow compile DE10_nano_zpu
The CI/CD pipeline automates this for all board/CPU combinations.
Programming the FPGA
Using Quartus Programmer (GUI):
- Connect your FPGA board via USB-Blaster
- Open Quartus Programmer
- Click Auto Detect to find the FPGA
- Select your .sof file
- Click Start to program
Using command-line:
quartus_pgm -m jtag -o "p;DE10_nano_EVO.sof"
Note: .sof programming is volatile - the FPGA loses its configuration on power-off. For permanent programming, convert to .pof or use the board’s flash memory.
Part 5: The Instruction Set In Depth
Core Instructions
These are always available in hardware:
| Instruction | Opcode | Stack Effect | Description |
|---|---|---|---|
| NOP | 0x0B | — | No operation |
| ADD | 0x05 | a b → (a+b) | Add top two stack values |
| AND | 0x06 | a b → (a&b) | Bitwise AND |
| OR | 0x07 | a b → (a|b) | Bitwise OR |
| NOT | 0x09 | a → (~a) | Bitwise NOT |
| FLIP | 0x0A | a → flip(a) | Reverse bit order |
| LOAD | 0x08 | addr → value | Read 32-bit from memory |
| STORE | 0x0C | value addr → | Write 32-bit to memory |
| PUSHSP | 0x02 | — → SP | Push stack pointer |
| POPSP | 0x0D | addr → | Set stack pointer |
| POPPC | 0x04 | addr → | Pop address, jump to it (return) |
| IM | 0x80+ | — → imm | Push immediate (7 bits) |
| LOADSP | 0x60+ | — → mem[SP+n] | Load from stack offset |
| STORESP | 0x40+ | value → | Store to stack offset |
| ADDSP | 0x10+ | a → a+mem[SP+n] | Add stack offset value |
Emulatable Instructions
These can be in hardware (fast) or emulated (slow). In the Evo, all are typically enabled in hardware:
| Instruction | Opcode | Stack Effect | Description |
|---|---|---|---|
| CALL | 0x2D | addr → PC+1 | Call subroutine |
| CALLPCREL | 0x3F | offset → PC+1 | PC-relative call |
| SUB | 0x31 | a b → (b-a) | Subtract |
| MULT | 0x29 | a b → (a×b) | Multiply |
| DIV | 0x35 | a b → (a/b) | Signed divide |
| MOD | 0x36 | a b → (a%b) | Modulo |
| NEG | 0x30 | a → (-a) | Negate |
| EQ | 0x2E | a b → (a==b) | Equality test |
| NEQ | 0x2F | a b → (a!=b) | Inequality test |
| LOADB | 0x33 | addr → byte | Load byte |
| STOREB | 0x34 | val addr → | Store byte |
| LOADH | 0x22 | addr → half | Load 16-bit |
| STOREH | 0x23 | val addr → | Store 16-bit |
| ASHIFTLEFT | 0x2B | val shift → result | Arithmetic shift left |
| ASHIFTRIGHT | 0x2C | val shift → result | Arithmetic shift right |
| LSHIFTRIGHT | 0x2A | val shift → result | Logical shift right |
| XOR | 0x32 | a b → (a^b) | Exclusive OR |
Extended Instructions (Evo Only)
The Evo supports multi-byte extended instructions using the EXTEND prefix (opcode 0x0F):
Format: EXTEND, <instruction byte>, [parameter bytes]
Instruction byte: [opcode(7:2)][paramSize(1:0)]
paramSize: 00=none, 01=8-bit, 10=16-bit, 11=32-bit
Current extended instructions:
- ESR (Extended Status Register) - Read background transfer status
- LDIR (Load Increment Repeat) - Block memory copy with optional background execution
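The instruction byte that follows EXTEND can be unpacked as in this C sketch. It follows the format given above (opcode in bits 7:2, parameter size in bits 1:0); the struct and function names are hypothetical, and the total-length helper assumes the parameter bytes directly follow the instruction byte.

```c
#include <stdint.h>

/* Decoded fields of the byte following the EXTEND prefix (0x0F). */
typedef struct {
    uint8_t opcode;       /* bits 7:2 */
    uint8_t param_bytes;  /* 0, 1, 2 or 4 parameter bytes */
} zpu_ext_insn;

static zpu_ext_insn zpu_decode_ext(uint8_t insn_byte) {
    /* paramSize: 00=none, 01=8-bit, 10=16-bit, 11=32-bit */
    static const uint8_t size_table[4] = { 0, 1, 2, 4 };
    zpu_ext_insn d;
    d.opcode = (uint8_t)(insn_byte >> 2);
    d.param_bytes = size_table[insn_byte & 0x03u];
    return d;
}

/* Total length: EXTEND prefix + instruction byte + parameters. */
static unsigned zpu_ext_length(uint8_t insn_byte) {
    return 2u + zpu_decode_ext(insn_byte).param_bytes;
}
```

An instruction byte of 0x07, for instance, decodes to opcode 1 with a 32-bit parameter, giving a six-byte instruction in total.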
How an Instruction Executes (Evo CPU)
Understanding the execution flow helps when adding instructions. Here is how a typical instruction flows through the Evo:
- Fetch: The L1 cache provides the next instruction byte. If the L1 cache misses, it is filled from L2 cache or main memory.
- Decode: The instruction decoder examines the opcode and determines the operation.
- Execute: For simple operations (ADD, AND, etc.), the result is computed combinationally in one cycle. For memory operations, a request is submitted to the Memory Transaction Processor (MXP).
- Writeback: The result is pushed onto the stack (or the stack pointer is adjusted).
For the Evo, many operations complete in a single cycle when data is in the L1/L2 cache.
Part 6: Adding New Hardware Instructions
This is one of the most exciting parts of working with the ZPU - you can add your own custom instructions to the CPU hardware.
Step 1: Choose an Opcode
Look at the emulation vector space (opcodes with bits 6:5 = 01, i.e., 0x20-0x3F). Some are already used, but there are free slots. Alternatively, use the EXTEND mechanism for new instructions.
For a simple example, let us add a SWAP instruction that swaps TOS and NOS:
SWAP: opcode 0x20 (EMULATE 0, currently unused in many builds)
Stack: a b → b a
Step 2: Add the Opcode Constant
In cpu/zpu_pkg.vhd, add your opcode:
constant OpCode_Swap : std_logic_vector(5 downto 0) :=
std_logic_vector(to_unsigned(32, 6)); -- 0x20
Step 3: Add a Configuration Toggle
In zpu_soc_pkg.vhd, add an enable/disable constant:
constant IMPL_EVO_SWAP : boolean := true;
And pass it through the generic map in zpu_soc.vhd:
IMPL_SWAP => IMPL_EVO_SWAP,
Step 4: Add the Generic to the CPU Entity
In cpu/zpu_core_evo.vhd, add the generic parameter:
generic (
...
IMPL_SWAP : boolean := false;
...
);
Step 5: Implement the Instruction
In the instruction decoder section of zpu_core_evo.vhd, find where EMULATE instructions are handled and add:
-- SWAP instruction: swap TOS and NOS
if IMPL_SWAP = true and insnExec(5 downto 0) = OpCode_Swap then
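-- In a clocked process both assignments read the old signal values,
-- so the two registers exchange contents on the same clock edge
stackA <= stackB; -- TOS gets old NOS
stackB <= stackA; -- NOS gets old TOS
-- No memory access needed, no PC change
else
-- ... existing EMULATE handling (branch to vector)
end if;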
The exact implementation depends on where in the state machine you are inserting and how the stack is managed. Study the existing instruction implementations (e.g., OpCode_Add, OpCode_Sub) to understand the patterns.
Step 6: Test Your Instruction
Write a test program in C using inline assembly:
#include <stdio.h>
static inline void zpu_swap(void) {
__asm__ volatile (".byte 0x20"); // SWAP opcode
}
int main() {
int a = 42, b = 99;
// Push a and b onto stack, then SWAP
printf("Before: a=%d, b=%d\n", a, b);
// ... test the instruction
return 0;
}
Using the Extended Instruction Mechanism
For more complex instructions that need parameters, use the EXTEND prefix:
EXTEND (0x0F), InstructionByte, [ParamBytes]
InstructionByte = [opcode(7:2)][paramSize(1:0)]
The L1 cache pre-fetches the parameter bytes so they are available in the same cycle as the instruction decode. This allows multi-byte instructions to execute efficiently.
Part 7: The Software Ecosystem — IOCP, zOS and Applications
Once you have built an FPGA bitstream, your ZPU needs software to run. The ZPU Evolution project provides a complete software ecosystem arranged in layers: a bootloader (IOCP) initialises the hardware and loads an operating system (zOS), which in turn provides a shell and command-line environment from which you can run applications, much like CP/M or early DOS. Understanding these layers is essential: even if you plan to write your own software from scratch, you will need at minimum IOCP to bootstrap the CPU and zOS to provide the runtime environment.
Overview of the Software Layers
The standard software stack on a ZPU Evolution system is:
┌───────────────────────────────────────────────────┐
│ Applications │
│ (ed, kilo, tbasic, mbasic, benchmarks, │
│ your own programs — loaded from SD card) │
├───────────────────────────────────────────────────┤
│ zOS — Operating System │
│ (Shell, file system, 80+ commands, app loading, │
│ memory management, interrupt handling) │
├───────────────────────────────────────────────────┤
│ IOCP — Bootloader │
│ (Hardware init, SD card boot, serial upload, │
│ memory monitor — embedded in BRAM) │
├───────────────────────────────────────────────────┤
│ ZPU Hardware (FPGA) │
│ (CPU, BRAM, SDRAM, UART, SD, Timer, Interrupts) │
└───────────────────────────────────────────────────┘
The boot sequence is: FPGA powers on → IOCP runs from BRAM → IOCP loads zOS from SD card → zOS presents a shell → user runs applications from the command line.
You can also embed zOS directly into BRAM (standalone mode), eliminating the need for IOCP at the cost of a longer FPGA recompile each time you update the OS.
Note on ZPUTA: During the development of ZPU Evolution, a test application called ZPUTA (ZPU Test Application) was created first to aid in hardware testing and validation. ZPUTA was then used as the template to write zOS. ZPUTA is largely redundant now that zOS exists — zOS is the software you should build and use. ZPUTA remains in the repository for historical reference and for anyone needing the original hardware test harness, but it is not covered further in this guide. All commands and features available in ZPUTA are also available in zOS.
IOCP — The Bootloader
IOCP (I/O Control Program) is the first code that executes when the ZPU powers on. It is embedded directly in the FPGA’s Block RAM during synthesis, so it is always available — no SD card or external storage is needed for it to run.
What IOCP does:
- Initialises the hardware - configures UARTs at 115200 baud, enables RX/TX FIFOs, sets up the interrupt controller and timer
- Attempts to mount the SD card - using the Petit FatFS library (a minimal read-only FAT implementation)
- Waits for user input - if you press a key within ~5 seconds, IOCP enters its interactive command monitor
- Auto-boots - if no key is pressed, loads BOOT.ROM (or BOOTTINY.ROM for tiny IOCP) from the SD card root directory and jumps to it
IOCP functionality levels:
IOCP is compiled with a FUNCTIONALITY parameter that controls its size:
| Level | Name | Size | Features |
|---|---|---|---|
| 0 | Full | ~40 KB | All commands, binary upload, config info, interrupt timer |
| 1 | Medium | ~20 KB | Command processor, timer, auto-boot, SD directory |
| 2 | Minimum | ~10 KB | Version display, interrupt handler, auto-boot |
| 3 | Tiny | ~3–5 KB | Bootstrap only, no interactive UI |
For students, Level 0 (Full) is the best starting point as it gives you interactive memory inspection and serial upload capabilities. For production designs where BRAM is scarce, Level 3 (Tiny) leaves maximum space for the application.
IOCP interactive commands (Full mode):
| Command | Description |
|---|---|
| 0 | Execute application in Boot BRAM |
| 1 | Execute application in RAM |
| 2 | Upload application to BRAM via serial (with CRC-32 validation) |
| 3 | Upload application to RAM via serial |
| 4 | Dump BRAM memory (hex + ASCII) |
| 5 | Dump Stack memory |
| 6 | Dump RAM memory |
| d | List SD card directory |
| C | Clear BRAM application area |
| c | Clear RAM |
| R | Reset system |
| i | Show SoC configuration and version info |
| h | Help |
Serial upload protocol:
IOCP can receive new firmware over the UART without re-synthesising the FPGA. This is invaluable during development:
- Send the magic sequence: I, O, C, P
- Send image size (4 bytes, little-endian)
- Send CRC-32 of the image (4 bytes, inverted)
- Send the image data in 4-byte words
- IOCP validates the CRC and reports success or failure
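On the host side, the first three steps amount to building a 12-byte header. The sketch below shows one way this might look, under several assumptions the guide does not confirm: that the CRC is the common reflected CRC-32 (polynomial 0xEDB88320), that "inverted" refers to the usual final bit inversion, and that the CRC is sent little-endian like the size. All function names are hypothetical; check the IOCP source before relying on any of this.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (ASSUMED variant). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Mask is all ones when the low bit is set, else zero. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;   /* final inversion */
}

/* Store a 32-bit value least significant byte first. */
static void put_le32(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Build the header: magic "IOCP", image size, CRC of the image. */
static void iocp_build_header(uint8_t hdr[12], const uint8_t *image,
                              uint32_t size) {
    hdr[0] = 'I'; hdr[1] = 'O'; hdr[2] = 'C'; hdr[3] = 'P';
    put_le32(hdr + 4, size);
    put_le32(hdr + 8, crc32_ieee(image, size));   /* byte order ASSUMED */
}
```

After sending the header, the host would stream the image itself; since IOCP expects 4-byte words, the image would presumably need padding to a multiple of four bytes.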
IOCP memory layout:
0x00000 ┌──────────────────┐
│ IOCP Boot Code │ Fixed vectors, startup (0x400 bytes)
0x00400 ├──────────────────┤
│ IOCP Main Code │ Command processor, SD card, UART drivers
│ & Read-Only Data │
0x01000 ├──────────────────┤ ← IOCP_APPADDR (applications load here)
│ Application │
│ (zOS) │
│ │
├──────────────────┤ ← Top of BRAM minus stack
│ Stack │ (grows downward, 512–2048 bytes)
0x07FFF └──────────────────┘ (example for 32 KB BRAM)
IOCP remains memory-resident after loading an application, occupying the first 0x1000 bytes. The loaded application starts at IOCP_APPADDR (0x01000) and has the rest of BRAM available.
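The space left for an application follows directly from this layout. A minimal sketch, assuming the application area runs from IOCP_APPADDR up to the bottom of the stack (the function name is hypothetical):

```c
#include <stdint.h>

#define IOCP_APPADDR 0x01000u   /* applications load here */

/* Bytes available to the application in BRAM. */
static uint32_t iocp_app_space(uint32_t bram_bytes, uint32_t stack_bytes) {
    if (bram_bytes < IOCP_APPADDR + stack_bytes) {
        return 0;   /* BRAM too small for IOCP plus the stack */
    }
    return bram_bytes - IOCP_APPADDR - stack_bytes;
}
```

For the 32 KB example with a 2048-byte stack, this leaves 0x6800 (26,624) bytes for the application.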
zOS — The Operating System
zOS (ZPU Operating System) is the standard operating system for the ZPU Evolution. It provides a command-line shell, full file system support, memory management, and the ability to load and run applications from SD card — analogous to how CP/M or early DOS provided a command-line environment for running programs. When you build a ZPU system, zOS is the software you should use.
zOS shell features:
- 80+ commands (file system, memory, hardware, execution)
- Readline-style line editing with history (saved to SD card)
- AUTOEXEC.BAT support - place a file named AUTOEXEC.BAT in the SD card root and zOS will execute its commands on boot, just like MS-DOS
- Help system - help shows all commands, help <group> shows a category, help <cmd> shows detailed usage
zOS command categories:
| Category | Example Commands | Description |
|---|---|---|
| File System | fdir, fcat, fcp, fdel, fload, fexec, fmkdir, fcd | Full file and directory operations (30+ commands) |
| Disk I/O | dinit, dstat, ddump, dioctl | Low-level SD card access |
| Disk Buffer | bdump, bedit, bread, bwrite, bfill | 512-byte sector buffer manipulation |
| Memory | mdump, mcopy, mdiff, mtest, mperf, msrch | Inspect, test, and benchmark memory |
| Memory Edit | meb, meh, mew | Edit memory as bytes, halfwords, or words |
| Hardware | hr, ht, hie, hid | Register display, timer test, interrupt control |
| Benchmarks | dhry, coremark | CPU performance measurement (Dhrystone v2.1, CoreMark v1.0) |
| Execution | call <addr>, jmp <addr> | Execute code at arbitrary addresses |
| Applications | ed, kilo, tbasic, mbasic | Editors and BASIC interpreters |
| System | restart, reset, help, info, time | System management |
Readline key bindings:
| Key | Action |
|---|---|
| CTRL-A | Move to start of line |
| CTRL-E | Move to end of line |
| CTRL-K | Clear line |
| CTRL-P / Arrow Up | Recall previous command |
| CTRL-N / Arrow Down | Recall next command |
| CTRL-C | Abort current line |
| !<number> | Re-execute a command from the history |
| hist | List command history |
How zOS loads and runs applications:
When you type a command that is not built-in, zOS searches for a matching binary on the SD card:
- Searches the /bin/ directory for <command>.ZPU (the .ZPU extension matches the CPU architecture)
- Loads the binary into memory at the configured application load address (default 0x0C000 or 0x100000)
- Calls the application’s entry point, passing command-line arguments and system structures
- The application executes, using zOS API functions for I/O and file access
- When the application returns, control passes back to the zOS shell
User types: kilo myfile.txt
│
▼
┌──────────────┐ ┌───────────────┐ ┌──────────────┐
│ Command not │────►│ Search SD for │────►│ Load kilo.ZPU│
│ built-in │ │ bin/kilo.ZPU │ │ at 0x0C000 │
└──────────────┘ └───────────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ app("myfile │
│ .txt", 0) │
│ │
│ returns → 0 │
└──────┬───────┘
│
▼
┌──────────────┐
│ Back to zOS │
│ shell prompt │
└──────────────┘
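The search step can be pictured as turning the first word of the command line into a file path. The C sketch below is a hypothetical reconstruction, not zOS code: the function name is invented, and details such as case handling or drive prefixes in the real lookup are not covered by this guide.

```c
#include <stdio.h>
#include <string.h>

/* Build "/bin/<command>.ZPU" from the first word of a command line.
 * Returns 0 on success, -1 if the path does not fit in 'out'. */
static int zos_app_path(const char *cmdline, char *out, size_t out_size) {
    size_t name_len = strcspn(cmdline, " \t");   /* length of first word */
    int written = snprintf(out, out_size, "/bin/%.*s.ZPU",
                           (int)name_len, cmdline);
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Given the command line "kilo myfile.txt", this produces "/bin/kilo.ZPU"; the remainder of the line is what the application later receives as its argument string.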
The application entry point:
Every ZPU application uses a standard entry point signature:
uint32_t app(uint32_t param1, uint32_t param2)
| Parameter | Contents |
|---|---|
| param1 | Pointer to command-line arguments as a C string (char *) |
| param2 | Reserved (typically 0) |
| Return value | 0 = success, 0xFFFFFFFF = failure, other = detailed error code |
The application also receives pointers to the zOS global structures (file handles, FatFS objects, disk buffers) and the SoC configuration structure (memory sizes, peripheral flags, clock frequencies). These allow applications to use the OS’s file system and detect hardware capabilities.
zOS API — the vector table:
zOS exposes 92 API functions through a fixed vector table. Applications call these functions instead of reimplementing I/O, which keeps application binaries small and ensures correct hardware access. Key API categories:
| Category | Functions | Purpose |
|---|---|---|
| Character I/O | putchar(), puts(), getserial() | Serial terminal output and input |
| Formatted I/O | printf(), sprintf(), xatoi() | Formatted printing and number parsing |
| File System | f_open(), f_read(), f_write(), f_close(), f_lseek(), etc. | Full FatFS API (24 functions) |
| Disk I/O | disk_read(), disk_write(), disk_ioctl() | Low-level SD card access |
| Memory | malloc(), realloc(), calloc(), free() | Dynamic memory allocation (umm_malloc) |
| Parameters | getStrParam(), getUintParam() | Parse command-line arguments |
| System | rtcSet(), rtcGet(), crc32_init(), crc32_addword() | RTC and CRC utilities |
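The general idea behind a vector table can be shown with plain C function pointers. This is a conceptual sketch only: the two-entry demo_api_table, its layout, and every name in it are hypothetical and do not reflect how zOS actually defines or locates its 92 entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry API table, for illustration. */
typedef struct {
    int (*put_char)(int c);
    int (*put_str)(const char *s);
} demo_api_table;

/* "Kernel" side: the real implementations. Output is captured in a
 * buffer here instead of being written to a UART. */
static char   demo_log[64];
static size_t demo_len;

static int kernel_put_char(int c) {
    if (demo_len < sizeof demo_log - 1) {
        demo_log[demo_len++] = (char)c;
        demo_log[demo_len] = '\0';
    }
    return c;
}

static int kernel_put_str(const char *s) {
    while (*s) {
        kernel_put_char(*s++);
    }
    return 0;
}

static const demo_api_table demo_kernel_api = {
    kernel_put_char,
    kernel_put_str
};

/* "Application" side: knows only the table, not the drivers. */
static uint32_t demo_app(const demo_api_table *api) {
    api->put_str("hi");
    api->put_char('!');
    return 0;   /* 0 = success */
}
```

Because the application reaches the drivers only through the table, its binary carries no driver code of its own, which is the size benefit described above.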
zOS memory layout:
0x00000 ┌──────────────────────┐
│ IOCP (if used) │ Optional bootloader (0x1000 bytes)
0x01000 ├──────────────────────┤ ← OS_BASEADDR (if IOCP used)
│ zOS Kernel │
│ (shell, commands, │ 35–100 KB depending on configuration
│ file system, APIs) │
├──────────────────────┤
│ OS Heap │ umm_malloc managed (default 0x8000)
0x0C000 ├──────────────────────┤ ← APP_LOAD_ADDR
│ Application Code │
│ (loaded from SD) │ Up to ~0x70000 bytes
│ │
├──────────────────────┤
│ Stack │ Grows downward (default 0x3D80)
└──────────────────────┘ Top of BRAM/SDRAM
Applications
The ZPU software ecosystem includes several ready-to-use applications that demonstrate the platform’s capabilities and provide useful tools. All applications are stored as binary files on the SD card in the /bin/ directory with a .ZPU extension.
Included applications:
| Application | Description | Size |
|---|---|---|
| ed | Basic VT100 text editor - navigate with arrow keys, CTRL-S to save, CTRL-Q to quit | ~37 KB source |
| kilo | Advanced VT100 WYSIWYG editor - syntax highlighting, search, more features than ed | ~50 KB source |
| tbasic | Tiny BASIC interpreter - write and run BASIC programs interactively | ~154 KB compiled |
| mbasic | Mini BASIC v1.0 - a second BASIC dialect with a built-in editor | ~100 KB compiled |
| dhry | Dhrystone v2.1 benchmark - measures CPU integer performance in DMIPS | Built-in or applet |
| coremark | CoreMark v1.0 benchmark - industry-standard embedded CPU benchmark | Built-in or applet |
File system commands (fdir, fcat, fcp, fdel, etc.), memory commands (mdump, mtest, mperf, etc.), and hardware commands (hr, ht) can also be compiled as external SD card applets rather than built into the OS, saving BRAM space.
Writing your own application:
Creating a new ZPU application is straightforward. Here is a complete example:
Step 1: Create the application directory and source file
zOS/apps/myapp/
├── myapp.c
└── Makefile
Step 2: Write the application code (myapp.c):
#if defined(__ZPU__)
#include <zstdio.h>
#include "zpu_soc.h"
#elif defined(__K64F__)
#include <stdio.h>
#include "k64f_soc.h"
#endif
#include "ff.h"
#include "xprintf.h"
#include "zOS_app.h"
uint32_t app(uint32_t param1, uint32_t param2)
{
char *ptr = (char *)param1; // Command-line arguments
long value;
xprintf("Hello from my ZPU application!\n");
xprintf("Arguments: %s\n", ptr);
// Parse a numeric argument if provided
if (xatoi(&ptr, &value)) {
xprintf("Numeric argument: %ld (0x%08lX)\n", value, value);
}
// Access the file system
FIL file;
FRESULT res = f_open(&file, "0:\\test.txt", FA_CREATE_ALWAYS | FA_WRITE);
if (res == FR_OK) {
f_puts("Written from my ZPU app!\n", &file);
f_close(&file);
xprintf("File written successfully.\n");
}
return 0; // Success
}
Step 3: Create the Makefile:
APP_NAME = myapp
APP_DIR = $(CURDIR)/..
BASEDIR = ../../..
APP_C_SRC =
CFLAGS =
CPPFLAGS =
LDFLAGS = -nostdlib
ifeq ($(__K64F__),1)
include $(APP_DIR)/Makefile.k64f
else
include $(APP_DIR)/Makefile.zpu
endif
Step 4: Build and deploy:
cd zOS/apps/myapp
make __ZPU__=1 # Build for ZPU
# Copy binary to SD card
cp myapp.ZPU /path/to/sdcard/bin/
Step 5: Run from the zOS shell:
* myapp 42
Hello from my ZPU application!
Arguments: 42
Numeric argument: 42 (0x0000002A)
File written successfully.
*
Application API access:
Your application does not need to implement its own UART driver, file system, or memory allocator. All of these are provided by the zOS kernel through the vector table. When your application calls xprintf(), it uses the kernel’s UART driver. When it calls f_open(), it uses the kernel’s FatFS instance and SD card driver. This is similar to how a CP/M transient program calls BDOS functions — the OS provides the services, and your application focuses on its own logic.
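The vector-table idea can be sketched on a desktop compiler. The following is a minimal illustration only: the table layout and every name in it (kernel_api_t, demo_put_str, demo_str_len, app_entry) are hypothetical and do not reflect the real zOS table, which is defined by the kernel sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical service table: the kernel fills it in and
 * applications only ever call through it. */
typedef struct {
    int    (*put_str)(const char *s);   /* stands in for a console output service */
    size_t (*str_len)(const char *s);   /* stands in for any other kernel helper  */
} kernel_api_t;

/* "Kernel side": the implementations live in the OS image. */
static char console[64];                /* captures output so the sketch is testable */

static int demo_put_str(const char *s)
{
    size_t used = strlen(console);
    size_t len  = strlen(s);
    if (used + len >= sizeof console)
        return -1;                      /* console buffer full */
    memcpy(console + used, s, len + 1); /* copy including the terminating NUL */
    return (int)len;
}

static size_t demo_str_len(const char *s) { return strlen(s); }

static const kernel_api_t kernel_api = { demo_put_str, demo_str_len };

/* "Application side": it knows the table, not the implementations. */
static int app_entry(const kernel_api_t *api, const char *args)
{
    assert(api != NULL && args != NULL);
    if (api->put_str("args=") < 0)
        return -1;
    return api->put_str(args);
}
```

The point of the indirection is that the application binary carries no copy of the driver code: it only needs to agree with the kernel on the shape of the table.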
Building the Software
All ZPU software (IOCP, zOS, and applications) is built from the zOS repository using a unified build.sh script.
# Clone the repository
git clone https://git.eaw.app/eaw/zOS.git
cd zOS
Build commands for each component:
# Build IOCP (Tiny, for Evo CPU, 128 KB BRAM)
./build.sh -C Evo -I 3 -o 0 -M 0x1FD80 -B 0x0000
# Build zOS (standalone, for Evo CPU)
./build.sh -C Evo -O zos -o 0 -M 0x1FD80 -B 0x0000 \
-S 0x3D80 -N 0x8000 -A 0x100000 -a 0x70000
# Build zOS (loaded by IOCP, base at 0x1000)
./build.sh -C Evo -O zos -o 2 -M 0x1FD80 -B 0x1000 \
-S 0x3D80 -N 0x8000 -A 0x0C000 -a 0x70000
Build parameter reference:
| Flag | Purpose | Example Values |
|---|---|---|
| -C | Target CPU model | Small, Medium, Flex, Evo, EvoMin, K64F |
| -O | Operating system | zos (or zputa for legacy test application) |
| -I | IOCP functionality level | 0 (Full), 1 (Medium), 2 (Minimum), 3 (Tiny) |
| -o | Boot mode | 0 (standalone), 1 (app with IOCP), 2 (app with Tiny IOCP), 3 (RAM) |
| -M | Maximum BRAM size | 0x8000 (32 KB), 0x10000 (64 KB), 0x1FD80 (128 KB) |
| -B | OS base address | 0x0000 (standalone), 0x1000 (after IOCP) |
| -S | Stack size | 0x3D80 |
| -N | OS heap size | 0x8000 |
| -A | Application load address | 0x0C000, 0x100000 |
| -a | Maximum application size | 0x70000 |
| -n | Application heap size | 0x0000 (shared with OS) |
| -s | Application stack size | 0x0000 (shared with OS) |
| -T | Enable tranZPUter mode | (flag, no value) |
| -d | Debug build | (flag, no value) |
Build outputs:
The build produces several output files:
| File | Description |
|---|---|
| main.bin | Raw binary image |
| main.hex | Intel HEX format |
| main.srec | Motorola S-record format |
| main.elf | ELF with debug symbols (for debuggers) |
| main.lss | Linker map and memory layout summary |
| rtl/TZSW_*.vhd | VHDL BRAM initialisation files (for FPGA synthesis) |
| build/SD/bin/*.ZPU | Application binaries for SD card |
The VHDL files (e.g. TZSW_DualPortBootBRAM.vhd) are the key output for FPGA integration — they contain the compiled firmware as BRAM initialisation data. When you run Quartus to compile the FPGA design, these files are included in the synthesis, embedding the firmware directly into the bitstream.
How Firmware Gets Into the FPGA
The ZPU boots from Block RAM (BRAM) inside the FPGA. The firmware must be embedded into the BRAM during FPGA compilation:
- C source files are compiled with zpu-elf-gcc → ELF binary
- The ELF binary is converted to VHDL BRAM initialisation (via the zpugen tool) or a Memory Initialisation File (.mif)
- The BRAM VHDL entity includes the initialisation data as pre-loaded content
- When Quartus compiles the design, it synthesises the BRAM with the firmware pre-loaded
- On power-up, the CPU starts fetching instructions from BRAM address 0
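The conversion step in this flow amounts to packing the raw firmware bytes into 32-bit words. The ZPU is big-endian, so the first byte of the image becomes the most significant byte of word 0. The sketch below shows that packing and formats one entry in a VHDL-aggregate style; the function names and the exact text layout are illustrative assumptions, not the actual output format of the project's tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack up to four bytes starting at `offset` into one big-endian 32-bit
 * word. Bytes past the end of the image are treated as zero padding. */
static uint32_t pack_word_be(const uint8_t *image, size_t size, size_t offset)
{
    uint32_t word = 0;
    for (size_t i = 0; i < 4; i++) {
        uint8_t byte = (offset + i < size) ? image[offset + i] : 0;
        word = (word << 8) | byte;
    }
    return word;
}

/* Format one initialisation entry as `index => x"XXXXXXXX",` (an
 * illustrative layout). Returns the number of characters written,
 * or -1 if the buffer is too small. */
static int format_init_entry(char *out, size_t out_size, size_t index, uint32_t word)
{
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "%zu => x\"%08lX\",", index, (unsigned long)word);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

Padding the final partial word with zeros keeps the BRAM contents deterministic when the image size is not a multiple of four.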
For development, there are two ways to iterate without re-synthesising the FPGA:
- Serial upload via IOCP: use the serial upload protocol to send new firmware directly to BRAM through the UART. This takes seconds compared to minutes for a full Quartus recompile.
- SD card boot: place your compiled binary as BOOT.ROM on the SD card. IOCP will load it on each power cycle, so you just need to update the file on the SD card.
Memory-Mapped I/O
Software accesses hardware peripherals through memory-mapped registers. The SoC configuration registers (SOCCFG) allow software to detect the hardware configuration at runtime:
// Read SoC configuration
volatile unsigned int *soccfg = (volatile unsigned int *)0xFF0000;
unsigned int cpu_id = soccfg[0]; // CPU model and revision
unsigned int soc_config = soccfg[1]; // Enabled peripherals bitmap
The UART, timer, SD card, and other peripherals each have their own memory-mapped register blocks within the I/O region. The zpu_soc.h header file defines all register addresses and bit fields — include this header in your application code to access the hardware correctly.
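As a small illustration of runtime detection, the sketch below tests bits in the peripherals bitmap. The word offsets follow the snippet above; the bit positions (DEMO_IMPL_UART0, DEMO_IMPL_SD) and the helper name soc_has are hypothetical, so take the real definitions from zpu_soc.h. Passing the register base as a parameter lets the same code run against a plain array on a desktop machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define DEMO_IMPL_UART0  (1u << 0)
#define DEMO_IMPL_SD     (1u << 3)

/* Word offsets as used above: [0] = CPU id, [1] = peripheral bitmap. */
enum { SOCCFG_CPU_ID = 0, SOCCFG_PERIPHERALS = 1 };

/* Returns 1 only if every bit in `mask` is set in the peripheral bitmap. */
static int soc_has(const volatile uint32_t *soccfg, uint32_t mask)
{
    assert(soccfg != 0);
    return (soccfg[SOCCFG_PERIPHERALS] & mask) == mask;
}
```

On the target, the caller would pass the SOCCFG base address cast to a volatile pointer, then skip initialising any driver whose bit is clear.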
Typical Development Workflow
Here is a recommended workflow for a student starting from scratch:
- Build the FPGA bitstream with IOCP embedded (Part 4)
- Program the FPGA and connect a serial terminal at 115200 baud
- Interact with IOCP: use the i command to see the SoC configuration and the 4 command to dump BRAM, and verify the hardware is working
- Build zOS using build.sh and copy BOOT.ROM plus the bin/ directory to a FAT32-formatted SD card
- Insert the SD card and reset the board; IOCP loads zOS automatically
- Explore zOS: try help, fdir, mdump 0 100, mtest, dhry
- Run applications: try ed myfile.txt or tbasic to see the editors and BASIC interpreter
- Write your own application: create a new directory under zOS/apps/, write your app() function, build, copy to the SD card, and run it from the zOS shell
- Iterate: modify your application, rebuild, copy to SD card, and run again. No FPGA recompile needed.
This workflow lets you start producing working software in minutes, without waiting for lengthy FPGA compilations except during the initial hardware setup.
Part 8: Writing Custom Software
This section covers the lower-level details of compiling and linking software for the ZPU, for those who want to go beyond using the provided build scripts or who want to understand what happens under the hood.
The ZPU GCC Toolchain
The ZPU uses a GCC cross-compiler: zpu-elf-gcc. This is a standard GCC port that produces ZPU machine code. The toolchain includes:
| Tool | Purpose |
|---|---|
| zpu-elf-gcc | C/C++ compiler |
| zpu-elf-as | Assembler |
| zpu-elf-ld | Linker |
| zpu-elf-objcopy | Binary format conversion (ELF → HEX, BIN, SREC) |
| zpu-elf-objdump | Disassembler and ELF inspection |
| zpu-elf-size | Section size reporting |
Important compiler flags for the ZPU Evo:
# Enable hardware instructions (Evo/EvoMin only)
zpu-elf-gcc -mloadsp -mstoresp -mpushspadd -mneqbranch -maddsp \
-mmult -mdiv -mmod -mneg \
-Os -o myapp.elf myapp.c
The -m flags tell the compiler to use hardware instructions rather than emulation. For Small/Medium/Flex CPUs, omit the arithmetic flags (-mmult, -mdiv, etc.) as those instructions are emulated in software.
Linker Scripts and Memory Layout
The linker script (.ld file) defines where code and data are placed in memory. A typical ZPU linker script specifies:
MEMORY {
BOOT (rx) : ORIGIN = 0x00000000, LENGTH = 0x00000400
CODE (rwx) : ORIGIN = 0x00000400, LENGTH = 0x0000FC00
}
SECTIONS {
.fixed_vectors : { *(.fixed_vectors) } > BOOT
.text : { *(.text*) } > CODE
.rodata : { *(.rodata*) } > CODE
.data : { *(.data*) } > CODE
.bss : { *(.bss*) } > CODE
}
The startup code (romcrt0.s) sets up the interrupt vector at address 0, initialises the stack pointer, clears the .bss section, and jumps to main() (for IOCP/zOS) or app() (for applications).
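The .bss clearing step is simple enough to express in C. The sketch below is an assumption about the general technique, not a transcription of romcrt0.s: on a real target the two bounds would come from symbols exported by the linker script, whereas here they are ordinary parameters (clear_bss is a hypothetical name) so the loop can be exercised on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in the half-open range [start, end). */
static void clear_bss(uint32_t *start, uint32_t *end)
{
    assert(start <= end);
    for (uint32_t *p = start; p < end; p++)
        *p = 0;
}
```

Clearing .bss before main() or app() runs is what guarantees that uninitialised global and static variables start at zero, as the C language requires.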
Inline Assembly
You can embed ZPU instructions directly in C code using GCC inline assembly. This is useful for accessing hardware instructions not generated by the compiler:
// Execute a single NOP instruction
static inline void zpu_nop(void) {
__asm__ __volatile__("nop");
}
// Read the stack pointer
static inline unsigned int zpu_get_sp(void) {
unsigned int sp;
__asm__ __volatile__("pushsp\n\tload\n\t" : "=r"(sp));
return sp;
}
Part 9: Understanding the SoC Architecture
Block Diagram
┌─────────────────────────────────────────┐
│ ZPU SoC │
│ │
│ ┌──────────┐ ┌────────────────┐ │
│ │ │ │ Boot BRAM │ │
│ │ CPU │◄──►│ (Dual-Port) │ │
│ │ (Small/ │ │ Port A: Data │ │
UART TX/RX ◄──────┤ │ Medium/ │ │ Port B: Insn │ │
│ │ Flex/ │ └────────────────┘ │
SD Card SPI ◄──────┤ │ Evo/ │ │ │
│ │ EvoMin) │ ┌──────┴──────┐ │
GPIO/LEDs ◄────────┤ │ │ │ System Bus │ │
│ └────┬─────┘ └──────┬──────┘ │
│ │ │ │
│ ┌────┴─────────────────┴────────┐ │
│ │ Bus Decoder │ │
│ └─┬───┬───┬───┬───┬───┬───┬────┘ │
│ │ │ │ │ │ │ │ │
│ ┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴──┐ │
│ │U0││U1││T1││SD││PS││SP││INTR│ │
│ │ ││ ││ ││ ││2 ││I ││CTRL│ │
│ └──┘└──┘└──┘└──┘└──┘└──┘└────┘ │
│ │
│ ┌────────────────────────────────┐ │
│ │ Wishbone Bus (optional) │ │
│ └─┬────────┬────────┬───────────┘ │
│ │ │ │ │
│ ┌─┴──┐ ┌─┴──┐ ┌─┴──────┐ │
│ │I2C │ │SRAM│ │WB SDRAM│ │
│ └────┘ └────┘ └────────┘ │
└─────────────────────────────────────────┘
The Interrupt Controller
The interrupt controller supports up to 16 prioritised interrupt sources (SOC_INTR_MAX). Each source can be individually enabled/disabled via the interrupt enable register:
// Enable timer interrupt (source 0)
volatile unsigned int *intr = (volatile unsigned int *)INTR_BASE;
intr[INTR_ENABLE] = (1 << 0); // Enable source 0
// In interrupt handler:
unsigned int status = intr[INTR_STATUS]; // Read pending interrupts
intr[INTR_STATUS] = status; // Clear handled interrupts
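A handler usually needs to fan out to per-source routines. The following sketch shows one way to do that under stated assumptions: the word offsets (DEMO_INTR_ENABLE, DEMO_INTR_STATUS) are hypothetical stand-ins for the real definitions, lower-numbered sources are assumed to have higher priority, and pending bits are masked with the enable register in software. The register base is a parameter so the logic can run against a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_INTR_SOURCES 16                          /* 16 sources, as described above */
enum { DEMO_INTR_ENABLE = 0, DEMO_INTR_STATUS = 1 };  /* hypothetical word offsets */

typedef void (*intr_handler_t)(unsigned source);

/* Example handler used for demonstration: counts calls per source. */
static unsigned demo_hits[DEMO_INTR_SOURCES];
static void demo_count(unsigned source) { demo_hits[source]++; }

/* Service every pending, enabled source that has a handler, lowest
 * number first, then acknowledge the serviced sources in one write.
 * Returns the bitmask of sources that were serviced. */
static uint32_t intr_dispatch(volatile uint32_t *intr,
                              intr_handler_t const handlers[DEMO_INTR_SOURCES])
{
    assert(intr != 0 && handlers != 0);
    uint32_t pending = intr[DEMO_INTR_STATUS] & intr[DEMO_INTR_ENABLE];
    uint32_t handled = 0;
    for (unsigned src = 0; src < DEMO_INTR_SOURCES; src++) {
        uint32_t bit = (uint32_t)1 << src;
        if ((pending & bit) && handlers[src]) {
            handlers[src](src);
            handled |= bit;
        }
    }
    intr[DEMO_INTR_STATUS] = handled;   /* write-back clears the serviced sources */
    return handled;
}
```

Acknowledging only the serviced bits, rather than the whole status word, avoids silently dropping a source that has no handler registered yet.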
Adding a New Peripheral
To add a new peripheral to the SoC:
- Write the VHDL module with a memory-mapped register interface
- Add it to zpu_soc.vhd within the appropriate bus (system or Wishbone)
- Add address decoding for your peripheral’s register space
- Add a configuration flag in zpu_soc_pkg.vhd (SOC_IMPL_MYDEVICE)
- Add the VHDL file to all relevant .qsf project files
- Write a C driver with the register addresses and access functions
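For the final step, a C driver can be as small as a register-block struct plus a few access functions. The sketch below is for a made-up device: the register names, offsets, and bit meanings are all hypothetical and must match whatever your VHDL module actually implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block for a made-up device. */
typedef struct {
    volatile uint32_t ctrl;     /* bit 0: enable           */
    volatile uint32_t status;   /* bit 0: busy (read-only) */
    volatile uint32_t data;     /* value to transmit       */
} mydevice_regs_t;

#define MYDEVICE_CTRL_ENABLE  (1u << 0)
#define MYDEVICE_STATUS_BUSY  (1u << 0)

/* Set the enable bit without disturbing other control bits. */
static void mydevice_enable(mydevice_regs_t *dev)
{
    assert(dev != 0);
    dev->ctrl |= MYDEVICE_CTRL_ENABLE;
}

/* Write one value to the device. Returns 0 on success,
 * or -1 if the device reports busy and the write was skipped. */
static int mydevice_send(mydevice_regs_t *dev, uint32_t value)
{
    assert(dev != 0);
    if (dev->status & MYDEVICE_STATUS_BUSY)
        return -1;
    dev->data = value;
    return 0;
}
```

On the target, the driver would be pointed at the peripheral by casting its decoded base address to mydevice_regs_t *; on a desktop, an ordinary struct instance is enough to test the logic.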
Part 10: Board-Specific Setup
Supported Development Boards
| Board | FPGA | Logic Elements | BRAM | Clock | Price Range |
|---|---|---|---|---|---|
| DE10 Nano | Cyclone V 5CSEBA6U23I7 | 110K ALMs | 5.5Mbit | 50MHz (PLL to 100MHz) | ~$130 |
| E115 | Cyclone IV E EP4CE115F23I7 | 114K LEs | 3.9Mbit | 50MHz (PLL to 75MHz) | ~$50 |
| CYC1000 | Cyclone 10 LP 10CL025YU256C8G | 25K LEs | 594Kbit | 12MHz (PLL to 100MHz) | ~$30 |
| QMV | Cyclone V 5CEFA2F23C8 | 25K ALMs | 1.8Mbit | 50MHz (PLL to 75MHz) | ~$50 |
| DE0 Nano | Cyclone V 5CSEMA4U23C6 | 40K ALMs | 2.5Mbit | 50MHz (PLL to 100MHz) | ~$80 |
Setting Up a New Board
To port the ZPU SoC to a new FPGA board:
- Create pin assignments - Map FPGA pins to board peripherals (UART, LEDs, SD card, SDRAM, etc.) in a new .qsf file
- Create a top-level wrapper - A <board>_Toplevel.vhd that instantiates the PLL (clock generation) and connects board I/O to the zpu_soc entity
- Add clock frequency - Define SYSCLK_<BOARD>_FREQ in zpu_soc_pkg.vhd
- Create a Quartus project - New .qpf file referencing all source files
- Add Makefile targets - Add board-specific build targets
Example top-level structure:
entity MyBoard_zpu_Toplevel is
port (
CLOCK_50 : in std_logic; -- 50MHz input clock
UART_TX : out std_logic; -- UART transmit
UART_RX : in std_logic; -- UART receive
LED : out std_logic_vector(3 downto 0);
SD_CS : out std_logic; -- SD card chip select
SD_CLK : out std_logic; -- SD card clock
SD_MOSI : out std_logic; -- SD card data out
SD_MISO : in std_logic -- SD card data in
);
end entity;
architecture rtl of MyBoard_zpu_Toplevel is
signal sysclk : std_logic; -- Generated system clock
begin
-- PLL: 50MHz → 100MHz
PLL0 : entity work.pll port map (
inclk0 => CLOCK_50, c0 => sysclk
);
-- Instantiate the ZPU SoC
SOC0 : entity work.zpu_soc port map (
sysclk => sysclk,
-- Connect UART, SD, LEDs, etc.
);
end architecture;
Part 11: CI/CD and Automated Builds
Pipeline Overview
The ZPU Evolution uses Jenkins for continuous integration. Every push to a monitored branch triggers an automated build that:
- Checks out the latest source code
- Builds zOS firmware (if the zOS builder Docker image is available)
- Compiles all 25 FPGA variants (5 boards x 5 CPU models)
- Packages release tarballs
- Creates a tagged release on Gitea with all artifacts
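The 25-variant figure is the cross product of boards and CPU models. The sketch below enumerates that matrix; the CPU names come from the -C build flag values, while the board identifiers and the "<board>_<cpu>" naming scheme are illustrative assumptions rather than the names the pipeline actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

static const char *const boards[] = { "DE10_nano", "E115", "CYC1000", "QMV", "DE0_nano" };
static const char *const cpus[]   = { "Small", "Medium", "Flex", "Evo", "EvoMin" };

#define N_BOARDS (sizeof boards / sizeof boards[0])
#define N_CPUS   (sizeof cpus / sizeof cpus[0])

/* Total number of board/CPU combinations. */
static size_t variant_count(void) { return N_BOARDS * N_CPUS; }

/* Write "<board>_<cpu>" for variant `index` (boards vary slowest).
 * Returns the name length, or -1 if the buffer is too small. */
static int variant_name(size_t index, char *out, size_t out_size)
{
    assert(out != NULL && index < variant_count());
    int n = snprintf(out, out_size, "%s_%s",
                     boards[index / N_CPUS], cpus[index % N_CPUS]);
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

A build script can walk the indices from 0 to variant_count() - 1 to drive one compile per variant.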
Docker Build Environment
Builds run inside Docker containers for reproducibility:
- quartus-ii-17.1.1 - Contains Intel Quartus Prime 17.1.1 Standard Edition for FPGA compilation
- zos-builder - Contains the ZPU GCC toolchain for firmware compilation
The Quartus container requires specific volume mounts for license verification:
# --mac-address:   MAC address the license is tied to
# LM_LICENSE_FILE: license file path
# /run/udev, /sys: read-only mounts for the FlexLM device scan and system info
# $PWD:            project files, mounted at /workspace
docker run --rm \
--mac-address "02:50:dd:72:03:01" \
-e "LM_LICENSE_FILE=/srv2/license2.dat" \
-v "/run/udev:/run/udev:ro" \
-v "/sys:/sys:ro" \
-v "$PWD:/workspace" \
-w "/workspace/build" \
quartus-ii-17.1.1 \
/opt/altera/quartus/bin/quartus_sh --flow compile DE10_nano_zpu
Release Artifacts
Successful builds produce:
- .sof files - JTAG programming files (volatile, for development)
- .rbf files - Raw binary files (for configuration devices, compressed)
- Tarballs - Per-board and complete packages uploaded to Gitea releases
Setting Up Your Own CI/CD
To replicate this pipeline:
- Install Jenkins with the Generic Webhook Trigger plugin
- Build or obtain the Quartus Docker image
- Deploy the pipeline script from /var/jenkins_home/pipeline-scripts/zpu-build.groovy
- Configure a Gitea webhook pointing to Jenkins
- Ensure the FlexLM license is accessible to the Docker containers
Part 12: Debugging and Troubleshooting
Hardware Debug Serialiser
The Evo CPU includes a built-in debug serialiser that outputs CPU state over UART1. Enable it in zpu_pkg.vhd:
constant DEBUG_CPU : boolean := true;
constant DEBUG_LEVEL : integer := 2; -- 0=basic, 5=everything
constant DEBUG_TX_BAUD_RATE : integer := 115200;
Connect a terminal to UART1 TX to see:
- Current PC, stack pointer, TOS, NOS
- Executing instruction and signals
- L1 and L2 cache contents (at higher debug levels)
- Breakpoint events
Using SignalTap
For deeper FPGA-level debugging, use Intel’s SignalTap logic analyser:
- In Quartus, go to Tools → SignalTap Logic Analyzer
- Add signals of interest (e.g., CPU state machine, bus transactions)
- Set trigger conditions
- Compile and program the FPGA
- Capture and analyse waveforms
Common Issues
“Segment Violation” in Docker builds:
- Ensure /run/udev and /sys are mounted read-only in the Docker container
- The FlexLM license manager needs these to enumerate network devices
Compilation succeeds but CPU does not boot:
- Check that the BRAM initialisation file matches the CPU model
- Verify reset address (SOC_RESET_ADDR_CPU) points to valid code
- Ensure the PLL is generating the correct clock frequency
“Entity not found” errors:
- Check that all required VHDL files are listed in the .qsf project file
- The zOS_DualPort3264BootBRAM.vhd is needed for EVO/EVO_MINIMAL models
Timing failures:
- Reduce clock frequency or add pipeline registers
- Check the TimeQuest timing report for the critical path
- Consider reducing cache sizes to simplify routing
Part 13: Further Reading and Resources
ZPU Resources
| Resource | URL |
|---|---|
| ZPU Evolution Repository | git.eaw.app/eaw/ZPU |
| Original Zylin ZPU | github.com/zylin/zpu |
| ZPU GCC Toolchain | github.com/zylin/zpugcc |
| ZPU Flex | github.com/robinsonb5/ZPUFlex |
| ZPU Wikipedia | en.wikipedia.org/wiki/ZPU |
| zOS Repository | git.eaw.app/eaw/zOS |
Learning Resources
| Topic | Recommended |
|---|---|
| VHDL for beginners | “Free Range VHDL” (free e-book) |
| FPGA development | Intel FPGA University Program materials |
| Stack machine theory | “Stack Computers: the new wave” by Philip Koopman |
| Quartus Prime | Intel Quartus Prime Handbook |
| Wishbone bus | Wishbone B4 specification (opencores.org) |
Exercises for Students
- Basic: Build the ZPU Small for your board. Connect a terminal at 115200 baud and interact with IOCP.
- Intermediate: Modify zpu_soc_pkg.vhd to change the BRAM size. Observe the effect on available stack space.
- Intermediate: Enable the PS2 controller and connect a keyboard. Write a C program that echoes key presses.
- Advanced: Add a hardware SWAP instruction (as described in Part 6). Verify it works with inline assembly.
- Advanced: Port the design to a new FPGA board not currently supported.
- Expert: Implement a new extended instruction using the EXTEND mechanism. For example, a hardware string compare.