ZPU Evo Developer's Guide


Introduction

Welcome to the ZPU Evolution Developer’s Guide. This guide is written for students, hobbyists, and engineers who want to understand, build, modify, and extend the ZPU Evolution processor. Whether you are taking a microprocessor course, learning FPGA development, or exploring soft-core CPU design, this guide will walk you through everything from fundamental concepts to advanced customisation.

The ZPU is a 32-bit stack-based microprocessor originally designed by Øyvind Harboe of Zylin AS. The ZPU Evolution (Evo) is an enhanced version created by Philip Smart that adds significant performance improvements, caching, extended instructions, and a rich System-on-Chip (SoC) framework. The entire design is open source and implemented in VHDL for synthesis on Intel/Altera FPGAs.

What You Will Learn

By working through this guide, you will:

  • Understand stack-based CPU architecture and how it differs from register-based designs
  • Learn how the ZPU instruction set works at the hardware level
  • Be able to configure, build, and program FPGA bitstreams for multiple development boards
  • Know how to add new hardware instructions to the CPU
  • Understand the SoC architecture including memory controllers, UARTs, timers, and interrupt handling
  • Be able to write, compile, and deploy C programs that run on the ZPU
  • Set up automated CI/CD builds using Jenkins and Docker

Prerequisites

To get the most from this guide, you should have:

  • Basic digital logic knowledge - Understanding of flip-flops, multiplexers, state machines, and bus structures
  • Some VHDL experience - Ability to read VHDL entity declarations, signal assignments, and process blocks (this guide explains key patterns as they appear)
  • C programming - Familiarity with C syntax, pointers, and compilation
  • An FPGA development board - One of the supported boards (DE10 Nano, E115, CYC1000, QMV, or DE0 Nano)
  • Intel Quartus Prime - Version 17.1.1 Standard Edition (or the Docker containerised version)

Part 1: Understanding the ZPU Architecture

Stack-Based vs Register-Based Processors

Most processors you encounter (ARM, x86, RISC-V) are register-based: they have a fixed set of named registers (R0-R15, EAX, etc.) and instructions specify which registers to operate on. For example, ADD R0, R1, R2 means “add R1 and R2, store in R0”.

The ZPU is a stack-based processor. Instead of named registers, it uses a Last-In-First-Out (LIFO) stack. Operations implicitly work on the top elements of the stack:

Stack before ADD:   Stack after ADD:
  ┌─────┐            ┌─────┐
  │  3  │ ← TOS      │  7  │ ← TOS (3+4)
  ├─────┤            ├─────┤
  │  4  │ ← NOS      │ ... │
  ├─────┤            └─────┘
  │ ... │
  └─────┘

TOS = Top of Stack, NOS = Next on Stack.
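
The ADD step in the diagram can be modelled in a few lines of C. This is a minimal software sketch of the stack behaviour only, not the ZPU's hardware implementation; the `Stack` type, `STACK_DEPTH`, and the function names are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_DEPTH 16  /* arbitrary depth for the sketch */

/* Hypothetical software model of a LIFO stack; sp counts the items held. */
typedef struct {
    uint32_t items[STACK_DEPTH];
    int sp;
} Stack;

static void push(Stack *s, uint32_t v) {
    assert(s->sp < STACK_DEPTH);  /* guard against overflow */
    s->items[s->sp++] = v;
}

static uint32_t pop(Stack *s) {
    assert(s->sp > 0);            /* guard against underflow */
    return s->items[--s->sp];
}

/* ADD: pop TOS and NOS, push their sum, so the stack shrinks by one. */
static void op_add(Stack *s) {
    uint32_t tos = pop(s);
    uint32_t nos = pop(s);
    push(s, nos + tos);
}
```

Pushing 4, then 3, and calling `op_add` leaves a single item, 7, on the stack, which matches the diagram.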

Why use a stack architecture?

  1. Minimal instruction encoding - No register fields needed. ADD is just one byte (opcode 00000101), while a register-based ADD needs source and destination register fields, making it 2-4 bytes.
  2. Very small FPGA footprint - The decoder is trivial, requiring minimal logic elements.
  3. Simple compiler target - Expression evaluation maps naturally to stack operations (Reverse Polish Notation).
  4. Trade-off - Stack machines are typically slower per-operation than register machines because of the extra stack manipulation, but the ZPU Evo’s caching system significantly mitigates this.

The ZPU Memory Model

The ZPU has a flat, byte-addressable, 32-bit memory space. All memory and I/O devices share this space:

Address Space (24-bit example, configurable):
┌──────────────────────────────────────────────┐
│ 0x000000 - 0x01FFFF : Boot BRAM (128KB)      │  ← Program + Stack
│                        Reset vector at 0x0000 │
│                        Emulation vectors      │
│                        0x0000-0x03FF          │
├──────────────────────────────────────────────┤
│ 0x020000 - 0x03FFFF : Application RAM (opt)   │  ← Secondary BRAM
├──────────────────────────────────────────────┤
│ 0x040000 - 0xFEFFFF : SDRAM (optional)        │  ← External memory
├──────────────────────────────────────────────┤
│ 0xFF0000 - 0xFFFFFF : Memory-Mapped I/O       │  ← Peripherals
├──────────────────────────────────────────────┤
│ 0x1000000 - 0x1FFFFFF: Wishbone bus (optional) │  ← Extended region
└──────────────────────────────────────────────┘

Key points:

  • The stack lives at the top of BRAM, growing downward
  • The program starts at address 0x0000 (configurable)
  • Addresses 0x0000-0x03FF are reserved for emulation vectors (32 vectors of 32 bytes each, explained later)
  • I/O devices are memory-mapped at the top of the address space
  • The Wishbone bus, if enabled, doubles the address space
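
As a rough illustration of how the example map divides the address space, the sketch below classifies an address into one of the regions shown in the diagram. The boundaries are copied from the 24-bit example and are configurable in a real build; the `Region` enum and `classify_address` are hypothetical names, not part of the ZPU sources.

```c
#include <stdint.h>

/* Regions from the example memory map (hypothetical names). */
typedef enum {
    REGION_BOOT_BRAM,
    REGION_APP_RAM,
    REGION_SDRAM,
    REGION_IO,
    REGION_WISHBONE,
    REGION_UNMAPPED
} Region;

/* Classify an address using the example boundaries from the diagram. */
static Region classify_address(uint32_t addr) {
    if (addr <= 0x01FFFFu)  return REGION_BOOT_BRAM;  /* 0x000000 - 0x01FFFF */
    if (addr <= 0x03FFFFu)  return REGION_APP_RAM;    /* 0x020000 - 0x03FFFF */
    if (addr <= 0xFEFFFFu)  return REGION_SDRAM;      /* 0x040000 - 0xFEFFFF */
    if (addr <= 0xFFFFFFu)  return REGION_IO;         /* 0xFF0000 - 0xFFFFFF */
    if (addr <= 0x1FFFFFFu) return REGION_WISHBONE;   /* 0x1000000 - 0x1FFFFFF */
    return REGION_UNMAPPED;
}
```

In the actual SoC this decoding is done in VHDL by the address router in zpu_soc.vhd; the C version is only a way to check which device an address would reach under the example configuration.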

The Five CPU Models

The ZPU Evolution SoC supports five interchangeable CPU models, selectable at build time. Only one can be active:

Model        Logic Elements  Performance  Wishbone  Best For
Small        ~400 LEs        Baseline     No        Minimum footprint applications
Medium       ~600 LEs        ~1.5x Small  No        Better performance, still small
Flex         ~800 LEs        ~1.8x Small  No        Good balance of size and speed
Evo          ~2500 LEs       ~3x Small    Yes       Maximum performance, full features
Evo Minimal  ~1200 LEs       ~2x Small    Yes       Evo with reduced instruction set

The Evo model adds:

  • L1 instruction cache (register-based, configurable 8-256 entries)
  • L2 instruction cache (BRAM-based, configurable 256-4096 bytes)
  • Memory Transaction Processor (queued memory operations)
  • Dual memory bus (system bus + optional Wishbone bus)
  • Optional instruction bus (separate BRAM port for instruction fetch)
  • Hardware byte/word write (avoiding read-modify-write cycles)
  • Extended instruction support (multi-byte instructions)

Instruction Encoding

ZPU instructions are 8 bits wide. The encoding is remarkably simple:

Bit 7 = 1: IM instruction (7-bit immediate value in bits 6:0)
Bit 7 = 0:
  Bits 6:5 = 00: Core instructions (opcode in bits 4:0; 0x10-0x1F is ADDSP)
  Bits 6:5 = 01: EMULATE (emulation vector index in bits 4:0), opcodes 0x20-0x3F
  Bits 6:5 = 10: STORESP (store to stack offset in bits 4:0), opcodes 0x40-0x5F
  Bits 6:5 = 11: LOADSP (load from stack offset in bits 4:0), opcodes 0x60-0x7F
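
A decoder for this encoding can be sketched in C as follows. The group boundaries follow the opcode values listed in the instruction tables in Part 5 (ADDSP at 0x10+, emulatable instructions at 0x20-0x3F, STORESP at 0x40+, LOADSP at 0x60+, IM at 0x80+). The `InsnClass` enum and `classify_opcode` are hypothetical names for illustration; the real decoder is the VHDL in the CPU core.

```c
#include <stdint.h>

/* Instruction groups (hypothetical names for this sketch). */
typedef enum {
    INSN_IM,
    INSN_CORE,
    INSN_ADDSP,
    INSN_EMULATE,
    INSN_STORESP,
    INSN_LOADSP
} InsnClass;

/* Classify one opcode byte by its top bits. */
static InsnClass classify_opcode(uint8_t op) {
    if (op & 0x80u) {
        return INSN_IM;                      /* 1xxxxxxx: 0x80-0xFF */
    }
    switch ((op >> 5) & 0x3u) {
    case 1:  return INSN_EMULATE;            /* 001xxxxx: 0x20-0x3F */
    case 2:  return INSN_STORESP;            /* 010xxxxx: 0x40-0x5F */
    case 3:  return INSN_LOADSP;             /* 011xxxxx: 0x60-0x7F */
    default:                                 /* 000xxxxx: 0x00-0x1F */
        return (op & 0x10u) ? INSN_ADDSP : INSN_CORE;
    }
}
```

For example, 0x05 (ADD) falls in the core group, 0x2D (CALL) in the emulatable group, and any byte with bit 7 set is an IM.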

Loading a 32-bit constant: Since each IM instruction can only carry 7 bits, loading a full 32-bit value requires up to 5 consecutive IM instructions:

IM 0x12      → push 0x00000012 (sign-extended)
IM 0x34      → shift TOS left 7, OR in 0x34 → 0x00000934
IM 0x56      → shift left 7, OR → 0x0004_9A56
IM 0x78      → shift left 7, OR → 0x024D_2B78
IM 0x1A      → shift left 7, OR → 0x2695_BC1A (0x1_2695_BC1A truncated to 32 bits)

The L1 cache in the Evo allows up to 5 IM instructions to execute in a single cycle, making constant loading very efficient.
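
The shift-and-OR behaviour of consecutive IM instructions can be checked with a small C model. This is a sketch of the arithmetic only; `im_sequence` is a hypothetical helper, and it takes the 7-bit immediate values rather than the full opcode bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply a run of consecutive IM instructions and return the resulting TOS.
 * imm7 holds the 7-bit immediates in program order. */
static uint32_t im_sequence(const uint8_t *imm7, size_t count) {
    uint32_t tos = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t v = imm7[i] & 0x7Fu;
        if (i == 0) {
            /* First IM: sign-extend the 7-bit value to 32 bits. */
            tos = (v & 0x40u) ? (v | 0xFFFFFF80u) : v;
        } else {
            /* Following IMs: shift left 7 and OR in the next 7 bits. */
            tos = (tos << 7) | v;
        }
    }
    return tos;
}
```

Feeding it the immediates 0x12, 0x34, 0x56, 0x78, 0x1A produces 0x2695BC1A, and a single IM of 0x7F produces 0xFFFFFFFF because of the sign extension.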

Emulation Vectors

A key ZPU feature is instruction emulation. Instructions in the EMULATE group (opcode bits 6:5 = 01, i.e., opcodes 0x20-0x3F) branch to a vector in the range 0x0000-0x03FF. Each vector is 32 bytes:

Vector address = instruction[4:0] × 32

Example: EMULATE 12 → branches to address 0x0180 (12 × 32)
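
The vector calculation is simple enough to express as a one-line C function. `emulate_vector` is a hypothetical helper name; it accepts either the vector index or the full opcode byte, since only the low five bits are used.

```c
#include <stdint.h>

/* Vector address for an EMULATE opcode: low five bits times 32. */
static uint32_t emulate_vector(uint8_t opcode) {
    return (uint32_t)(opcode & 0x1Fu) * 32u;
}
```

With the opcode values from Part 5, CALL (0x2D) has index 13 and therefore vectors to 0x01A0.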

If a hardware instruction is not implemented (disabled in configuration), it triggers the EMULATE path, where software microcode implements the operation. This allows the CPU to run the same software regardless of which instructions are in hardware - just at different speeds.

This is what makes the ZPU uniquely flexible: you can trade FPGA resources for performance by selectively enabling/disabling hardware instructions.

Part 2: Project Structure

Directory Layout

ZPU/
├── cpu/                          # CPU core implementations
│   ├── zpu_core_evo.vhd          # ZPU Evolution CPU (3688 lines)
│   ├── zpu_core_small.vhd        # ZPU Small CPU
│   ├── zpu_core_medium.vhd       # ZPU Medium CPU
│   ├── zpu_core_flex.vhd         # ZPU Flex CPU
│   ├── zpu_pkg.vhd               # CPU package (opcodes, config)
│   └── zpu_uart_debug.vhd        # Debug serialiser
│
├── devices/                      # Peripheral IP cores
│   ├── sysbus/                   # System bus peripherals
│   │   ├── BRAM/                 # Boot ROM and RAM templates
│   │   ├── SDRAM/                # SDRAM controller
│   │   ├── uart/                 # UART controllers
│   │   ├── timer/                # Timer/counter
│   │   ├── intr/                 # Interrupt controller
│   │   ├── ps2/                  # PS2 keyboard/mouse
│   │   ├── spi/                  # SPI interface
│   │   ├── SDMMC/                # SD card controller
│   │   └── ioctl/                # MiSTer IOCTL bus
│   └── WishBone/                 # Wishbone bus peripherals
│       ├── I2C/                  # I2C master controller
│       ├── SRAM/                 # Wishbone SRAM
│       └── SDRAM/                # Wishbone SDRAM controller
│
├── build/                        # FPGA board build files
│   ├── Makefile                  # Build targets for all boards/CPUs
│   ├── DE10_nano_zpu.qpf         # Quartus project files
│   ├── DE10_nano_zpu.qsf         # Pin assignments & file lists
│   ├── DE10_nano_zpu_Toplevel.vhd # Board wrapper
│   ├── E115_zpu.qpf/qsf          # E115 board files
│   ├── CYC1000_zpu.qpf/qsf       # CYC1000 board files
│   ├── QMV_zpu.qpf/qsf           # QMTECH Cyclone V files
│   └── DE0_nano_zpu.qpf/qsf      # DE0 Nano board files
│
├── zpu_soc.vhd                   # SoC top level (2346 lines)
├── zpu_soc_pkg.vhd               # SoC configuration package
├── zpu_soc_pkg.tmpl.vhd          # Template for build system
├── VERSION                       # Version number for CI/CD
├── docs/                         # Documentation and images
└── README.md                     # Project documentation

Key Source Files Explained

zpu_pkg.vhd - The CPU-level configuration package. Contains:

  • Opcode definitions (OpCode_Add, OpCode_Load, etc.)
  • Address bus width configuration (maxAddrBit)
  • Component declarations for the CPU cores
  • Debug configuration (levels, UART baud rate)
  • Bus enable flags (EVO_USE_INSN_BUS for the instruction bus, EVO_USE_WB_BUS for the Wishbone bus)

zpu_soc_pkg.vhd - The SoC-level configuration. Contains:

  • CPU model selection (only one of SMALL/MEDIUM/FLEX/EVO/EVO_MINIMAL)
  • Board clock frequencies
  • Memory geometry (BRAM size, SDRAM parameters)
  • Peripheral enable/disable flags
  • Cache size parameters
  • Memory map constants (start/end addresses)

zpu_soc_pkg.tmpl.vhd - A template copy of the SoC package with all CPU model flags set to 0. The build system uses sed to enable the desired CPU model, generating zpu_soc_pkg.vhd from this template.

zpu_soc.vhd - The main SoC module. This is where everything comes together:

  • Instantiates the selected CPU core
  • Connects BRAM, SDRAM, UART, timers, SD card, etc.
  • Manages the interrupt controller
  • Routes memory/IO requests to the appropriate device
  • Implements the Wishbone bus bridge (if enabled)

cpu/zpu_core_evo.vhd - The Evo CPU implementation (3688 lines). Contains:

  • Instruction decoder and execution state machine
  • L1 cache (register file)
  • L2 cache (BRAM with address mapping)
  • Memory Transaction Processor (MXP)
  • Stack management
  • Hardware instruction implementations

Part 3: Configuring the SoC

Choosing a CPU Model

Open zpu_soc_pkg.vhd (or edit the template zpu_soc_pkg.tmpl.vhd for builds). Find the CPU selection constants:

-- Choose which CPU to instantiate. Only enable ONE at a time.
constant ZPU_SMALL                :     integer    := 0;
constant ZPU_MEDIUM               :     integer    := 0;
constant ZPU_FLEX                 :     integer    := 0;
constant ZPU_EVO                  :     integer    := 1;  -- ← Currently active
constant ZPU_EVO_MINIMAL          :     integer    := 0;

Set exactly one to 1 and all others to 0. The build system handles this automatically when using the Makefile.

Configuring Memory

Boot BRAM size:

constant SOC_MAX_ADDR_BRAM_BIT    :     integer    := 17;
-- 17 bits = 2^17 = 128KB of BRAM
-- 16 bits = 64KB, 15 bits = 32KB

Increasing BRAM size gives more space for firmware and stack but consumes more of the FPGA’s block RAM. Smaller FPGAs (CYC1000) may need 15 or 16 bits.

SDRAM (optional):

constant SOC_IMPL_SDRAM           :     boolean    := false;  -- Enable SDRAM
constant SOC_SDRAM_ROWS           :     integer    := 4096;
constant SOC_SDRAM_COLUMNS        :     integer    := 256;
constant SOC_SDRAM_BANKS          :     integer    := 4;
constant SOC_SDRAM_DATAWIDTH      :     integer    := 16;     -- 16-bit data bus

Set SOC_IMPL_SDRAM to true to enable the system bus SDRAM controller. Adjust the geometry parameters to match your SDRAM chip’s datasheet.

Enabling Peripherals

Each peripheral is individually selectable:

constant SOC_IMPL_TIMER1          :     boolean    := true;   -- Timer block
constant SOC_IMPL_PS2             :     boolean    := false;  -- PS2 keyboard
constant SOC_IMPL_SPI             :     boolean    := false;  -- SPI interface
constant SOC_IMPL_SD              :     boolean    := true;   -- SD card
constant SOC_IMPL_INTRCTL         :     boolean    := true;   -- Interrupt controller
constant SOC_IMPL_SOCCFG          :     boolean    := true;   -- SoC config registers
constant SOC_IMPL_WB_I2C          :     boolean    := false;  -- I2C (Wishbone)

Disabling unused peripherals saves FPGA resources. For minimal designs, you only need the BRAM, one UART, and possibly the timer.

Tuning the Evo Cache

The Evo CPU’s performance depends heavily on cache configuration:

-- L1 Cache: register-based, fast but uses fabric
constant MAX_EVO_L1CACHE_BITS     :     integer    := 5;
-- 5 bits = 32 instruction cache entries (uses 32 registers)
-- Increase for better IM optimisation, decrease to save fabric

-- L2 Cache: BRAM-based, larger but uses block RAM
constant MAX_EVO_L2CACHE_BITS     :     integer    := 12;
-- 12 bits = 4096 byte L2 cache
-- Increase for fewer SDRAM stalls, decrease to save BRAM

-- Memory Transaction queue depth
constant MAX_EVO_MXCACHE_BITS     :     integer    := 3;
-- 3 bits = 8 pending transactions

Guideline for students: Start with the defaults. If your design doesn’t fit in the FPGA, reduce L2 cache first (it uses BRAM). If you need more performance, try increasing L1 cache (costs fabric logic elements).
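
All of the `_BIT` and `_BITS` constants above select a power-of-two size. The following C sketch makes the arithmetic explicit; `size_from_bits` is a hypothetical helper, not something defined in the ZPU sources.

```c
#include <assert.h>
#include <stdint.h>

/* Number of addressable units selected by an n-bit index: 2^n. */
static uint32_t size_from_bits(unsigned bits) {
    assert(bits < 32);  /* a 32-bit result cannot hold 2^32 */
    return (uint32_t)1u << bits;
}
```

With the defaults shown, 17 bits gives 131072 bytes (128KB) of BRAM, 5 bits gives 32 L1 entries, 12 bits gives a 4096-byte L2 cache, and 3 bits gives 8 pending transactions.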

Part 4: Building FPGA Bitstreams

Using the Makefile

The simplest way to build is using the Makefile in the build/ directory:

cd ZPU/build

# Build a specific board/CPU combination
make DE10_nano_EVO        # DE10-Nano with Evo CPU
make E115_SMALL           # E115 with Small CPU
make CYC1000_MEDIUM       # CYC1000 with Medium CPU

# Build all variants for a board
make DE10_nano_SMALL DE10_nano_MEDIUM DE10_nano_FLEX DE10_nano_EVO DE10_nano_EVO_MINIMAL

# Build all variants for all boards
make all

What happens during a build:

  1. The Makefile copies zpu_soc_pkg.tmpl.vhd and uses sed to set the chosen CPU model to 1
  2. The generated zpu_soc_pkg.vhd replaces the existing one
  3. Quartus runs synthesis (Analysis & Synthesis) - converts VHDL to a netlist
  4. Quartus runs the Fitter (Place & Route) - maps to actual FPGA resources
  5. Quartus runs the Assembler - generates the .sof programming file
  6. quartus_cpf converts the .sof to .rbf (Raw Binary Format)

Using Quartus Directly

If you prefer the Quartus GUI:

  1. Open the .qpf project file for your board (e.g., DE10_nano_zpu.qpf)
  2. Edit zpu_soc_pkg.vhd to select your CPU model
  3. Click Processing → Start Compilation (or press Ctrl+L)
  4. After compilation, find the .sof in the build directory
  5. Program the FPGA: Tools → Programmer, select your .sof file, click Start

Using Docker (Headless Builds)

For automated or server-based builds without a Quartus installation:

# Build using the Docker container
docker run --rm \
    --mac-address "02:50:dd:72:03:01" \
    -e "LM_LICENSE_FILE=/srv2/license2.dat" \
    -v "/run/udev:/run/udev:ro" \
    -v "/sys:/sys:ro" \
    -v "/path/to/ZPU:/workspace" \
    -w "/workspace/build" \
    quartus-ii-17.1.1 \
    /opt/altera/quartus/bin/quartus_sh --flow compile DE10_nano_zpu

The CI/CD pipeline automates this for all board/CPU combinations.

Programming the FPGA

Using Quartus Programmer (GUI):

  1. Connect your FPGA board via USB-Blaster
  2. Open Quartus Programmer
  3. Click Auto Detect to find the FPGA
  4. Select your .sof file
  5. Click Start to program

Using command-line:

quartus_pgm -m jtag -o "p;DE10_nano_EVO.sof"

Note: .sof programming is volatile - the FPGA loses its configuration on power-off. For permanent programming, convert to .pof or use the board’s flash memory.

Part 5: The Instruction Set In Depth

Core Instructions

These are always available in hardware:

Instruction  Opcode  Stack Effect          Description
NOP          0x0B    (no change)           No operation
ADD          0x05    a b → (a+b)           Add top two stack values
AND          0x06    a b → (a&b)           Bitwise AND
OR           0x07    a b → (a|b)           Bitwise OR
NOT          0x09    a → (~a)              Bitwise NOT
FLIP         0x0A    a → flip(a)           Reverse bit order
LOAD         0x08    addr → value          Read 32-bit from memory
STORE        0x0C    value addr →          Write 32-bit to memory
PUSHSP       0x02    (none) → SP           Push stack pointer
POPSP        0x0D    addr →                Set stack pointer
POPPC        0x04    addr →                Pop address, jump to it (return)
IM           0x80+   (none) → imm          Push immediate (7 bits)
LOADSP       0x60+   (none) → mem[SP+n]    Load from stack offset
STORESP      0x40+   value →               Store to stack offset
ADDSP        0x10+   a → a+mem[SP+n]       Add stack offset value

Emulatable Instructions

These can be in hardware (fast) or emulated (slow). In the Evo, all are typically enabled in hardware:

Instruction  Opcode  Stack Effect          Description
CALL         0x2D    addr → PC+1           Call subroutine
CALLPCREL    0x3F    offset → PC+1         PC-relative call
SUB          0x31    a b → (b-a)           Subtract
MULT         0x29    a b → (a×b)           Multiply
DIV          0x35    a b → (a/b)           Signed divide
MOD          0x36    a b → (a%b)           Modulo
NEG          0x30    a → (-a)              Negate
EQ           0x2E    a b → (a==b)          Equality test
NEQ          0x2F    a b → (a!=b)          Inequality test
LOADB        0x33    addr → byte           Load byte
STOREB       0x34    val addr →            Store byte
LOADH        0x22    addr → half           Load 16-bit
STOREH       0x23    val addr →            Store 16-bit
ASHIFTLEFT   0x2B    val shift → result    Arithmetic shift left
ASHIFTRIGHT  0x2C    val shift → result    Arithmetic shift right
LSHIFTRIGHT  0x2A    val shift → result    Logical shift right
XOR          0x32    a b → (a^b)           Exclusive OR

Extended Instructions (Evo Only)

The Evo supports multi-byte extended instructions using the EXTEND prefix (opcode 0x0F):

Format: EXTEND, <instruction byte>, [parameter bytes]

Instruction byte: [opcode(7:2)][paramSize(1:0)]
  paramSize: 00=none, 01=8-bit, 10=16-bit, 11=32-bit

Current extended instructions:

  • ESR (Extended Status Register) - Read background transfer status
  • LDIR (Load Increment Repeat) - Block memory copy with optional background execution
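
The instruction-byte layout described above can be unpacked with a short C sketch. The field positions and the paramSize meanings come from the format description; the `ExtInsn` struct and `decode_extended` are hypothetical names, and no real extended opcode numbers are assumed.

```c
#include <stdint.h>

/* Decoded fields of the byte that follows the EXTEND prefix. */
typedef struct {
    uint8_t opcode;       /* bits 7:2 of the instruction byte */
    uint8_t param_bytes;  /* number of parameter bytes that follow */
} ExtInsn;

/* Split the instruction byte into opcode and parameter length. */
static ExtInsn decode_extended(uint8_t insn_byte) {
    /* paramSize: 00=none, 01=8-bit, 10=16-bit, 11=32-bit */
    static const uint8_t size_table[4] = { 0, 1, 2, 4 };
    ExtInsn d;
    d.opcode = (uint8_t)(insn_byte >> 2);
    d.param_bytes = size_table[insn_byte & 0x3u];
    return d;
}
```

For instance, an instruction byte of 0x07 decodes to opcode 1 with a 4-byte parameter, so the whole instruction occupies six bytes including the EXTEND prefix.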

How an Instruction Executes (Evo CPU)

Understanding the execution flow helps when adding instructions. Here is how a typical instruction flows through the Evo:

  1. Fetch: The L1 cache provides the next instruction byte. If the L1 cache misses, it is filled from L2 cache or main memory.
  2. Decode: The instruction decoder examines the opcode and determines the operation.
  3. Execute: For simple operations (ADD, AND, etc.), the result is computed combinationally in one cycle. For memory operations, a request is submitted to the Memory Transaction Processor (MXP).
  4. Writeback: The result is pushed onto the stack (or the stack pointer is adjusted).

For the Evo, many operations complete in a single cycle when data is in the L1/L2 cache.

Part 6: Adding New Hardware Instructions

This is one of the most exciting parts of working with the ZPU - you can add your own custom instructions to the CPU hardware.

Step 1: Choose an Opcode

Look at the emulation vector space (opcodes with bits 6:5 = 01, i.e., 0x20-0x3F). Some are already used, but there are free slots. Alternatively, use the EXTEND mechanism for new instructions.

For a simple example, let us add a SWAP instruction that swaps TOS and NOS:

SWAP: opcode 0x20 (EMULATE 0, currently unused in many builds)
Stack: a b → b a

Step 2: Add the Opcode Constant

In cpu/zpu_pkg.vhd, add your opcode:

constant OpCode_Swap     : std_logic_vector(5 downto 0) :=
    std_logic_vector(to_unsigned(32, 6));  -- 0x20

Step 3: Add a Configuration Toggle

In zpu_soc_pkg.vhd, add an enable/disable constant:

constant IMPL_EVO_SWAP            :     boolean    := true;

And pass it through the generic map in zpu_soc.vhd:

IMPL_SWAP            => IMPL_EVO_SWAP,

Step 4: Add the Generic to the CPU Entity

In cpu/zpu_core_evo.vhd, add the generic parameter:

generic (
    ...
    IMPL_SWAP                 : boolean := false;
    ...
);

Step 5: Implement the Instruction

In the instruction decoder section of zpu_core_evo.vhd, find where EMULATE instructions are handled and add:

-- SWAP instruction: swap TOS and NOS
if IMPL_SWAP = true and insnExec(5 downto 0) = OpCode_Swap then
    -- In a clocked process both assignments read the old signal values,
    -- so the two lines below perform a true swap
    stackA <= stackB;  -- TOS gets old NOS
    stackB <= stackA;  -- NOS gets old TOS
    -- No memory access needed, no PC change
else
    -- ... existing EMULATE handling (branch to vector)
end if;

The exact implementation depends on where in the state machine you are inserting and how the stack is managed. Study the existing instruction implementations (e.g., OpCode_Add, OpCode_Sub) to understand the patterns.

Step 6: Test Your Instruction

Write a test program in C using inline assembly:

#include <stdio.h>

static inline void zpu_swap(void) {
    __asm__ volatile (".byte 0x20");  // SWAP opcode
}

int main() {
    int a = 42, b = 99;
    // Push a and b onto stack, then SWAP
    printf("Before: a=%d, b=%d\n", a, b);
    // ... test the instruction
    return 0;
}

Using the Extended Instruction Mechanism

For more complex instructions that need parameters, use the EXTEND prefix:

EXTEND (0x0F), InstructionByte, [ParamBytes]

InstructionByte = [opcode(7:2)][paramSize(1:0)]

The L1 cache pre-fetches the parameter bytes so they are available in the same cycle as the instruction decode. This allows multi-byte instructions to execute efficiently.

Part 7: The Software Ecosystem — IOCP, zOS and Applications

Once you have built an FPGA bitstream, your ZPU needs software to run. The ZPU Evolution project provides a complete software ecosystem arranged in layers: a bootloader (IOCP) initialises the hardware and loads an operating system (zOS), which in turn provides a shell and command-line environment from which you can run applications, much like CP/M or early DOS. Understanding these layers is essential: even if you plan to write your own software from scratch, you will normally rely on IOCP to bootstrap the CPU and on zOS to provide the runtime environment.

Overview of the Software Layers

The standard software stack on a ZPU Evolution system is:

┌───────────────────────────────────────────────────┐
│              Applications                          │
│  (ed, kilo, tbasic, mbasic, benchmarks,           │
│   your own programs — loaded from SD card)         │
├───────────────────────────────────────────────────┤
│         zOS — Operating System                     │
│  (Shell, file system, 80+ commands, app loading,   │
│   memory management, interrupt handling)            │
├───────────────────────────────────────────────────┤
│         IOCP — Bootloader                          │
│  (Hardware init, SD card boot, serial upload,       │
│   memory monitor — embedded in BRAM)                │
├───────────────────────────────────────────────────┤
│         ZPU Hardware (FPGA)                         │
│  (CPU, BRAM, SDRAM, UART, SD, Timer, Interrupts)   │
└───────────────────────────────────────────────────┘

The boot sequence is: FPGA powers on → IOCP runs from BRAM → IOCP loads zOS from SD card → zOS presents a shell → user runs applications from the command line.

You can also embed zOS directly into BRAM (standalone mode), eliminating the need for IOCP at the cost of a longer FPGA recompile each time you update the OS.

Note on ZPUTA: During the development of ZPU Evolution, a test application called ZPUTA (ZPU Test Application) was created first to aid in hardware testing and validation. ZPUTA was then used as the template to write zOS. ZPUTA is largely redundant now that zOS exists — zOS is the software you should build and use. ZPUTA remains in the repository for historical reference and for anyone needing the original hardware test harness, but it is not covered further in this guide. All commands and features available in ZPUTA are also available in zOS.

IOCP — The Bootloader

IOCP (I/O Control Program) is the first code that executes when the ZPU powers on. It is embedded directly in the FPGA’s Block RAM during synthesis, so it is always available — no SD card or external storage is needed for it to run.

What IOCP does:

  1. Initialises the hardware — configures UARTs at 115200 baud, enables RX/TX FIFOs, sets up the interrupt controller and timer
  2. Attempts to mount the SD card — using the Petit FatFS library (a minimal read-only FAT implementation)
  3. Waits for user input — if you press a key within ~5 seconds, IOCP enters its interactive command monitor
  4. Auto-boots — if no key is pressed, loads BOOT.ROM (or BOOTTINY.ROM for tiny IOCP) from the SD card root directory and jumps to it

IOCP functionality levels:

IOCP is compiled with a FUNCTIONALITY parameter that controls its size:

Level  Name     Size      Features
0      Full     ~40 KB    All commands, binary upload, config info, interrupt timer
1      Medium   ~20 KB    Command processor, timer, auto-boot, SD directory
2      Minimum  ~10 KB    Version display, interrupt handler, auto-boot
3      Tiny     ~3–5 KB   Bootstrap only, no interactive UI

For students, Level 0 (Full) is the best starting point as it gives you interactive memory inspection and serial upload capabilities. For production designs where BRAM is scarce, Level 3 (Tiny) leaves maximum space for the application.

IOCP interactive commands (Full mode):

Command  Description
0        Execute application in Boot BRAM
1        Execute application in RAM
2        Upload application to BRAM via serial (with CRC-32 validation)
3        Upload application to RAM via serial
4        Dump BRAM memory (hex + ASCII)
5        Dump Stack memory
6        Dump RAM memory
d        List SD card directory
C        Clear BRAM application area
c        Clear RAM
R        Reset system
i        Show SoC configuration and version info
h        Help

Serial upload protocol:

IOCP can receive new firmware over the UART without re-synthesising the FPGA. This is invaluable during development:

  1. Send the magic sequence: I, O, C, P
  2. Send image size (4 bytes, little-endian)
  3. Send CRC-32 of the image (4 bytes, inverted)
  4. Send the image data in 4-byte words
  5. IOCP validates the CRC and reports success or failure
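
The first three steps can be sketched on the host side as a 12-byte header builder. Several details here are assumptions that should be checked against the IOCP source before use: the CRC variant (the common reflected CRC-32 with polynomial 0xEDB88320 is assumed), the reading of "inverted" as a bitwise complement of that CRC, and the little-endian byte order of the CRC field. The names `crc32_ieee`, `put_le32`, and `build_upload_header` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, reflected polynomial 0xEDB88320 (assumed variant). */
static uint32_t crc32_ieee(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v & 0xFFu);
    out[1] = (uint8_t)((v >> 8) & 0xFFu);
    out[2] = (uint8_t)((v >> 16) & 0xFFu);
    out[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Fill a 12-byte header: magic "IOCP", image size, inverted CRC-32.
 * "Inverted" is interpreted as bitwise complement (assumption). */
static void build_upload_header(uint8_t hdr[12], const uint8_t *image,
                                uint32_t size) {
    hdr[0] = 'I';
    hdr[1] = 'O';
    hdr[2] = 'C';
    hdr[3] = 'P';
    put_le32(hdr + 4, size);
    put_le32(hdr + 8, ~crc32_ieee(image, size));
}
```

After sending the header, the host would stream the image itself in 4-byte words and wait for IOCP to report the CRC result.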

IOCP memory layout:

0x00000 ┌──────────────────┐
        │ IOCP Boot Code   │  Fixed vectors, startup (0x400 bytes)
0x00400 ├──────────────────┤
        │ IOCP Main Code   │  Command processor, SD card, UART drivers
        │ & Read-Only Data │
0x01000 ├──────────────────┤  ← IOCP_APPADDR (applications load here)
        │ Application      │
        │ (zOS)            │
        │                  │
        ├──────────────────┤  ← Top of BRAM minus stack
        │ Stack            │  (grows downward, 512–2048 bytes)
0x07FFF └──────────────────┘  (example for 32 KB BRAM)

IOCP remains memory-resident after loading an application, occupying the first 0x1000 bytes. The loaded application starts at IOCP_APPADDR (0x01000) and has the rest of BRAM available.
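
To estimate how much room a loaded application has, subtract the resident IOCP area and the stack from the BRAM size. The sketch below uses the IOCP_APPADDR value from the layout; `iocp_app_space` is a hypothetical helper, and the stack size is whatever your build reserves (512 to 2048 bytes in the diagram).

```c
#include <assert.h>
#include <stdint.h>

#define IOCP_APPADDR 0x01000u  /* application load address from the layout */

/* Bytes left for the application between IOCP and the stack. */
static uint32_t iocp_app_space(uint32_t bram_bytes, uint32_t stack_bytes) {
    assert(bram_bytes >= IOCP_APPADDR + stack_bytes);  /* layout must fit */
    return bram_bytes - IOCP_APPADDR - stack_bytes;
}
```

For the 32 KB example with a 2048-byte stack this gives 0x6800 bytes (26 KB) for the application.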

zOS — The Operating System

zOS (ZPU Operating System) is the standard operating system for the ZPU Evolution. It provides a command-line shell, full file system support, memory management, and the ability to load and run applications from SD card — analogous to how CP/M or early DOS provided a command-line environment for running programs. When you build a ZPU system, zOS is the software you should use.

zOS shell features:

  • 80+ commands (file system, memory, hardware, execution)
  • Readline-style line editing with history (saved to SD card)
  • AUTOEXEC.BAT support — place a file named AUTOEXEC.BAT in the SD card root and zOS will execute its commands on boot, just like MS-DOS
  • Help system — help shows all commands, help <group> shows a category, help <cmd> shows detailed usage

zOS command categories:

Category      Example Commands                                  Description
File System   fdir, fcat, fcp, fdel, fload, fexec, fmkdir, fcd  Full file and directory operations (30+ commands)
Disk I/O      dinit, dstat, ddump, dioctl                       Low-level SD card access
Disk Buffer   bdump, bedit, bread, bwrite, bfill                512-byte sector buffer manipulation
Memory        mdump, mcopy, mdiff, mtest, mperf, msrch          Inspect, test, and benchmark memory
Memory Edit   meb, meh, mew                                     Edit memory as bytes, halfwords, or words
Hardware      hr, ht, hie, hid                                  Register display, timer test, interrupt control
Benchmarks    dhry, coremark                                    CPU performance measurement (Dhrystone v2.1, CoreMark v1.0)
Execution     call <addr>, jmp <addr>                           Execute code at arbitrary addresses
Applications  ed, kilo, tbasic, mbasic                          Editors and BASIC interpreters
System        restart, reset, help, info, time                  System management

Readline key bindings:

Key                  Action
CTRL-A               Move to start of line
CTRL-E               Move to end of line
CTRL-K               Clear line
CTRL-P / Arrow Up    Recall previous command
CTRL-N / Arrow Down  Recall next command
CTRL-C               Abort current line
!<number>            Re-execute the command with that number from the history
hist                 List command history

How zOS loads and runs applications:

When you type a command that is not built-in, zOS searches for a matching binary on the SD card:

  1. Searches the /bin/ directory for <command>.ZPU (the .ZPU extension matches the CPU architecture)
  2. Loads the binary into memory at the configured application load address (default 0x0C000 or 0x100000)
  3. Calls the application’s entry point, passing command-line arguments and system structures
  4. The application executes, using zOS API functions for I/O and file access
  5. When the application returns, control passes back to the zOS shell

User types: kilo myfile.txt
           │
           ▼
    ┌──────────────┐     ┌───────────────┐     ┌──────────────┐
    │ Command not  │────►│ Search SD for │────►│ Load kilo.ZPU│
    │ built-in     │     │ bin/kilo.ZPU  │     │ at 0x0C000   │
    └──────────────┘     └───────────────┘     └──────┬───────┘
                                                       │
                                                       ▼
                                               ┌──────────────┐
                                               │ app("myfile  │
                                               │  .txt", 0)   │
                                               │              │
                                               │ returns → 0  │
                                               └──────┬───────┘
                                                       │
                                                       ▼
                                               ┌──────────────┐
                                               │ Back to zOS  │
                                               │ shell prompt │
                                               └──────────────┘
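
The first step of the lookup, turning a command name into a path under /bin/, might look like the sketch below. It is an illustration of the naming rule only: `build_app_path` is a hypothetical helper, and whether zOS normalises case or searches other directories is not covered here.

```c
#include <stddef.h>
#include <string.h>

/* Build "/bin/<command>.ZPU" in out.
 * Returns 0 on success, -1 if the buffer is too small. */
static int build_app_path(char *out, size_t out_size, const char *command) {
    static const char prefix[] = "/bin/";
    static const char suffix[] = ".ZPU";
    size_t need = strlen(prefix) + strlen(command) + strlen(suffix) + 1;

    if (need > out_size) {
        return -1;  /* result would not fit, leave out untouched */
    }
    strcpy(out, prefix);
    strcat(out, command);
    strcat(out, suffix);
    return 0;
}
```

Typing `kilo` would therefore map to `/bin/kilo.ZPU`, which is the file the loader then reads into memory at the application load address.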

The application entry point:

Every ZPU application uses a standard entry point signature:

uint32_t app(uint32_t param1, uint32_t param2)

Parameter     Contents
param1        Pointer to command-line arguments as a C string (char *)
param2        Reserved (typically 0)
Return value  0 = success, 0xFFFFFFFF = failure, other = detailed error code

The application also receives pointers to the zOS global structures (file handles, FatFS objects, disk buffers) and the SoC configuration structure (memory sizes, peripheral flags, clock frequencies). These allow applications to use the OS’s file system and detect hardware capabilities.
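
A skeleton application using this entry point could look like the following sketch. Only the `app` signature and the return-value convention come from the description above; `count_args` is a hypothetical helper, and the sketch does not use the zOS global structures or API calls, which a real application would reach through the headers shipped with the project.

```c
#include <stddef.h>
#include <stdint.h>

/* Count whitespace-separated arguments in the command line
 * (hypothetical helper for this sketch). */
static uint32_t count_args(const char *cmdline) {
    uint32_t count = 0;
    int in_word = 0;
    for (; *cmdline != '\0'; cmdline++) {
        if (*cmdline == ' ' || *cmdline == '\t') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            count++;
        }
    }
    return count;
}

/* Entry point with the standard signature.
 * param1 carries the address of the argument string; param2 is reserved. */
uint32_t app(uint32_t param1, uint32_t param2) {
    const char *args = (const char *)(uintptr_t)param1;
    (void)param2;                    /* reserved, unused here */
    if (args == NULL || count_args(args) == 0) {
        return 0xFFFFFFFFu;          /* failure: no arguments supplied */
    }
    return 0;                        /* success */
}
```

On the ZPU, pointers are 32 bits wide, so passing the argument string through a `uint32_t` loses nothing; the cast through `uintptr_t` simply keeps the sketch compilable on other hosts.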

zOS API — the vector table:

zOS exposes 92 API functions through a fixed vector table. Applications call these functions instead of reimplementing I/O, which keeps application binaries small and ensures correct hardware access. Key API categories:

Category        Functions                                                  Purpose
Character I/O   putchar(), puts(), getserial()                             Serial terminal output and input
Formatted I/O   printf(), sprintf(), xatoi()                               Formatted printing and number parsing
File System     f_open(), f_read(), f_write(), f_close(), f_lseek(), etc.  Full FatFS API (24 functions)
Disk I/O        disk_read(), disk_write(), disk_ioctl()                    Low-level SD card access
Memory          malloc(), realloc(), calloc(), free()                      Dynamic memory allocation (umm_malloc)
Parameters      getStrParam(), getUintParam()                              Parse command-line arguments
System          rtcSet(), rtcGet(), crc32_init(), crc32_addword()          RTC and CRC utilities
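
The idea behind a vector table can be sketched on a host machine as a structure of function pointers that the application reaches only through a pointer. The two-entry table, the type api_table_t, and the kernel_* functions below are assumptions made for illustration; the real zOS table has its own layout and 92 entries.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical two-entry table; the real zOS table has its own fixed order. */
typedef struct {
    int (*putchar_fn)(int c);
    int (*puts_fn)(const char *s);
} api_table_t;

/* "Kernel" side: the only place that owns the real implementations. */
static int kernel_putchar(int c)      { return putchar(c); }
static int kernel_puts(const char *s) { return fputs(s, stdout) < 0 ? -1 : 0; }

static const api_table_t kernel_api = { kernel_putchar, kernel_puts };

/* "Application" side: it knows the table layout, not the function addresses. */
static int app_say_hello(const api_table_t *api)
{
    return api->puts_fn("hello via vector table\n");
}
```

Because the application binds to table slots rather than to function addresses, the kernel can be rebuilt without relinking applications, as long as the slot order stays the same.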

zOS memory layout:

0x00000 ┌──────────────────────┐
        │ IOCP (if used)       │  Optional bootloader (0x1000 bytes)
0x01000 ├──────────────────────┤  ← OS_BASEADDR (if IOCP used)
        │ zOS Kernel           │
        │ (shell, commands,    │  35–100 KB depending on configuration
        │  file system, APIs)  │
        ├──────────────────────┤
        │ OS Heap              │  umm_malloc managed (default 0x8000)
0x0C000 ├──────────────────────┤  ← APP_LOAD_ADDR
        │ Application Code     │
        │ (loaded from SD)     │  Up to ~0x70000 bytes
        │                      │
        ├──────────────────────┤
        │ Stack                │  Grows downward (default 0x3D80)
        └──────────────────────┘  Top of BRAM/SDRAM
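
The arithmetic behind this layout can be checked with a few lines of C. This sketch assumes that the stack occupies the top stack_size bytes immediately below the memory top and that application space ends where the stack begins; the structure and function names are hypothetical, and the linker script remains the authoritative source.

```c
#include <stdint.h>

/* Hypothetical container for three of the layout values. */
typedef struct {
    uint32_t mem_top;     /* top of usable memory */
    uint32_t stack_size;  /* bytes reserved for the stack */
    uint32_t app_load;    /* application load address */
} layout_t;

/* Lowest address of the stack region (the stack grows down from mem_top). */
static uint32_t stack_base(const layout_t *l)
{
    return l->mem_top - l->stack_size;
}

/* Bytes between the load address and the stack base; 0 if they overlap. */
static uint32_t app_space(const layout_t *l)
{
    uint32_t base = stack_base(l);
    return (l->app_load < base) ? base - l->app_load : 0;
}
```

With a memory top of 0x1FD80, a stack of 0x3D80, and a load address of 0x0C000, the stack base works out to 0x1C000 and the space left for an application in BRAM is 0x10000 bytes.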

Applications

The ZPU software ecosystem includes several ready-to-use applications that demonstrate the platform’s capabilities and provide useful tools. All applications are stored as binary files on the SD card in the /bin/ directory with a .ZPU extension.

Included applications:

Application  Description                                                                            Size
ed           Basic VT100 text editor: navigate with arrow keys, CTRL-S to save, CTRL-Q to quit      ~37 KB source
kilo         Advanced VT100 WYSIWYG editor: syntax highlighting, search, more features than ed      ~50 KB source
tbasic       Tiny BASIC interpreter: write and run BASIC programs interactively                     ~154 KB compiled
mbasic       Mini BASIC v1.0: a second BASIC dialect with a built-in editor                         ~100 KB compiled
dhry         Dhrystone v2.1 benchmark: measures CPU integer performance in DMIPS                    Built-in or applet
coremark     CoreMark v1.0 benchmark: industry-standard embedded CPU benchmark                      Built-in or applet

File system commands (fdir, fcat, fcp, fdel, etc.), memory commands (mdump, mtest, mperf, etc.), and hardware commands (hr, ht) can also be compiled as external SD card applets rather than built into the OS, saving BRAM space.

Writing your own application:

Creating a new ZPU application is straightforward. Here is a complete example:

Step 1: Create the application directory and source file

zOS/apps/myapp/
├── myapp.c
└── Makefile

Step 2: Write the application code (myapp.c):

#if defined(__ZPU__)
  #include <zstdio.h>
  #include "zpu_soc.h"
#elif defined(__K64F__)
  #include <stdio.h>
  #include "k64f_soc.h"
#endif

#include "ff.h"
#include "xprintf.h"
#include "zOS_app.h"

uint32_t app(uint32_t param1, uint32_t param2)
{
    char *ptr = (char *)param1;  // Command-line arguments
    long value;

    xprintf("Hello from my ZPU application!\n");
    xprintf("Arguments: %s\n", ptr);

    // Parse a numeric argument if provided
    if (xatoi(&ptr, &value)) {
        xprintf("Numeric argument: %ld (0x%08lX)\n", value, value);
    }

    // Access the file system
    FIL file;
    FRESULT res = f_open(&file, "0:\\test.txt", FA_CREATE_ALWAYS | FA_WRITE);
    if (res != FR_OK) {
        xprintf("Could not open file (FatFS error %d).\n", (int)res);
        return 0xFFFFFFFF;  // Failure
    }
    f_puts("Written from my ZPU app!\n", &file);
    f_close(&file);
    xprintf("File written successfully.\n");

    return 0;  // Success
}

Step 3: Create the Makefile:

APP_NAME       = myapp
APP_DIR        = $(CURDIR)/..
BASEDIR        = ../../..

APP_C_SRC      =
CFLAGS         =
CPPFLAGS       =
LDFLAGS        = -nostdlib

ifeq ($(__K64F__),1)
include        $(APP_DIR)/Makefile.k64f
else
include        $(APP_DIR)/Makefile.zpu
endif

Step 4: Build and deploy:

cd zOS/apps/myapp
make __ZPU__=1              # Build for ZPU

# Copy binary to SD card
cp myapp.ZPU /path/to/sdcard/bin/

Step 5: Run from the zOS shell:

* myapp 42
Hello from my ZPU application!
Arguments: 42
Numeric argument: 42 (0x0000002A)
File written successfully.
*

Application API access:

Your application does not need to implement its own UART driver, file system, or memory allocator. All of these are provided by the zOS kernel through the vector table. When your application calls xprintf(), it uses the kernel’s UART driver. When it calls f_open(), it uses the kernel’s FatFS instance and SD card driver. This is similar to how a CP/M transient program calls BDOS functions — the OS provides the services, and your application focuses on its own logic.

Building the Software

All ZPU software (IOCP, zOS, and applications) is built from the zOS repository using a unified build.sh script.

# Clone the repository
git clone https://git.eaw.app/eaw/zOS.git
cd zOS

Build commands for each component:

# Build IOCP (Tiny, for Evo CPU, 128 KB BRAM)
./build.sh -C Evo -I 3 -o 0 -M 0x1FD80 -B 0x0000

# Build zOS (standalone, for Evo CPU)
./build.sh -C Evo -O zos -o 0 -M 0x1FD80 -B 0x0000 \
    -S 0x3D80 -N 0x8000 -A 0x100000 -a 0x70000

# Build zOS (loaded by IOCP, base at 0x1000)
./build.sh -C Evo -O zos -o 2 -M 0x1FD80 -B 0x1000 \
    -S 0x3D80 -N 0x8000 -A 0x0C000 -a 0x70000

Build parameter reference:

Flag  Purpose                     Example Values
-C    Target CPU model            Small, Medium, Flex, Evo, EvoMin, K64F
-O    Operating system            zos (or zputa for legacy test application)
-I    IOCP functionality level    0 (Full), 1 (Medium), 2 (Minimum), 3 (Tiny)
-o    Boot mode                   0 (standalone), 1 (app with IOCP), 2 (app with Tiny IOCP), 3 (RAM)
-M    Maximum BRAM size           0x8000 (32 KB), 0x10000 (64 KB), 0x1FD80 (128 KB)
-B    OS base address             0x0000 (standalone), 0x1000 (after IOCP)
-S    Stack size                  0x3D80
-N    OS heap size                0x8000
-A    Application load address    0x0C000, 0x100000
-a    Maximum application size    0x70000
-n    Application heap size       0x0000 (shared with OS)
-s    Application stack size      0x0000 (shared with OS)
-T    Enable tranZPUter mode      (flag, no value)
-d    Debug build                 (flag, no value)

Build outputs:

The build produces several output files:

File                 Description
main.bin             Raw binary image
main.hex             Intel HEX format
main.srec            Motorola S-record format
main.elf             ELF with debug symbols (for debuggers)
main.lss             Linker map and memory layout summary
rtl/TZSW_*.vhd       VHDL BRAM initialisation files (for FPGA synthesis)
build/SD/bin/*.ZPU   Application binaries for SD card

The VHDL files (e.g. TZSW_DualPortBootBRAM.vhd) are the key output for FPGA integration — they contain the compiled firmware as BRAM initialisation data. When you run Quartus to compile the FPGA design, these files are included in the synthesis, embedding the firmware directly into the bitstream.

How Firmware Gets Into the FPGA

The ZPU boots from Block RAM (BRAM) inside the FPGA. The firmware must be embedded into the BRAM during FPGA compilation:

  1. C source files are compiled with zpu-elf-gcc → ELF binary
  2. The ELF binary is converted to VHDL BRAM initialisation (via the zpugen tool) or a Memory Initialisation File (.mif)
  3. The BRAM VHDL entity includes the initialisation data as pre-loaded content
  4. When Quartus compiles the design, it synthesises the BRAM with the firmware pre-loaded
  5. On power-up, the CPU starts fetching instructions from BRAM address 0
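
Step 2 above amounts to regrouping the raw firmware bytes into 32-bit words. The sketch below shows that regrouping in big-endian order, which is the ZPU's byte order. It is not the zpugen tool: the function names and the output line format are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack up to four bytes into one big-endian word, zero-padding a short tail. */
static uint32_t pack_be32(const uint8_t *p, size_t remaining)
{
    uint32_t w = 0;
    for (size_t i = 0; i < 4; i++) {
        w <<= 8;
        if (i < remaining)
            w |= p[i];
    }
    return w;
}

/* Print one illustrative initialiser line per word (hypothetical format).
 * Returns the number of words written. */
static size_t emit_init(const uint8_t *img, size_t len, FILE *out)
{
    size_t words = 0;
    for (size_t off = 0; off < len; off += 4, words++)
        fprintf(out, "    %zu => x\"%08X\",\n",
                words, (unsigned)pack_be32(img + off, len - off));
    return words;
}
```

The first byte of the image lands in the most significant byte of word 0, so the CPU fetches it first when it starts executing from address 0.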

For development, there are two ways to iterate without re-synthesising the FPGA:

  1. Serial upload via IOCP — use the serial upload protocol to send new firmware directly to BRAM through the UART. This takes seconds compared to minutes for a full Quartus recompile.
  2. SD card boot — place your compiled binary as BOOT.ROM on the SD card. IOCP will load it on each power cycle, so you just need to update the file on the SD card.

Memory-Mapped I/O

Software accesses hardware peripherals through memory-mapped registers. The SoC configuration registers (SOCCFG) allow software to detect the hardware configuration at runtime:

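// Read SoC configuration (the cast keeps the volatile qualifier so that
// every access really reaches the hardware register)
volatile unsigned int *soccfg = (volatile unsigned int *)0xFF0000;
unsigned int cpu_id = soccfg[0];      // CPU model and revision
unsigned int soc_config = soccfg[1];  // Enabled peripherals bitmap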

The UART, timer, SD card, and other peripherals each have their own memory-mapped register blocks within the I/O region. The zpu_soc.h header file defines all register addresses and bit fields — include this header in your application code to access the hardware correctly.
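
Testing a peripheral flag in the configuration bitmap might look like the sketch below. The CFG_HAS_* bit positions and both function names are hypothetical; the real definitions live in zpu_soc.h. The register block is passed in as a pointer so that the same code can be exercised against an ordinary array on a host machine.

```c
#include <stdint.h>

/* Hypothetical bit positions; consult zpu_soc.h for the real ones. */
#define CFG_HAS_UART1  (1u << 0)
#define CFG_HAS_SD     (1u << 1)
#define CFG_HAS_TIMER  (1u << 2)

/* Read one 32-bit register through a volatile pointer so the compiler
 * cannot cache or drop the access. */
static uint32_t reg_read(const volatile uint32_t *base, unsigned index)
{
    return base[index];
}

/* Non-zero if the given flag is set in word 1 (the peripherals bitmap). */
static int soc_has(const volatile uint32_t *soccfg, uint32_t flag)
{
    return (reg_read(soccfg, 1) & flag) != 0;
}
```

On the target, soccfg would point at the SOCCFG block; in a unit test it can point at a two-element array filled with known values.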

Typical Development Workflow

Here is a recommended workflow for a student starting from scratch:

  1. Build the FPGA bitstream with IOCP embedded (Part 4)
  2. Program the FPGA and connect a serial terminal at 115200 baud
  3. Interact with IOCP: use i to see the SoC configuration and 4 to dump BRAM, and verify that the hardware is working
  4. Build zOS using build.sh and copy BOOT.ROM plus the bin/ directory to a FAT32-formatted SD card
  5. Insert the SD card and reset the board — IOCP loads zOS automatically
  6. Explore zOS — try help, fdir, mdump 0 100, mtest, dhry
  7. Run applications — try ed myfile.txt or tbasic to see the editors and BASIC interpreter
  8. Write your own application — create a new directory under zOS/apps/, write your app() function, build, copy to the SD card, and run it from the zOS shell
  9. Iterate — modify your application, rebuild, copy to SD card, and run again. No FPGA recompile needed.

This workflow lets you start producing working software in minutes, without waiting for lengthy FPGA compilations except during the initial hardware setup.

Part 8: Writing Custom Software

This section covers the lower-level details of compiling and linking software for the ZPU, for those who want to go beyond using the provided build scripts or who want to understand what happens under the hood.

The ZPU GCC Toolchain

The ZPU uses a GCC cross-compiler: zpu-elf-gcc. This is a standard GCC port that produces ZPU machine code. The toolchain includes:

Tool              Purpose
zpu-elf-gcc       C/C++ compiler
zpu-elf-as        Assembler
zpu-elf-ld        Linker
zpu-elf-objcopy   Binary format conversion (ELF → HEX, BIN, SREC)
zpu-elf-objdump   Disassembler and ELF inspection
zpu-elf-size      Section size reporting

Important compiler flags for the ZPU Evo:

# Enable hardware instructions (Evo/EvoMin only)
zpu-elf-gcc -mloadsp -mstoresp -mpushspadd -mneqbranch -maddsp \
            -mmult -mdiv -mmod -mneg \
            -Os -o myapp.elf myapp.c

The -m flags tell the compiler to use hardware instructions rather than emulation. For Small/Medium/Flex CPUs, omit the arithmetic flags (-mmult, -mdiv, etc.) as those instructions are emulated in software.
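
To give a feel for what software emulation of an arithmetic instruction costs, here is a classic shift-and-add multiply. It is a generic illustration of the technique, not the emulation routine the ZPU toolchain actually ships; the function name soft_mul32 is hypothetical.

```c
#include <stdint.h>

/* Shift-and-add multiply: one loop iteration per significant bit of b.
 * The result wraps modulo 2^32, like a 32-bit hardware multiply. */
static uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the shifted multiplicand for this bit */
        a <<= 1;
        b >>= 1;
    }
    return result;
}
```

A loop of this kind takes many instructions per multiplication, which is why a single hardware multiply instruction is a significant win on CPUs that implement it.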

Linker Scripts and Memory Layout

The linker script (.ld file) defines where code and data are placed in memory. A typical ZPU linker script specifies:

MEMORY {
    BOOT  (rx)  : ORIGIN = 0x00000000, LENGTH = 0x00000400
    CODE  (rwx) : ORIGIN = 0x00000400, LENGTH = 0x0000FC00
}

SECTIONS {
    .fixed_vectors : { *(.fixed_vectors) } > BOOT
    .text          : { *(.text*) }         > CODE
    .rodata        : { *(.rodata*) }       > CODE
    .data          : { *(.data*) }         > CODE
    .bss           : { *(.bss*) }          > CODE
}

The startup code (romcrt0.s) sets up the interrupt vector at address 0, initialises the stack pointer, clears the .bss section, and jumps to main() (for IOCP/zOS) or app() (for applications).
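
Two of those startup steps, clearing .bss and jumping to the entry point, can be expressed in portable C, as the host-side sketch below shows. Setting the stack pointer and installing the interrupt vector cannot be shown this way and are left to the assembly startup file. All names here (fake_bss, fake_main, startup) are hypothetical stand-ins.

```c
#include <stdint.h>

/* Stand-in for the memory between the linker's .bss start and end symbols. */
static uint32_t fake_bss[8];

/* Stand-in for main() or app(); records that it was reached. */
static int entry_called;
static int fake_main(void)
{
    entry_called = 1;
    return 0;
}

/* Zero every word in [bss_start, bss_end), then call the entry point. */
static int startup(uint32_t *bss_start, uint32_t *bss_end, int (*entry)(void))
{
    for (uint32_t *p = bss_start; p < bss_end; p++)
        *p = 0;
    return entry();
}
```

Clearing .bss before the entry point runs is what guarantees that uninitialised global and static variables start at zero, as the C language requires.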

Inline Assembly

You can embed ZPU instructions directly in C code using GCC inline assembly. This is useful for accessing hardware instructions not generated by the compiler:

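// Execute a single NOP instruction
static inline void zpu_nop(void) {
    __asm__ __volatile__("nop");
}

// Read the stack pointer. PUSHSP pushes the current SP onto the stack.
// A following LOAD would dereference that address and return the word
// stored there instead of the SP itself, so it is omitted here.
// Check the generated code with zpu-elf-objdump, since how the result
// reaches "sp" depends on the compiler's operand handling.
static inline unsigned int zpu_get_sp(void) {
    unsigned int sp;
    __asm__ __volatile__("pushsp" : "=r"(sp));
    return sp;
}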

Part 9: Understanding the SoC Architecture

Block Diagram

                    ┌─────────────────────────────────────────┐
                    │              ZPU SoC                     │
                    │                                         │
                    │  ┌──────────┐    ┌────────────────┐     │
                    │  │          │    │  Boot BRAM      │     │
                    │  │   CPU    │◄──►│  (Dual-Port)    │     │
                    │  │ (Small/  │    │  Port A: Data   │     │
 UART TX/RX ◄──────┤  │  Medium/ │    │  Port B: Insn   │     │
                    │  │  Flex/   │    └────────────────┘     │
 SD Card SPI ◄──────┤  │  Evo/   │           │               │
                    │  │  EvoMin) │    ┌──────┴──────┐        │
 GPIO/LEDs ◄────────┤  │          │    │ System Bus   │        │
                    │  └────┬─────┘    └──────┬──────┘        │
                    │       │                 │               │
                    │  ┌────┴─────────────────┴────────┐      │
                    │  │        Bus Decoder             │      │
                    │  └─┬───┬───┬───┬───┬───┬───┬────┘      │
                    │    │   │   │   │   │   │   │            │
                    │  ┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴┐┌─┴──┐       │
                    │  │U0││U1││T1││SD││PS││SP││INTR│       │
                    │  │  ││  ││  ││  ││2 ││I ││CTRL│       │
                    │  └──┘└──┘└──┘└──┘└──┘└──┘└────┘       │
                    │                                         │
                    │  ┌────────────────────────────────┐     │
                    │  │     Wishbone Bus (optional)     │     │
                    │  └─┬────────┬────────┬───────────┘     │
                    │    │        │        │                   │
                    │  ┌─┴──┐  ┌─┴──┐  ┌─┴──────┐           │
                    │  │I2C │  │SRAM│  │WB SDRAM│           │
                    │  └────┘  └────┘  └────────┘           │
                    └─────────────────────────────────────────┘

The Interrupt Controller

The interrupt controller supports up to 16 prioritised interrupt sources (SOC_INTR_MAX). Each source can be individually enabled/disabled via the interrupt enable register:

// Enable timer interrupt (source 0)
volatile unsigned int *intr = (volatile unsigned int *)INTR_BASE;
intr[INTR_ENABLE] = (1 << 0);  // Enable source 0

// In interrupt handler:
unsigned int status = intr[INTR_STATUS];  // Read pending interrupts
intr[INTR_STATUS] = status;               // Clear handled interrupts
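
Inside a handler, the pending sources usually have to be served in priority order. The sketch below picks the next source to handle from a status word and an enable mask. It assumes that source 0 has the highest priority, which the text above does not state; INTR_SOURCES and the function name are hypothetical.

```c
#include <stdint.h>

#define INTR_SOURCES 16   /* matches the 16 sources mentioned above */

/* Return the lowest-numbered source that is both pending and enabled,
 * or -1 if there is none. Assumes lower numbers mean higher priority. */
static int highest_priority_pending(uint32_t status, uint32_t enable)
{
    uint32_t active = status & enable & 0xFFFFu;
    for (int src = 0; src < INTR_SOURCES; src++) {
        if (active & (1u << src))
            return src;
    }
    return -1;
}
```

A handler could call this in a loop, serving and clearing one source at a time until it returns -1.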

Adding a New Peripheral

To add a new peripheral to the SoC:

  1. Write the VHDL module with a memory-mapped register interface
  2. Add it to zpu_soc.vhd within the appropriate bus (system or Wishbone)
  3. Add address decoding for your peripheral’s register space
  4. Add a configuration flag in zpu_soc_pkg.vhd (SOC_IMPL_MYDEVICE)
  5. Add the VHDL file to all relevant .qsf project files
  6. Write a C driver with the register addresses and access functions
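
For step 6, a driver typically wraps the register block in a structure and exposes a few access functions. The sketch below is for an imaginary device: the register names, bit meanings, and function names are all hypothetical. Passing the base pointer in, rather than hard-coding an address, lets the same code run against a plain structure in a host-side test.

```c
#include <stdint.h>

/* Hypothetical register map for an imaginary "MYDEVICE" peripheral. */
typedef struct {
    volatile uint32_t ctrl;    /* bit 0: enable */
    volatile uint32_t status;  /* bit 0: busy */
    volatile uint32_t data;    /* write: value to send */
} mydevice_regs_t;

#define MYDEV_CTRL_ENABLE  (1u << 0)
#define MYDEV_STATUS_BUSY  (1u << 0)

/* Set the enable bit without disturbing other control bits. */
static void mydev_enable(mydevice_regs_t *dev)
{
    dev->ctrl |= MYDEV_CTRL_ENABLE;
}

/* Write one word if the device is idle. Returns 0 on success, -1 if busy. */
static int mydev_write(mydevice_regs_t *dev, uint32_t value)
{
    if (dev->status & MYDEV_STATUS_BUSY)
        return -1;
    dev->data = value;
    return 0;
}
```

On the target, the pointer would be the peripheral's decoded base address cast to mydevice_regs_t *.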

Part 10: Board-Specific Setup

Supported Development Boards

Board      FPGA                            Logic Elements  BRAM     Clock                  Price Range
DE10 Nano  Cyclone V 5CSEBA6U23I7          110K LEs        5.5Mbit  50MHz (PLL to 100MHz)  ~$130
E115       Cyclone IV E EP4CE115F23I7      114K LEs        3.9Mbit  50MHz (PLL to 75MHz)   ~$50
CYC1000    Cyclone 10 LP 10CL025YU256C8G   25K LEs         594Kbit  12MHz (PLL to 100MHz)  ~$30
QMV        Cyclone V 5CEFA2F23C8           25K LEs         1.8Mbit  50MHz (PLL to 75MHz)   ~$50
DE0 Nano   Cyclone V 5CSEMA4U23C6          40K LEs         2.5Mbit  50MHz (PLL to 100MHz)  ~$80

Setting Up a New Board

To port the ZPU SoC to a new FPGA board:

  1. Create pin assignments - Map FPGA pins to board peripherals (UART, LEDs, SD card, SDRAM, etc.) in a new .qsf file
  2. Create a top-level wrapper - A <board>_Toplevel.vhd that instantiates the PLL (clock generation) and connects board I/O to the zpu_soc entity
  3. Add clock frequency - Define SYSCLK_<BOARD>_FREQ in zpu_soc_pkg.vhd
  4. Create a Quartus project - New .qpf file referencing all source files
  5. Add Makefile targets - Add board-specific build targets

Example top-level structure:

entity MyBoard_zpu_Toplevel is
    port (
        CLOCK_50    : in    std_logic;        -- 50MHz input clock
        UART_TX     : out   std_logic;        -- UART transmit
        UART_RX     : in    std_logic;        -- UART receive
        LED         : out   std_logic_vector(3 downto 0);
        SD_CS       : out   std_logic;        -- SD card chip select
        SD_CLK      : out   std_logic;        -- SD card clock
        SD_MOSI     : out   std_logic;        -- SD card data out
        SD_MISO     : in    std_logic         -- SD card data in
    );
end entity;

architecture rtl of MyBoard_zpu_Toplevel is
    signal sysclk : std_logic;  -- Generated system clock
begin
    -- PLL: 50MHz → 100MHz
    PLL0 : entity work.pll port map (
        inclk0 => CLOCK_50, c0 => sysclk
    );

    -- Instantiate the ZPU SoC
    SOC0 : entity work.zpu_soc port map (
        sysclk => sysclk,
        -- Connect UART, SD, LEDs, etc.
    );
end architecture;

Part 11: CI/CD and Automated Builds

Pipeline Overview

The ZPU Evolution uses Jenkins for continuous integration. Every push to a monitored branch triggers an automated build that:

  1. Checks out the latest source code
  2. Builds zOS firmware (if the zOS builder Docker image is available)
  3. Compiles all 25 FPGA variants (5 boards x 5 CPU models)
  4. Packages release tarballs
  5. Creates a tagged release on Gitea with all artifacts

Docker Build Environment

Builds run inside Docker containers for reproducibility:

  • quartus-ii-17.1.1 - Contains Intel Quartus Prime 17.1.1 Standard Edition for FPGA compilation
  • zos-builder - Contains the ZPU GCC toolchain for firmware compilation

The Quartus container requires specific volume mounts for license verification:

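# --mac-address      MAC address the license is tied to
# LM_LICENSE_FILE    Path to the license file
# /run/udev, /sys    Read-only mounts for the FlexLM device scan and system info
# $PWD:/workspace    Project files
# (Comments are kept above the command: a comment after a trailing
#  backslash would break the line continuation.)
docker run --rm \
    --mac-address "02:50:dd:72:03:01" \
    -e "LM_LICENSE_FILE=/srv2/license2.dat" \
    -v "/run/udev:/run/udev:ro" \
    -v "/sys:/sys:ro" \
    -v "$PWD:/workspace" \
    -w "/workspace/build" \
    quartus-ii-17.1.1 \
    /opt/altera/quartus/bin/quartus_sh --flow compile DE10_nano_zpu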

Release Artifacts

Successful builds produce:

  • .sof files - JTAG programming files (volatile, for development)
  • .rbf files - Raw binary files (for configuration devices, compressed)
  • Tarballs - Per-board and complete packages uploaded to Gitea releases

Setting Up Your Own CI/CD

To replicate this pipeline:

  1. Install Jenkins with the Generic Webhook Trigger plugin
  2. Build or obtain the Quartus Docker image
  3. Deploy the pipeline script from /var/jenkins_home/pipeline-scripts/zpu-build.groovy
  4. Configure a Gitea webhook pointing to Jenkins
  5. Ensure the FlexLM license is accessible to the Docker containers

Part 12: Debugging and Troubleshooting

Hardware Debug Serialiser

The Evo CPU includes a built-in debug serialiser that outputs CPU state over UART1. Enable it in zpu_pkg.vhd:

constant DEBUG_CPU            : boolean := true;
constant DEBUG_LEVEL          : integer := 2;    -- 0=basic, 5=everything
constant DEBUG_TX_BAUD_RATE   : integer := 115200;
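
A serial transmitter normally derives its bit timing by dividing the system clock by the baud rate. The sketch below shows that calculation with rounding to the nearest integer. It assumes a simple integer divider, which may not match how the debug serialiser is actually implemented; the function name is hypothetical.

```c
#include <stdint.h>

/* Clock cycles per bit, rounded to nearest. Returns 0 for a zero baud rate. */
static uint32_t baud_divisor(uint32_t clk_hz, uint32_t baud)
{
    if (baud == 0)
        return 0;
    return (clk_hz + baud / 2) / baud;
}
```

At a 100 MHz system clock and 115200 baud this gives 868 cycles per bit; at 75 MHz it gives 651. If the terminal shows garbage, a mismatch between the configured clock frequency and the real PLL output is a likely cause.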

Connect a terminal to UART1 TX to see:

  • Current PC, stack pointer, TOS, NOS
  • Executing instruction and signals
  • L1 and L2 cache contents (at higher debug levels)
  • Breakpoint events

Using SignalTap

For deeper FPGA-level debugging, use Intel’s SignalTap logic analyser:

  1. In Quartus, go to Tools → SignalTap Logic Analyzer
  2. Add signals of interest (e.g., CPU state machine, bus transactions)
  3. Set trigger conditions
  4. Compile and program the FPGA
  5. Capture and analyse waveforms

Common Issues

“Segment Violation” in Docker builds:

  • Ensure /run/udev and /sys are mounted read-only in the Docker container
  • The FlexLM license manager needs these to enumerate network devices

Compilation succeeds but CPU does not boot:

  • Check that the BRAM initialisation file matches the CPU model
  • Verify reset address (SOC_RESET_ADDR_CPU) points to valid code
  • Ensure the PLL is generating the correct clock frequency

“Entity not found” errors:

  • Check that all required VHDL files are listed in the .qsf project file
  • The zOS_DualPort3264BootBRAM.vhd is needed for EVO/EVO_MINIMAL models

Timing failures:

  • Reduce clock frequency or add pipeline registers
  • Check the TimeQuest timing report for the critical path
  • Consider reducing cache sizes to simplify routing

Part 13: Further Reading and Resources

ZPU Resources

Resource                   URL
ZPU Evolution Repository   git.eaw.app/eaw/ZPU
Original Zylin ZPU         github.com/zylin/zpu
ZPU GCC Toolchain          github.com/zylin/zpugcc
ZPU Flex                   github.com/robinsonb5/ZPUFlex
ZPU Wikipedia              en.wikipedia.org/wiki/ZPU
zOS Repository             git.eaw.app/eaw/zOS

Learning Resources

Topic                  Recommended
VHDL for beginners     “Free Range VHDL” (free e-book)
FPGA development       Intel FPGA University Program materials
Stack machine theory   “Stack Computers: the new wave” by Philip Koopman
Quartus Prime          Intel Quartus Prime Handbook
Wishbone bus           Wishbone B4 specification (opencores.org)

Exercises for Students

  1. Basic: Build the ZPU Small for your board. Connect a terminal at 115200 baud and interact with IOCP.
  2. Intermediate: Modify zpu_soc_pkg.vhd to change the BRAM size. Observe the effect on available stack space.
  3. Intermediate: Enable the PS2 controller and connect a keyboard. Write a C program that echoes key presses.
  4. Advanced: Add a hardware SWAP instruction (as described in Part 6). Verify it works with inline assembly.
  5. Advanced: Port the design to a new FPGA board not currently supported.
  6. Expert: Implement a new extended instruction using the EXTEND mechanism. For example, a hardware string compare.