Sharp MZ-80A Video Module — Developer's Guide

This guide is a detailed walkthrough of the Sharp MZ-80A Video Module hardware and HDL design. It explains VHDL concepts for developers who may not be familiar with hardware description languages, walks through every source file in the design, documents the CPLD and FPGA architectures in detail, and shows how to build bitstreams, manage CG-ROM images, add new video modes, and modify the CPLD interface for new hardware.
The Video Module exists in two generations. The discrete hardware design (v1.0 and v1.1) uses 74-series TTL and CMOS ICs on PCB; these versions are documented primarily through the KiCad schematics in the schematics/ directory. The FPGA/CPLD design (v2.0) replaces most of the discrete logic with a Cyclone III FPGA and a MAX 7000A CPLD and is described in detail throughout this guide.

Introduction to VHDL for Non-HDL Developers

The entire v2.0 Video Module firmware is written in VHDL (VHSIC Hardware Description Language) — the language used to describe the internal structure and behaviour of programmable logic devices. Unlike C or assembly language, which describe a sequence of instructions executed one at a time by a processor, VHDL describes hardware: wires, registers, and logic gates that all operate simultaneously. This distinction is fundamental and affects every aspect of how VHDL is written and read.

ENTITY and ARCHITECTURE
Every VHDL design unit consists of two parts: an ENTITY and an ARCHITECTURE.
The ENTITY declares the interface — the set of input, output, and bidirectional pins that the outside world connects to. It is analogous to a function signature or a module header. The ARCHITECTURE describes the internal behaviour — what happens between the input and output pins. It is analogous to the function body, except that everything in it executes concurrently rather than sequentially.
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY mux2to1 IS
    PORT (
        sel   : IN  std_logic;
        a, b  : IN  std_logic;
        y     : OUT std_logic
    );
END ENTITY;

ARCHITECTURE rtl OF mux2to1 IS
BEGIN
    -- Concurrent signal assignment: y is continuously driven based on sel.
    -- This is not a statement that executes once — it is a permanent wire connection.
    y <= a WHEN sel = '0' ELSE b;
END ARCHITECTURE;
The WHEN/ELSE assignment above describes a 2:1 multiplexer. The synthesis tool turns this into actual gates in silicon. There is no clock, no loop, no sequence: the output y continuously follows a while sel is '0' and b while sel is '1', with only propagation delay.
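One way to see the concurrent behaviour is to simulate the multiplexer. The following is a minimal self-checking testbench sketch; the entity name mux2to1_tb and the 10 ns settling delays are assumptions made for illustration and are not part of the project sources.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical testbench for the mux2to1 entity shown above.
-- A testbench has no ports: it generates its own stimulus.
ENTITY mux2to1_tb IS
END ENTITY;

ARCHITECTURE sim OF mux2to1_tb IS
    SIGNAL sel, a, b, y : std_logic := '0';
BEGIN
    -- Direct entity instantiation of the design under test.
    dut : ENTITY work.mux2to1
        PORT MAP (sel => sel, a => a, b => b, y => y);

    stimulus : PROCESS
    BEGIN
        a <= '1'; b <= '0';
        sel <= '0';
        WAIT FOR 10 ns;
        ASSERT y = '1' REPORT "sel='0' should pass a to y" SEVERITY error;

        sel <= '1';
        WAIT FOR 10 ns;
        ASSERT y = '0' REPORT "sel='1' should pass b to y" SEVERITY error;

        -- y tracks b while sel stays '1'; no clock is involved.
        b <= '1';
        WAIT FOR 10 ns;
        ASSERT y = '1' REPORT "y should follow b while sel='1'" SEVERITY error;

        WAIT;  -- Suspend forever; the simulation ends when no events remain.
    END PROCESS;
END ARCHITECTURE;
```

The WAIT FOR statements exist only in simulation; they are not synthesisable and never appear in the design files themselves.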

Signals and Types
std_logic is the primary signal type in IEEE VHDL. It models a single wire. The type defines nine values; the five most often encountered are:
Value  Meaning
'0'    Logic low
'1'    Logic high
'Z'    High impedance (tri-state: the wire is disconnected)
'-'    Don’t-care (synthesis tool may choose either value for optimisation)
'U'    Uninitialised (simulation only: appears when a signal has never been driven)
std_logic_vector is an array of std_logic bits and is used for buses. For example, SIGNAL addr : std_logic_vector(15 DOWNTO 0) declares a 16-bit address bus. Individual bits are accessed as addr(15) (MSB) through addr(0) (LSB). Slices are written addr(7 DOWNTO 0) for the lower byte.
SIGNAL declares a named internal wire or register. A signal assignment (sig <= expr;) inside an ARCHITECTURE describes a permanent connection. When the assignment is inside a clocked PROCESS (one that tests rising_edge or falling_edge), it describes a register: the value is captured on the triggering clock edge. When it is outside a PROCESS (in the concurrent region), or inside a PROCESS that has no clock edge test, it describes combinational logic.

PROCESS and Sensitivity Lists
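A PROCESS block describes logic that reacts to changes on a set of signals, its sensitivity list. Whenever any signal in the sensitivity list changes, the process body executes (conceptually). The sensitivity list mainly governs simulation: synthesis tools infer the clock from the edge test inside the body (such as rising_edge(clk)), and an incomplete sensitivity list can make simulation disagree with the synthesised hardware.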
A synchronous (clocked) process uses rising_edge(clk) to describe a D flip-flop — the canonical building block of all sequential digital logic:
PROCESS (clk, reset)
BEGIN
    IF reset = '1' THEN
        -- Asynchronous reset: q goes to '0' immediately when reset is asserted,
        -- regardless of the clock.
        q <= '0';
    ELSIF rising_edge(clk) THEN
        -- Synchronous capture: q takes the value of d on each rising clock edge.
        q <= d;
    END IF;
END PROCESS;
All sequential logic in the Video Module follows this pattern. Combinational logic (address decode, pixel selection, colour lookup) is described with concurrent signal assignments or with processes that have no clock edge test.
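For contrast with the clocked process, here is a sketch of a purely combinational process. The entity colour_select and its ports are hypothetical names chosen for illustration; they are not taken from the Video Module sources.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical example entity; names are illustrative only.
ENTITY colour_select IS
    PORT (
        pixel_bit : IN  std_logic;
        fg, bg    : IN  std_logic_vector(2 DOWNTO 0);
        rgb       : OUT std_logic_vector(2 DOWNTO 0)
    );
END ENTITY;

ARCHITECTURE rtl OF colour_select IS
BEGIN
    -- No clock edge test: this process synthesises to gates, not flip-flops.
    -- Every signal read in the body appears in the sensitivity list.
    PROCESS (pixel_bit, fg, bg)
    BEGIN
        -- The default assignment covers every path, so no latch is inferred.
        rgb <= bg;
        IF pixel_bit = '1' THEN
            rgb <= fg;
        END IF;
    END PROCESS;
END ARCHITECTURE;
```

If the default assignment were removed and the IF had no ELSE branch, rgb would have to hold its previous value when pixel_bit is '0', and the synthesis tool would infer a latch. This is a common source of unintended behaviour.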

COMPONENT Instantiation and PACKAGE
A COMPONENT is a sub-design used inside a larger design — analogous to calling a library function. The COMPONENT declaration specifies the interface; the instantiation wires it up to local signals. The Video Module uses this mechanism extensively: the FPGA top-level (VideoController_Toplevel.vhd) instantiates the main controller, PLL IP cores, and all peripheral blocks as components.
A PACKAGE is a shared collection of type definitions, constants, and component declarations. The CPLD and FPGA designs each have their own package file (VideoInterface_pkg.vhd and VideoController_pkg.vhd respectively). The other files in each design import the matching package with a USE clause, for example USE work.VideoController_pkg.ALL; in the FPGA sources.
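The sketch below shows the two mechanisms together in miniature. The package demo_pkg, the constant REG_MODE, and the entity demo_top are assumptions invented for this example; the real declarations live in the project's own package files. It reuses the mux2to1 entity from the earlier listing.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical package: one constant and one component declaration.
PACKAGE demo_pkg IS
    CONSTANT REG_MODE : std_logic_vector(7 DOWNTO 0) := x"F8";

    COMPONENT mux2to1 IS
        PORT (
            sel  : IN  std_logic;
            a, b : IN  std_logic;
            y    : OUT std_logic
        );
    END COMPONENT;
END PACKAGE;

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE work.demo_pkg.ALL;   -- Makes REG_MODE and the mux2to1 component visible.

ENTITY demo_top IS
    PORT (
        addr   : IN  std_logic_vector(7 DOWNTO 0);
        d0, d1 : IN  std_logic;
        q      : OUT std_logic
    );
END ENTITY;

ARCHITECTURE rtl OF demo_top IS
    SIGNAL mode_sel : std_logic;
BEGIN
    -- A package constant used in a concurrent comparison.
    mode_sel <= '1' WHEN addr = REG_MODE ELSE '0';

    -- Component instantiation: named association wires ports to local signals.
    u_mux : mux2to1
        PORT MAP (sel => mode_sel, a => d0, b => d1, y => q);
END ARCHITECTURE;
```

Keeping constants such as register addresses in the package means a change to the register map is made in one place and picked up by every file that imports it.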

Source Tree

Path Contents
CPLD/ CPLD VHDL source files
CPLD/VideoInterface.vhd CPLD main logic — voltage translation, clock division, bus control
CPLD/VideoInterface_Toplevel.vhd CPLD top-level entity with I/O pin declarations
CPLD/VideoInterface_pkg.vhd CPLD package — types, constants, component declarations
CPLD/build/ Quartus II 13.0.1 project for the CPLD
CPLD/build/VideoInterface.qpf Quartus project file
CPLD/build/VideoInterface.qsf Quartus settings file — device selection, pin assignments, compile options
CPLD/build/VideoInterface.csv Pin assignment documentation (2,085 bytes)
CPLD/build/VideoInterface_constraints.sdc Timing constraints (14,618 bytes)
CPLD/build/output_files/ Compiled bitstreams (.jic, .sof, .pof)
FPGA/ FPGA VHDL source files
FPGA/VideoController.vhd FPGA main controller — all register access, video timing, display pipelines, GPU
FPGA/VideoController_Toplevel.vhd FPGA top-level entity with I/O pin declarations (144-pin package) and PLL instantiation
FPGA/VideoController_pkg.vhd FPGA package — register addresses, mode codes, colour encoding, timing records
FPGA/functions.vhd Combinational helper functions (priority encoder, etc.)
FPGA/devices/ps2/ PS/2 keyboard interface IP block
FPGA/devices/BRAM/ Block RAM wrapper IP blocks
FPGA/devices/uart/ UART IP block
FPGA/devices/spi/ SPI controller IP block
FPGA/devices/timer/ Timer module IP block
FPGA/devices/intr/ Interrupt controller IP block
FPGA/devices/RAM/ RAM modules
FPGA/devices/SDRAM/ SDRAM controller IP block
FPGA/devices/SDMMC/ SD card (SDMMC) controller IP block
FPGA/devices/ioctl/ I/O control IP block
FPGA/build/ Quartus II 13.1 project for the FPGA
FPGA/build/VideoController.qpf Quartus project file
FPGA/build/VideoController.qsf Quartus settings file
FPGA/build/VideoController.csv Pin assignment documentation
FPGA/build/VideoController_constraints.sdc Timing constraints
FPGA/build/output_files/ Compiled bitstreams (.sof, .jic)
FPGA/build/core/ Quartus-generated PLL IP cores (Video_Clock_*.vhd)
FPGA/build/simulation/ Simulation testbenches
schematics/v1.0/ KiCad schematics and PDFs for the v1.0 discrete hardware design
schematics/v1.1/ KiCad schematics and PDFs for the v1.1 discrete hardware design
schematics/v2.0/ KiCad schematics and PDFs for the v2.0 FPGA/CPLD design
software/tools/make_cgrom.sh CG-ROM image builder script
software/roms/ Source ROM images (mz-80acg.rom, MZ80K_cgrom.rom, etc.)
software/mif/ Memory Initialization Files for FPGA BRAM

CPLD Design: VideoInterface.vhd

Role of the CPLD
The CPLD is an Altera MAX 7000A device (EPM7128S, 128 macrocells, 84-pin PLCC). It occupies the electrical boundary between the 5V MZ-80A system bus and the 3.3V FPGA. Direct connection of a 5V signal to a 3.3V FPGA input would damage the device over time and cause logic level errors; the CPLD's 5V-tolerant I/O buffers absorb the MZ-80A bus signals safely.
The CPLD performs three primary functions:
  • Voltage level translation: All signals arriving from the MZ-80A bus (address bus A0–A15, data bus D0–D7, control signals MREQ, IORQ, RD, WR, BUSACK) are received by the CPLD's 5V-tolerant inputs and re-driven at 3.3V for the FPGA.
  • Clock generation: The MZ-80A master oscillator runs at 17.7341 MHz (the MZ-80B pixel clock derived from the gate array). The CPLD divides this clock to generate the 8.867 MHz pixel clock required for 40-column mode. The 80-column clock is supplied directly from the master clock.
  • Bus control: The CPLD decodes the Z80 address and control signals to generate chip-enable, read, write, and address-select signals for the FPGA, translating the Z80's asynchronous bus protocol into synchronous signals that the FPGA can sample reliably.
The CPLD is intentionally kept simple. All complex display logic lives in the FPGA. If the CPLD is ever replaced or updated, the change is purely to bus interface timing and I/O translation — the display behaviour is unaffected.

Clock Divider
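The clock divider in VideoInterface.vhd is a simple toggle flip-flop. Because the flip-flop changes state only on rising edges of the master clock, its output is high for one full master period and low for the next, giving an exact 50% duty cycle regardless of the master clock's own duty cycle. This matters for correct pixel timing. The process is:
-- Divide the 17.7341 MHz MZ-80A pixel clock by 2 to produce
-- the 8.867 MHz 40-column pixel clock.
-- CLK_DIV needs a defined starting value (e.g. declared with := '0');
-- otherwise it stays 'U' in simulation, because NOT 'U' is 'U'.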
PROCESS (MZ_CLK)
BEGIN
    IF rising_edge(MZ_CLK) THEN
        CLK_DIV <= NOT CLK_DIV;
    END IF;
END PROCESS;

-- Drive the pixel clock output: 40-col uses divided clock, 80-col uses master.
PIXEL_CLK <= CLK_DIV WHEN MODE_80COL = '0' ELSE MZ_CLK;
The MODE_80COL signal is asserted by the FPGA via a feedback pin when the active display mode requires 80 columns. The CPLD routes the appropriate clock to the FPGA's dedicated clock input.

Bus Interface and Address Decode
The Z80 bus uses separate MREQ (memory request) and IORQ (I/O request) strobes, each active-low. The CPLD decodes these together with the address bus to generate select signals for the FPGA's internal memory-mapped regions:
Region         Address Range      Signal   Usage
VRAM           0xD000–0xD7FF      CS_VRAM  Character video RAM (2 KB)
ARAM           0xD800–0xDFFF      CS_ARAM  Attribute RAM (2 KB)
I/O registers  0xD0–0xFF (IORQ)   CS_IO    Video controller registers
The address decode is combinational (no clock) to minimise setup time. The generated chip-select signals are synchronised to the FPGA clock inside the FPGA, not in the CPLD, to avoid introducing extra latency in the CPLD path.
The data bus direction is controlled by the CPLD's output-enable logic. When the Z80 performs a read (RD asserted, MREQ or IORQ asserted), the FPGA drives the data bus through the CPLD's bidirectional buffers. When the Z80 writes, the buffers are driven in the opposite direction. The CPLD ensures the direction changes occur outside the bus valid window, preventing bus contention.
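The decode described above can be sketched as three concurrent assignments. This is an illustrative reconstruction from the region table, not the contents of VideoInterface.vhd: the entity name, the active-low strobe names MREQ_n and IORQ_n, and the active-high select outputs are assumptions.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical decoder sketch derived from the region table.
ENTITY addr_decode_sketch IS
    PORT (
        A       : IN  std_logic_vector(15 DOWNTO 0);
        MREQ_n  : IN  std_logic;   -- Active-low memory request
        IORQ_n  : IN  std_logic;   -- Active-low I/O request
        CS_VRAM : OUT std_logic;   -- Active-high selects (assumed polarity)
        CS_ARAM : OUT std_logic;
        CS_IO   : OUT std_logic
    );
END ENTITY;

ARCHITECTURE rtl OF addr_decode_sketch IS
BEGIN
    -- 0xD000-0xD7FF: the top five address bits are "11010".
    CS_VRAM <= '1' WHEN MREQ_n = '0' AND A(15 DOWNTO 11) = "11010" ELSE '0';

    -- 0xD800-0xDFFF: the top five address bits are "11011".
    CS_ARAM <= '1' WHEN MREQ_n = '0' AND A(15 DOWNTO 11) = "11011" ELSE '0';

    -- I/O ports 0xD0-0xFF: A7 and A6 set, and at least one of A5/A4 set
    -- (which excludes 0xC0-0xCF). Only the low address byte is decoded.
    -- Note: a Z80 interrupt acknowledge also asserts IORQ (together with M1),
    -- so a complete decoder would additionally qualify with M1.
    CS_IO <= '1' WHEN IORQ_n = '0' AND A(7 DOWNTO 6) = "11"
                      AND (A(5) = '1' OR A(4) = '1')
             ELSE '0';
END ARCHITECTURE;
```

Because each select depends only on the current inputs, the outputs change as soon as the bus signals settle, which is the behaviour the text describes as combinational decode.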

Building the CPLD Bitstream
The CPLD requires Quartus II 13.0.1 (13.0 SP1, the last version with full MAX 7000 support). The build process from the GUI:
  1. Open CPLD/build/VideoInterface.qpf in Quartus II 13.0.1.
  2. Verify the target device is set to EPM7128S (or the appropriate MAX 7000 variant for your board). Check under Assignments → Device.
  3. Start compilation: Processing → Start Compilation. The MAX 7000 fitter runs quickly — typically under 30 seconds.
  4. Examine the Compilation Report. Check the Fitter section to confirm that the design fits the device, and the timing analysis section to ensure all timing constraints in VideoInterface_constraints.sdc have been met (no timing violations).
  5. The output files are written to CPLD/build/output_files/:
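    • VideoInterface.pof: the CPLD programming file, usable over JTAG with either a ByteBlaster or a USB-Blaster
    • VideoInterface.jic: JTAG Indirect Configuration file (a format intended for serial configuration Flash rather than for direct CPLD programming)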
  6. To programme: Tools → Programmer. Add the .pof or .jic file, select the USB-Blaster interface, and click Start.

FPGA Design: VideoController.vhd

Overall Architecture
The FPGA is an Altera Cyclone III EP3C25E144C8 (24,624 logic elements, 144-pin EQFP). It implements the complete video controller (character display, graphics display, GPU, register interface, PS/2 keyboard, SD card, UART, and SDRAM controller) in a single device.
The design is split across four source files that correspond directly to layers of abstraction:
  • VideoController_Toplevel.vhd: Top-level entity. Declares every I/O port used on the 144-pin package, with its direction and IOSTANDARD attributes. Instantiates the PLL IP cores and the main VideoController entity as a component, wiring the physical pin signals to the internal design signals.
  • VideoController_pkg.vhd: The shared package. Contains all constant definitions (register addresses, mode codes, colour encoding), type definitions (video timing parameter records, mode tables), and component declarations for all peripheral IP blocks. Every other file in the FPGA design imports this package with USE work.VideoController_pkg.ALL;.
  • VideoController.vhd: The main architecture. All register access processes, the video timing generator, VRAM/GRAM multiplexing, the character and graphics display pipelines, GPU state machine, and interrupt logic are described here.
  • functions.vhd: A package of combinational helper functions used in the main controller — primarily a priority encoder for interrupt arbitration and bit-manipulation utilities used in the display pipeline.
The peripheral IP blocks in FPGA/devices/ are standard Altera-compatible VHDL modules. They interact with the main controller through internal bus signals defined in VideoController_pkg.vhd.

PLL and Clock Domains
The FPGA uses Quartus-generated PLL (Phase-Locked Loop) IP cores, found in FPGA/build/core/, to synthesise the pixel clocks needed for each supported video mode. The PLL IP cores are generated by the MegaWizard Plug-In Manager in Quartus and output as VHDL files (Video_Clock_*.vhd). They take the incoming 17.7 MHz reference clock from the CPLD and multiply/divide it to produce:
Video Mode                 Pixel Clock  Source
Native MZ-80A 40-column    8.867 MHz    Divided by CPLD, routed to FPGA
Native MZ-80A 80-column    17.734 MHz   Master clock from CPLD
VGA 640×480                25.175 MHz   PLL output
VGA 800×600                40.000 MHz   PLL output
VGA 1024×768               65.000 MHz   PLL output
The active pixel clock is selected by a clock multiplexer controlled by the current mode register value. Changing the video mode writes a new value to Control Register 0xF8, which the mode select logic uses to switch the pixel clock source. All synchronous logic downstream of the pixel clock must be held in reset during the clock switch, because the multiplexer output can glitch or produce a shortened pulse while switching and corrupt any state clocked from it.
The FPGA operates in multiple clock domains: the Z80 bus interface domain (driven by the CPLD-supplied clock), the pixel clock domain, the SDRAM controller domain (often running at a different frequency for timing compliance), and the SD card SPI clock domain. All signals that cross clock domains are double-registered to reduce metastability probability. The double-register synchronisers are instantiated from the FPGA/devices/ library.
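A double-register synchroniser for a single-bit signal can be sketched as follows. The entity name sync_2ff and its port names are assumptions for illustration; the project's own synchronisers in FPGA/devices/ may differ in naming and detail.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical two-flip-flop synchroniser for ONE asynchronous bit.
ENTITY sync_2ff IS
    PORT (
        clk_dst  : IN  std_logic;   -- Clock of the receiving domain
        async_in : IN  std_logic;   -- Signal arriving from another domain
        sync_out : OUT std_logic    -- Safe to use in the clk_dst domain
    );
END ENTITY;

ARCHITECTURE rtl OF sync_2ff IS
    -- meta may go metastable; stable gives it a full clock period to settle.
    SIGNAL meta, stable : std_logic := '0';
BEGIN
    PROCESS (clk_dst)
    BEGIN
        IF rising_edge(clk_dst) THEN
            meta   <= async_in;
            stable <= meta;
        END IF;
    END PROCESS;

    sync_out <= stable;
END ARCHITECTURE;
```

This structure is only suitable for individual bits such as strobes and flags. A multi-bit bus cannot be synchronised bit by bit, because the bits may be captured on different clock edges; buses need a handshake or a dual-clock FIFO instead.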

Register Access Process
The MZ-80A Z80 communicates with the video controller through I/O ports in the range 0xD0–0xFF. The register access process in VideoController.vhd is triggered whenever the CPLD asserts CS_IO together with the Z80's RD or WR strobe. It is a synchronous process clocked by the Z80 bus clock:
PROCESS (clk, reset)
BEGIN
    IF reset = '1' THEN
        -- Reset all control registers to their default (power-on) state.
        MODE_REG   <= (OTHERS => '0');
        COLOUR_REG <= (OTHERS => '0');
        GPU_CMD    <= (OTHERS => '0');
    ELSIF rising_edge(clk) THEN
        IF CS_IO = '1' AND WR_n = '0' THEN
            -- Z80 write to I/O port.
            CASE addr(7 DOWNTO 0) IS
                WHEN x"F8" => MODE_REG   <= data_in;   -- Machine model / display mode
                WHEN x"F9" => COLOUR_REG <= data_in;   -- Foreground/background colour
                WHEN x"FA" => GPU_CMD    <= data_in;   -- GPU command trigger
                -- ... additional registers
                WHEN OTHERS => NULL;
            END CASE;
        END IF;
        IF CS_IO = '1' AND RD_n = '0' THEN
            -- Z80 read from I/O port: place the requested register on data_out.
            CASE addr(7 DOWNTO 0) IS
                WHEN x"F8" => data_out <= MODE_REG;
                WHEN x"FB" => data_out <= GPU_STATUS;  -- GPU busy flag in bit 7
                WHEN OTHERS => data_out <= (OTHERS => '0');
            END CASE;
        END IF;
    END IF;
END PROCESS;
Register addresses are defined as constants in VideoController_pkg.vhd. When modifying the register map — for example, to add a new control register — add the constant to the package first, then add the corresponding CASE entry in this process.

Video Timing Generator
The video timing generator produces the horizontal sync (H_SYNC), vertical sync (V_SYNC), and active display enable (DE) signals that a monitor requires. It consists of two counters: a horizontal pixel counter that increments on each pixel clock, and a vertical line counter that increments each time the horizontal counter wraps at the end of a line.
The timing parameters for each mode (H_TOTAL, H_DSP_START, H_DSP_END, H_SYNC_START, H_SYNC_END, V_TOTAL, V_DSP_START, V_DSP_END, V_SYNC_START, V_SYNC_END) are stored in a mode parameter RAM initialised at compile time and indexed by the active mode code. This avoids large CASE statements in the timing generator itself — the counters always behave the same way; only the comparison values change.
-- Horizontal counter: increments every pixel clock.
-- Wraps at H_TOTAL (total pixels per line, including blanking).
PROCESS (pixel_clk)
BEGIN
    IF rising_edge(pixel_clk) THEN
        IF h_count = H_TOTAL - 1 THEN
            h_count <= (OTHERS => '0');
            -- Increment vertical counter at end of each line.
            IF v_count = V_TOTAL - 1 THEN
                v_count <= (OTHERS => '0');
            ELSE
                v_count <= v_count + 1;
            END IF;
        ELSE
            h_count <= h_count + 1;
        END IF;
    END IF;
END PROCESS;

-- Active display enable: high only within the visible window.
DE <= '1' WHEN (h_count >= H_DSP_START AND h_count < H_DSP_END)
           AND (v_count >= V_DSP_START AND v_count < V_DSP_END)
     ELSE '0';

-- Sync pulses (active low for standard VGA).
H_SYNC <= '0' WHEN h_count >= H_SYNC_START AND h_count < H_SYNC_END ELSE '1';
V_SYNC <= '0' WHEN v_count >= V_SYNC_START AND v_count < V_SYNC_END ELSE '1';
The H_SYNC and V_SYNC polarity (active-high or active-low) is also stored in the mode parameters and applied at the output stage by XORing each sync pulse with its polarity bit; the listing above shows only the active-low case. VGA modes use active-low sync; the native MZ-80A modes use active-low as well but with different timing relationships.
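The polarity stage can be sketched as a pair of XOR gates. The entity and port names below, and the convention that a polarity bit of '1' means active-low, are assumptions made for this example rather than the encoding used in VideoController_pkg.vhd.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical sketch: port names and polarity encoding are assumptions.
ENTITY sync_polarity_sketch IS
    PORT (
        hs_pulse   : IN  std_logic;  -- '1' while h_count is inside the sync window
        vs_pulse   : IN  std_logic;  -- '1' while v_count is inside the sync window
        hs_act_low : IN  std_logic;  -- Mode parameter: '1' selects active-low H_SYNC
        vs_act_low : IN  std_logic;  -- Mode parameter: '1' selects active-low V_SYNC
        H_SYNC     : OUT std_logic;
        V_SYNC     : OUT std_logic
    );
END ENTITY;

ARCHITECTURE rtl OF sync_polarity_sketch IS
BEGIN
    -- XOR with '0' passes the pulse unchanged; XOR with '1' inverts it.
    H_SYNC <= hs_pulse XOR hs_act_low;
    V_SYNC <= vs_pulse XOR vs_act_low;
END ARCHITECTURE;
```

With this arrangement the counters and comparators never need to know the polarity; it is applied once, at the pin.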

Character Display Pipeline
The character display pipeline converts VRAM character codes into pixel colours in real time, one pixel per clock cycle. The pipeline has four stages that must each complete within one pixel clock period:
Stage 1 — VRAM fetch: The horizontal counter and vertical counter are used to compute the VRAM address of the current character. For an 80-column display: vram_addr = (v_count / 8) * 80 + (h_count / 8). The character code at that address is read from BRAM (one clock latency).
Stage 2 — CGRAM fetch: The character code from VRAM, combined with the three lower bits of the vertical counter (the row-within-character), forms the CGRAM address: cg_addr = char_code * 8 + (v_count MOD 8). The 8 pixel bits for that row of the character glyph are read from CGRAM BRAM (one clock latency). Simultaneously, the attribute byte for the current character is read from ARAM.
Stage 3 — Colour decode: The attribute byte encodes the foreground and background colour in its lower bits (the exact encoding is defined by constants in VideoController_pkg.vhd). A lookup table maps the 3-bit or 4-bit colour code to the RGB output values.
Stage 4 — Pixel shift: The three lower bits of the horizontal counter select which of the 8 pixel bits is active at this clock cycle. If the selected bit is '1', the foreground colour is output; otherwise the background colour is output.
-- Stage 4: select the current pixel bit and choose foreground or background colour.
-- cg_data is the 8-bit pixel row fetched from CGRAM in Stage 2.
-- h_count(2 DOWNTO 0) is the pixel-within-character (0=leftmost, 7=rightmost).
pixel_bit <= cg_data(7 - to_integer(unsigned(h_count(2 DOWNTO 0))));

-- Drive the RGB output registers.
PROCESS (pixel_clk)
BEGIN
    IF rising_edge(pixel_clk) THEN
        IF DE = '1' THEN
            IF pixel_bit = '1' THEN
                R_OUT <= FG_RED;
                G_OUT <= FG_GREEN;
                B_OUT <= FG_BLUE;
            ELSE
                R_OUT <= BG_RED;
                G_OUT <= BG_GREEN;
                B_OUT <= BG_BLUE;
            END IF;
        ELSE
            -- Outside active display: drive black.
            R_OUT <= (OTHERS => '0');
            G_OUT <= (OTHERS => '0');
            B_OUT <= (OTHERS => '0');
        END IF;
    END IF;
END PROCESS;
The BRAM read latency means that the VRAM and CGRAM fetches must be initiated one or two cycles ahead of when the pixel is actually needed. The pipeline uses registered intermediate signals with carefully counted pipeline delays, matching the BRAM latency parameters defined in the BRAM IP block instantiation.
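The address arithmetic of Stages 1 and 2 reduces to bit slicing plus one constant multiplication. The sketch below assumes an 80-column mode, 10-bit counters, an active area starting at count 0, and 2 KB memories; the entity and signal names are hypothetical, and the BRAM latency compensation is left out.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical address-generation sketch for an 80-column character display.
ENTITY char_addr_sketch IS
    PORT (
        h_count   : IN  unsigned(9 DOWNTO 0);
        v_count   : IN  unsigned(9 DOWNTO 0);
        char_code : IN  std_logic_vector(7 DOWNTO 0);  -- Byte read from VRAM
        vram_addr : OUT unsigned(10 DOWNTO 0);         -- 2 KB VRAM address
        cg_addr   : OUT unsigned(10 DOWNTO 0)          -- 256 chars x 8 rows = 2 KB
    );
END ENTITY;

ARCHITECTURE rtl OF char_addr_sketch IS
    SIGNAL char_col : unsigned(6 DOWNTO 0);  -- h_count / 8
    SIGNAL char_row : unsigned(6 DOWNTO 0);  -- v_count / 8
BEGIN
    -- Dividing by 8 is simply dropping the three low bits.
    char_col <= h_count(9 DOWNTO 3);
    char_row <= v_count(9 DOWNTO 3);

    -- Stage 1: vram_addr = (v_count / 8) * 80 + (h_count / 8).
    vram_addr <= resize(char_row * 80, 11) + char_col;

    -- Stage 2: cg_addr = char_code * 8 + (v_count MOD 8).
    -- Multiplying by 8 and adding a 3-bit value is a plain concatenation.
    cg_addr <= unsigned(char_code) & v_count(2 DOWNTO 0);
END ARCHITECTURE;
```

In a pipelined implementation these addresses would be computed from counter values one or two pixels ahead of the pixel being output, so that the BRAM data arrives in time.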

Graphics Display Pipeline
The graphics display pipeline operates in parallel with the character pipeline. The FPGA implements a three-plane frame buffer (Red, Green, Blue GRAM banks), each one bit deep per pixel, giving eight colours (black through white plus all primary and secondary colours). Each plane is stored in a separate BRAM.
The GRAM address is computed directly from the horizontal and vertical counters: gram_addr = v_count * (H_PIXELS / 8) + (h_count / 8). The byte fetched from each GRAM bank contains 8 horizontal pixels; the pixel within the byte is selected by h_count(2 DOWNTO 0), identical to the character pipeline.
The three one-bit pixel values (R, G, B from the three GRAM banks) are combined with the character pipeline output using a blend operator. The blend mode is set by bits in the Colour Register (0xF9) and can be:
Mode  Effect
OR    Graphics pixel OR character pixel: graphics can overlay characters
AND   Graphics pixel AND character pixel: both must be set to show colour
NAND  Inverted AND: the result is cleared only where both the graphics and character pixels are set
XOR   Exclusive OR: a pixel is shown where exactly one of the two layers is set and cleared where both overlap
The blend operation is applied per colour plane. When the blended pixel is '0' for all three planes, the background colour from the attribute RAM is used instead, allowing character background colours to show through transparent graphics regions.
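The per-plane blend can be sketched as a four-way selection between bitwise operators. The two-bit mode encoding and all names below are assumptions for illustration; the actual bit positions in the Colour Register are defined in VideoController_pkg.vhd. The background fall-through described above is not modelled here.

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical blend sketch; the mode encoding is an assumption.
ENTITY blend_sketch IS
    PORT (
        mode    : IN  std_logic_vector(1 DOWNTO 0);
        gfx     : IN  std_logic_vector(2 DOWNTO 0);  -- R, G, B bits from the GRAM planes
        chr     : IN  std_logic_vector(2 DOWNTO 0);  -- R, G, B bits from the character pipeline
        blended : OUT std_logic_vector(2 DOWNTO 0)
    );
END ENTITY;

ARCHITECTURE rtl OF blend_sketch IS
BEGIN
    -- Each operator acts bit by bit, so the three planes blend independently.
    WITH mode SELECT blended <=
        gfx OR chr   WHEN "00",
        gfx AND chr  WHEN "01",
        gfx NAND chr WHEN "10",
        gfx XOR chr  WHEN OTHERS;
END ARCHITECTURE;
```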

GPU State Machine
The GPU (Graphics Processing Unit) is a simple command processor that accelerates bulk operations on VRAM and GRAM, operations that would take hundreds of milliseconds if the Z80 performed them in software. It executes four fill commands, plus a reset code that aborts any operation in progress:
Command              Code  Operation
Clear VRAM           0x01  Fill all of VRAM with the space character (0x20)
Fill VRAM rectangle  0x02  Fill a rectangular region of VRAM with a given character code
Clear GRAM           0x04  Fill all GRAM planes with 0x00 (black)
Fill GRAM rectangle  0x08  Fill a rectangular region of GRAM with a given colour
Reset / Idle         0xFF  Abort current operation and return to idle immediately
The Z80 writes the command parameters (target address, width, height, fill value) to a 128-bit parameter FIFO via successive writes to the GPU parameter registers, then writes the command code to register 0xFA. This sets the GPU_CMD signal and triggers the state machine.
The state machine is a classic FSM (Finite State Machine) with states: IDLE, FETCH_PARAMS, EXECUTE, DONE. In the EXECUTE state it iterates over the target address range, writing one byte per clock cycle to the appropriate BRAM. Because the FPGA has direct internal access to the BRAM (not mediated by the Z80 bus), it can write at full pixel clock speed. This is far faster than the Z80 can manage: for example, the Z80's LDIR block instruction takes 21 clock cycles per byte.
While the GPU is active (not in IDLE), bit 7 of register 0xFB (GPU_STATUS) reads as '1'. Software must poll this bit and wait for it to clear before issuing the next GPU command or accessing VRAM/GRAM directly. Accessing VRAM while the GPU is executing produces unpredictable results — the BRAM arbitration logic gives the GPU priority.
-- GPU FSM (simplified illustrative example).
PROCESS (pixel_clk, reset)
BEGIN
    IF reset = '1' THEN
        gpu_state  <= IDLE;
        GPU_STATUS <= (OTHERS => '0');
    ELSIF rising_edge(pixel_clk) THEN
        bram_write_en <= '0';            -- Default: write enable is low unless EXECUTE raises it.
        CASE gpu_state IS
            WHEN IDLE =>
                GPU_STATUS(7) <= '0';    -- Not busy.
                IF GPU_CMD /= x"00" AND GPU_CMD /= x"FF" THEN
                    gpu_state <= FETCH_PARAMS;
                END IF;
            WHEN FETCH_PARAMS =>
                -- Latch width, height, fill byte from the parameter FIFO.
                gpu_x     <= param_x;
                gpu_y     <= param_y;
                gpu_w     <= param_w;
                gpu_h     <= param_h;
                gpu_fill  <= param_fill;
                gpu_state <= EXECUTE;
                GPU_STATUS(7) <= '1';    -- Busy.
            WHEN EXECUTE =>
                -- Write one byte per clock; advance address.
                IF GPU_CMD = x"FF" THEN  -- Reset command: abort immediately.
                    gpu_state <= IDLE;
                ELSIF done_condition THEN
                    gpu_state <= DONE;
                ELSE
                    -- Perform write and increment counters.
                    bram_write_en   <= '1';
                    bram_write_addr <= current_addr;
                    bram_write_data <= gpu_fill;
                END IF;
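            WHEN DONE =>
                GPU_STATUS(7) <= '0';
                -- Simplification: GPU_CMD is also written by the register access
                -- process. Synthesisable code allows only one driving process per
                -- signal, so a real design clears it there (e.g. via a done flag).
                GPU_CMD       <= (OTHERS => '0');
                gpu_state     <= IDLE;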
        END CASE;
    END IF;
END PROCESS;

Building the FPGA Bitstream
The FPGA requires Quartus II 13.1 (the last release with Cyclone III support). Full compilation takes 10–20 minutes depending on the host machine.
  1. Open FPGA/build/VideoController.qpf in Quartus II 13.1.
  2. Verify the target device is EP3C25E144C8 (Cyclone III, 25K LE, 144-pin EQFP). Check under Assignments → Device.
  3. Start compilation: Processing → Start Compilation. This runs Analysis & Synthesis, Fitting, Assembler, and TimeQuest Timing Analysis in sequence.
  4. When compilation completes, examine the TimeQuest Timing Analyzer report. All timing paths must have positive slack. Negative slack (a timing violation) means the design cannot reliably meet its timing constraints at the target clock frequency — this is a functional bug, not just a warning.
  5. Output files are in FPGA/build/output_files/:
    • VideoController.sof — JTAG programming (temporary, lost on power cycle)
    • VideoController.jic — Flash-stored programming (persists across power cycles)
  6. To programme via USB-Blaster: Tools → Programmer. For permanent installation, use the .jic file with the JTAG chain targeting the onboard serial Flash.
The resource utilisation after a typical successful compile is approximately 18,000–22,000 logic elements (73–89% of the 24,624 available), leaving limited headroom. Adding large new features may require logic optimisation or removal of existing features.

Docker Build Environment

Neither Quartus II 13.0.1 nor 13.1 installs cleanly on current Linux distributions (glibc version mismatches, missing 32-bit libraries). Docker containers with a fixed Ubuntu 14.04 or Ubuntu 16.04 base provide a reproducible build environment. The following examples assume Docker images named quartus:13.0.1 and quartus:13.1, each containing the corresponding Quartus installation.
CPLD Build (Quartus 13.0.1)
# Start the container with X11 forwarding for GUI access and USB-Blaster pass-through.
docker run --rm -it \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v /dvlp/Projects/MZ80A_80COLOUR:/project \
  --device /dev/bus/usb \
  quartus:13.0.1 bash

# Inside the container — command-line compilation (no GUI required):
cd /project/CPLD/build
quartus_sh --flow compile VideoInterface

# Programme the CPLD via JTAG (USB-Blaster on chain position 1):
quartus_pgm -m JTAG -o "P;output_files/VideoInterface.pof@1"
FPGA Build (Quartus 13.1)
# Start the Quartus 13.1 container:
docker run --rm -it \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  -v /dvlp/Projects/MZ80A_80COLOUR:/project \
  --device /dev/bus/usb \
  quartus:13.1 bash

# Inside the container — full FPGA compile:
cd /project/FPGA/build
quartus_sh --flow compile VideoController

# Programme the FPGA with the temporary .sof (JTAG, not Flash):
quartus_pgm -m JTAG -o "P;output_files/VideoController.sof@1"

# Programme the onboard Flash with the permanent .jic:
quartus_pgm -m JTAG -o "pvi;output_files/VideoController.jic"
USB-Blaster udev Rule
On the host (not inside Docker), add this udev rule so that the USB-Blaster is accessible without root privileges:
# /etc/udev/rules.d/51-usb-blaster.rules
SUBSYSTEM=="usb", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6001", MODE="0666"
After adding the rule, reload with sudo udevadm control --reload-rules && sudo udevadm trigger, then disconnect and reconnect the USB-Blaster.
When running Docker with --device /dev/bus/usb the entire USB bus is passed through to the container. If the system has multiple USB buses, use --privileged instead, or identify the specific bus/device number for the USB-Blaster and pass only that device.

CG-ROM Management

The CG-ROM (Character Generator ROM) defines the pixel patterns for all 256 characters in the MZ-80A character set. The Video Module stores up to 16 CG-ROM images in a 32 KB Flash RAM (each image is 2 KB). The active CG-ROM is selected at runtime via a register write.
Building the CG-ROM Image
cd /dvlp/Projects/MZ80A_80COLOUR/software
./tools/make_cgrom.sh
# Output: roms/COLOURBOARD_CG.rom  (exactly 32,768 bytes)
The script concatenates the following ROM images in order, building the 16 × 2 KB slot table:
Slot   File                    Description
0      mz-80acg.rom            Standard MZ-80A character generator
1      MZ80K_cgrom.rom         MZ-80K character generator
2      MZ80K2E_Jap_cgrom.rom   MZ-80K2E Japanese character set
3      MZFONT.rom              Alternative MZ font
4–5    MZ700_cgrom.rom         MZ-700 character generator (4 KB = 2 slots)
6–7    MZ700_cgrom_jp.rom      MZ-700 Japanese character generator (4 KB = 2 slots)
8–15   (fill)                  Unpopulated slots filled to pad to 32,768 bytes
Adding a New CG-ROM
  1. Place the 2 KB binary in software/roms/ with a descriptive filename.
  2. Edit software/tools/make_cgrom.sh. Find the concatenation command (a cat invocation that assembles the slot images in order) and insert the new filename at the desired slot position. Each file must occupy a whole number of 2 KB slots (2,048 bytes for a single-slot image, 4,096 bytes for a two-slot image such as the MZ-700 ROMs); verify with wc -c before inserting.
  3. If displacing an existing slot, adjust the slot numbers in the script comment documentation to reflect the new layout.
  4. Rebuild with ./tools/make_cgrom.sh and verify the output is exactly 32,768 bytes: wc -c roms/COLOURBOARD_CG.rom must return 32768. If the total differs, a source ROM is the wrong size — investigate before programming the Flash.
  5. Programme the 32 KB image to the CG-ROM Flash chip on the Video Module PCB.
Runtime CG-ROM Access (FPGA v2.0)
In the FPGA design, the active CG-ROM content is also accessible via CGRAM — an internal BRAM copy that the character pipeline reads at full clock speed. The Z80 can overwrite the CGRAM content at runtime by paging it into the CPU address space.
Setting bit 7 of the Memory Page Register (I/O port 0xFD) maps the CGRAM to the CPU address range 0xD000–0xDFFF, temporarily overlaying VRAM and ARAM. The Z80 can then write arbitrary 8×8 pixel patterns directly to this range, defining custom characters. Clearing bit 7 returns 0xD000–0xDFFF to VRAM and ARAM. This allows software-defined character sets without any hardware modification.
To load one of the Flash-stored CG-ROM images into CGRAM at runtime, write the desired slot number (0–15) to the CG-ROM Select Register before performing the page-in operation.
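The page-in, write, page-out sequence can be modelled on the host to check the addressing before it is written in Z80 code. This is a sketch under stated assumptions: out_port and poke are hypothetical callbacks standing in for the Z80 OUT instruction and a memory write, page_shadow is a software copy of the Memory Page Register (whether the register can be read back is not stated here), and glyphs are assumed to occupy eight consecutive bytes each, starting at 0xD000.

```python
PAGE_REG_PORT = 0xFD      # Memory Page Register (I/O port, from the text)
CGRAM_PAGE_BIT = 0x80     # bit 7 maps CGRAM at 0xD000-0xDFFF
CGRAM_BASE = 0xD000
GLYPH_ROWS = 8            # 8x8 glyph, one byte per pixel row


def glyph_address(code, row=0):
    """CPU address of one pixel row of a glyph while CGRAM is paged in.

    Assumes glyphs are stored consecutively, 8 bytes each, from the
    bottom of the paged window.
    """
    if not 0 <= code <= 0xFF or not 0 <= row < GLYPH_ROWS:
        raise ValueError("code must be 0-255 and row 0-7")
    return CGRAM_BASE + code * GLYPH_ROWS + row


def define_glyph(out_port, poke, page_shadow, code, rows):
    """Page CGRAM in, write eight row bytes, then restore VRAM."""
    if len(rows) != GLYPH_ROWS:
        raise ValueError("a glyph needs exactly 8 row bytes")
    # Set bit 7 without disturbing the other bits of the register.
    out_port(PAGE_REG_PORT, page_shadow | CGRAM_PAGE_BIT)
    try:
        for row, bits in enumerate(rows):
            poke(glyph_address(code, row), bits & 0xFF)
    finally:
        # Always clear bit 7 so 0xD000-0xDFFF shows VRAM again.
        out_port(PAGE_REG_PORT, page_shadow & ~CGRAM_PAGE_BIT & 0xFF)
```

Keeping the page bit set for as short a time as possible matters because VRAM is not reachable through 0xD000–0xDFFF while CGRAM is paged in.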
Converting CG-ROM to MIF Format
Quartus BRAM initialisation uses MIF (Memory Initialization File) format — one hex word per line with a header. Use the existing files in software/mif/ as templates. The header specifies depth (number of words) and width (bits per word):
-- MIF file header for a 2048-byte CG-ROM (2048 words × 8 bits):
DEPTH = 2048;
WIDTH = 8;
ADDRESS_RADIX = HEX;
DATA_RADIX = HEX;
CONTENT BEGIN
  000 : 00;   -- Row 0 of character 0x00 (null)
  001 : 00;   -- Row 1
  -- ... (addresses 002 to 7FE follow the same pattern)
  7FF : 00;   -- Last byte
END;
The MIF files are used only for FPGA simulation (the testbenches in FPGA/build/simulation/). The actual hardware loads CG-ROM content from the external CG-ROM Flash chip at power-on.
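When no suitable template exists, a MIF file in the layout shown above can be generated from a binary ROM image. The following Python sketch is one way to do it; rom_to_mif is a hypothetical helper, not a tool shipped with the project, and it handles 8-bit words only.

```python
def rom_to_mif(data):
    """Return MIF text for a bytes object, one hex byte per address."""
    depth = len(data)
    if depth == 0:
        raise ValueError("ROM image is empty")
    # Pad addresses to the width of the highest address (7FF -> 3 digits).
    addr_digits = len(f"{depth - 1:X}")
    lines = [
        f"DEPTH = {depth};",
        "WIDTH = 8;",
        "ADDRESS_RADIX = HEX;",
        "DATA_RADIX = HEX;",
        "CONTENT BEGIN",
    ]
    for addr, byte in enumerate(data):
        lines.append(f"  {addr:0{addr_digits}X} : {byte:02X};")
    lines.append("END;")
    return "\n".join(lines) + "\n"
```

A typical use would read a 2 KB image with pathlib.Path(...).read_bytes() and write the returned text to a .mif file next to the existing ones in software/mif/.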

Adding a New Video Mode

The Video Module supports multiple display modes selectable at runtime via Control Register 0xF8. Adding a new mode requires consistent changes across the package, the main controller, and potentially the PLL IP core. Follow these steps in order:
Step 1 — Define the Mode Constant and Timing Record
Open FPGA/VideoController_pkg.vhd. In the section containing mode code constants, add a new constant:
-- Add to VideoController_pkg.vhd:
CONSTANT MODE_1280x1024 : std_logic_vector(3 DOWNTO 0) := x"8";

-- Add a timing parameter record for the new mode.
-- All values are in pixel clock cycles.
CONSTANT TIMING_1280x1024 : t_video_timing := (
    H_TOTAL      => 1688,   -- Total pixels per line (active + blanking)
    H_DSP_START  => 0,      -- First active pixel column
    H_DSP_END    => 1280,   -- Last active pixel column + 1
    H_SYNC_START => 1328,   -- H_SYNC pulse start
    H_SYNC_END   => 1440,   -- H_SYNC pulse end
    V_TOTAL      => 1066,   -- Total lines per frame
    V_DSP_START  => 0,      -- First active line
    V_DSP_END    => 1024,   -- Last active line + 1
    V_SYNC_START => 1025,   -- V_SYNC pulse start
    V_SYNC_END   => 1028,   -- V_SYNC pulse end
    PIXEL_CLK    => CLK_108MHZ  -- Required pixel clock selector
);
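Before committing a new timing record to a compile, it is worth checking that the values are ordered consistently and imply the expected pixel clock. The Python sketch below mirrors the fields of the record above in a dictionary. It assumes that each start/end pair is a half-open interval (end is one past the last count), as the "+ 1" comments suggest; check_axis and required_pixel_clock_hz are hypothetical helper names.

```python
TIMING_1280x1024 = {
    "H_TOTAL": 1688, "H_DSP_START": 0, "H_DSP_END": 1280,
    "H_SYNC_START": 1328, "H_SYNC_END": 1440,
    "V_TOTAL": 1066, "V_DSP_START": 0, "V_DSP_END": 1024,
    "V_SYNC_START": 1025, "V_SYNC_END": 1028,
}


def check_axis(t, axis):
    """Validate one axis ("H" or "V").

    Returns (active, front_porch, sync_width, back_porch) in counts.
    """
    total = t[f"{axis}_TOTAL"]
    start, end = t[f"{axis}_DSP_START"], t[f"{axis}_DSP_END"]
    s_start, s_end = t[f"{axis}_SYNC_START"], t[f"{axis}_SYNC_END"]
    if not (0 <= start < end <= s_start < s_end <= total):
        raise ValueError(f"{axis} timing values are out of order")
    # Back porch runs from the end of sync to the next active start.
    return end - start, s_start - end, s_end - s_start, total - s_end + start


def required_pixel_clock_hz(t, refresh_hz=60):
    """Pixel clock needed to scan one full frame `refresh_hz` times a second."""
    return t["H_TOTAL"] * t["V_TOTAL"] * refresh_hz


print(check_axis(TIMING_1280x1024, "H"))           # (1280, 48, 112, 248)
print(check_axis(TIMING_1280x1024, "V"))           # (1024, 1, 3, 38)
print(required_pixel_clock_hz(TIMING_1280x1024))   # 107964480
```

The result of about 107.96 MHz is why the record selects the 108 MHz clock.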
Step 2 — Add the Mode to the Parameter RAM
In VideoController.vhd, find the mode parameter RAM initialisation array (a VHDL aggregate literal that initialises a ROM-style lookup table). Add an entry for the new mode code:
-- In the mode parameter RAM initialisation, index by mode code:
CONSTANT mode_params : t_mode_param_array := (
    -- Existing modes ...
    MODE_640x480  => pack_timing(TIMING_640x480),
    MODE_800x600  => pack_timing(TIMING_800x600),
    MODE_1024x768 => pack_timing(TIMING_1024x768),
    -- New mode:
    MODE_1280x1024 => pack_timing(TIMING_1280x1024),
    OTHERS        => pack_timing(TIMING_640x480)   -- Safe default
);
Step 3 — Verify or Regenerate the PLL IP Core
1280×1024 at 60 Hz requires a 108 MHz pixel clock. Open FPGA/build/core/ and check whether a suitable PLL output already exists. If not, use the Quartus MegaWizard (Tools → MegaWizard Plug-In Manager) to regenerate the PLL IP, adding 108 MHz as an additional output clock. The generated VHDL files in core/ will be updated automatically. Verify that the PLL can achieve the required frequency with the 17.7 MHz reference clock — the MegaWizard will report whether the requested output is achievable.
Step 4 — Update the Machine Model Decode
If the new mode corresponds to a new machine model (e.g. emulating a different Sharp MZ variant), update the machine model decode logic in the register access process (Control Register 0xF8, bits that select the machine model). If it is an additional display resolution for an existing machine, no change to the machine model decode is needed — only the video timing parameters matter.
Step 5 — Rebuild and Test
Run a full FPGA compile. After programming, write the new mode code (0x8 in the example above) to register 0xF8 and verify that the monitor detects a valid sync signal and displays a stable picture. Use an oscilloscope on the H_SYNC and V_SYNC pins to verify timing against the VESA specification for the target resolution if a monitor refuses to sync.
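When probing H_SYNC and V_SYNC with an oscilloscope, it helps to know in advance what the traces should show. The Python sketch below derives the expected frequencies and pulse widths from the timing values used in Step 1; sync_measurements is a hypothetical helper, and the figures assume the pixel clock is exactly 108 MHz.

```python
def sync_measurements(h_total, h_sync_start, h_sync_end,
                      v_total, v_sync_start, v_sync_end, pixel_clk_hz):
    """Expected scope readings for the sync pins, from counter values."""
    line_s = h_total / pixel_clk_hz              # duration of one scan line
    return {
        "h_freq_khz": 1e-3 / line_s,
        "h_pulse_us": (h_sync_end - h_sync_start) / pixel_clk_hz * 1e6,
        "v_freq_hz": 1 / (line_s * v_total),
        "v_pulse_us": (v_sync_end - v_sync_start) * line_s * 1e6,
    }


m = sync_measurements(1688, 1328, 1440, 1066, 1025, 1028, 108_000_000)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

For the 1280×1024 example this gives roughly 63.98 kHz and 1.04 µs on H_SYNC, and 60.02 Hz and 46.89 µs on V_SYNC. A measured value that is off by a constant factor usually points at the pixel clock, while a wrong pulse width points at the sync start or end values.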

Debugging Tips

Timing violations in Quartus: A negative-slack path in the TimeQuest report means the design is not guaranteed to work on real hardware; treat it as a functional bug, not just a warning. The failing path must be fixed before the design is considered correct. Common fixes: reduce logic depth between registers (break the path with an extra pipeline stage), tighten the SDC constraint so that the Fitter prioritises that path, or restructure the VHDL to allow better register placement.
Simulation before hardware: The FPGA/build/simulation/ directory contains testbenches for key subsystems. Use ModelSim (included with Quartus) to simulate the register access process or the video timing generator before committing to a full compile. Simulation catches logic errors in minutes rather than requiring a 15-minute compile-programme-test cycle.
SignalTap Logic Analyser: Quartus includes SignalTap, an in-system logic analyser that captures internal FPGA signals in real time. To use it: add a SignalTap instance (File → New → SignalTap II Logic Analyzer File), add the signals of interest (e.g. the GPU state machine state, BRAM write-enable, the h_count and v_count registers), set a trigger condition, recompile, and programme. SignalTap stores the captured samples in on-chip memory and uploads them to the Quartus host over the JTAG cable, so the sample depth is limited by the BRAM left unused by the design. This is invaluable for diagnosing display timing errors and GPU sequencing bugs without needing an external logic analyser.
Resource overflow: If the Fitter reports that the design exceeds the device's resources (logic elements or memory blocks), the most effective reduction strategies are: reduce BRAM usage by combining small memories, reduce the number of registered pipeline stages in the display pipeline (at the cost of some timing margin), and check that the PLL IP core is not consuming excessive logic. The PLL should use dedicated PLL resources, not general logic; verify this in the Chip Planner.
CPLD pin assignment errors: If the CPLD is programmed but the FPGA receives no valid bus signals, verify the pin assignments in CPLD/build/VideoInterface.csv against the board schematic (in schematics/v2.0/). A single transposed pin assignment in the .qsf file will cause silent failures that are difficult to trace without an oscilloscope. The .csv file is the authoritative documentation of intended pin assignments; always update it when modifying the .qsf.
Uninitialised signals in simulation: If ModelSim shows 'U' (uninitialised) propagating through the design on the first few simulation cycles, this indicates that a register's reset condition is not being applied. Check that every signal driven by a clocked process has an entry in the reset branch of its IF/ELSIF structure.
GPU BUSY flag polling: If the display shows corruption after GPU commands, verify that software is correctly polling bit 7 of register 0xFB and waiting for it to clear before accessing VRAM or issuing the next GPU command. A common mistake is polling the wrong register address or testing the wrong bit. The GPU BUSY flag is also asserted during the FETCH_PARAMS state, so software should start polling immediately after issuing the command rather than waiting a fixed delay first.
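The polling rule can be expressed as a small loop with a timeout, so that a stuck flag is reported instead of hanging the program. The Python sketch below models the logic on the host; on the Z80 the same loop would be a few instructions. read_port is a hypothetical callback that returns the byte read from a register, and treating 0xFB as an I/O port address is an assumption.

```python
import time

GPU_STATUS_PORT = 0xFB    # register holding the BUSY flag (from the text)
GPU_BUSY_BIT = 0x80       # bit 7 is set while the GPU is busy


def wait_gpu_idle(read_port, timeout_s=0.5, clock=time.monotonic):
    """Poll the BUSY flag until it clears.

    Returns True once bit 7 reads 0, or False if `timeout_s` elapses
    first. Call this immediately after issuing a GPU command.
    """
    deadline = clock() + timeout_s
    while True:
        if (read_port(GPU_STATUS_PORT) & GPU_BUSY_BIT) == 0:
            return True
        if clock() >= deadline:
            return False
```

Testing the flag before checking the deadline guarantees at least one read, so a command that has already finished is never reported as a timeout.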

Reference Sites

Resource                                           Link / Source
Video Module project page                          /sharpmz-upgrades-videomodule/
Video Module User Manual                           /sharpmz-upgrades-videomodule-usermanual/
Video Module Technical Guide                       /sharpmz-upgrades-videomodule-technicalguide/
Video Module Gallery                               /sharpmz-upgrades-videomodule-gallery/
RFS Developer’s Guide                              /sharpmz-upgrades-rfs-developersguide/
Altera Cyclone III Device Handbook                 Intel/Altera: logic element architecture, BRAM, PLL reference
Altera MAX 7000 Programmable Logic Device Family   Intel/Altera: EPM7128S macrocell architecture, timing model
Quartus II Handbook (version 13.1)                 Intel/Altera: SDC constraints, TimeQuest, SignalTap, MegaWizard
IEEE Std 1076-2008 (VHDL LRM)                      IEEE: definitive VHDL language reference
VESA Monitor Timing Standard                       VESA: H/V timing parameters for standard VGA resolutions
Sharp MZ-80A Hardware Manual                       Sharp Corporation: bus timing, memory map, I/O port reference