A Fundamental Exploration of Simple CPU Design

Jan 23, 2025

1. Introduction

Building a simple CPU from the ground up offers invaluable insights into the inner workings of modern computing. By stripping away advanced features like superscalar pipelines, complex branch predictors, and massive caches, a minimalistic CPU design lays bare the core processes of fetching, decoding, and executing instructions. This foundational approach helps students and enthusiasts alike understand the intricate interplay between the Arithmetic Logic Unit (ALU), registers, control logic, and memory.

Such simplified architectures typically feature a small instruction set, modest register files, and straightforward control structures, making them ideal teaching tools. Whether implemented on an FPGA, breadboarded with discrete logic, or simulated in software, these projects illuminate the essential principles of computer organization. As you explore the fundamentals, from the program counter and instruction register to load-store operations and basic branching, you gain a clear understanding of how a CPU orchestrates data movement and manipulation.

In the following sections, we will delve into the high-level architecture, walk through the instruction set, examine the crucial fetch-decode-execute cycle, and highlight practical considerations for designing, implementing, and extending a simple CPU. By the end, you will have a solid grasp of how a basic processor operates and be well-positioned to tackle more sophisticated CPU or system-on-chip (SoC) designs.

2. High-Level Architecture

2.1 The Data Path

A CPU’s data path is the collection of components through which data moves and is manipulated. A simple CPU design usually includes the following components:

Register File: A small set of registers for fast data storage. Typically 4, 8, or 16 registers of equal width (e.g., 8-bit, 16-bit, or 32-bit).
Arithmetic Logic Unit (ALU): Performs arithmetic (addition, subtraction, sometimes multiplication/division) and logic operations (AND, OR, NOT, XOR).
Program Counter (PC): Holds the memory address of the next instruction to fetch.
Instruction Register (IR): Holds the instruction currently being executed.
Memory Interface: Simple designs might use a single memory for both instructions and data (von Neumann architecture) or separate instruction and data memories (Harvard architecture).
Multiplexers (MUXes): Used to select between multiple data sources, feeding data into registers or the ALU.
Control Logic: Provides signals to coordinate data movement and ALU operations.

2.2 The Control Unit

The Control Unit interprets the current instruction and sets the signals that drive the data path. In a simple CPU design, control can be implemented in a few ways:

Hardwired Control: A state machine or combinational logic interprets the instruction bits and sets control signals accordingly. This can be efficient but harder to modify.
Microcoded Control: A small ROM (microcode store) holds the control signals for each step of executing each instruction. Easier to extend with new instructions but slightly larger in implementation.

3. Instruction Set Architecture (ISA)

A “simple CPU” typically has a small, RISC-like instruction set. Example classes of instructions include:

Load/Store: Move data between registers and memory (e.g., LD R1, [address]; ST R2, [address]).
Arithmetic: Perform integer operations (e.g., ADD R1, R2, R3; SUB R4, R5, R6).
Logic: Perform bitwise operations (e.g., AND R1, R2, R3; OR R4, R5, R6; XOR R7, R7, R7).
Branch/Jump: Change the program counter, conditionally or unconditionally (e.g., BEQ R1, R2, offset; JMP address).
Miscellaneous/Control: Could include NOP (no operation), HLT (halt), or special instructions for I/O.

Because memory addresses and registers must be encoded in the instruction, simple CPUs often choose short opcodes and limited register sets. For example, if there are 8 registers (R0–R7), you need 3 bits to address each register.
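
The register-count arithmetic above generalizes: selecting one of n registers takes ceil(log2(n)) bits. A minimal Python sketch of that calculation follows; the function name field_bits is illustrative and not taken from any particular design.

```python
def field_bits(count: int) -> int:
    """Number of bits needed to select one of `count` items (count >= 1)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1).bit_length() equals ceil(log2(count)) for count >= 1
    return (count - 1).bit_length()


for regs in (4, 8, 16):
    print(regs, "registers ->", field_bits(regs), "bits per register field")
```

With 8 registers, a three-operand instruction such as ADD R1, R2, R3 spends 9 bits on register fields alone, which is one reason small designs keep the register count low.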

4. Fetch-Decode-Execute Cycle

Even the simplest CPU typically follows these broad steps:

Fetch: The instruction at the program counter (PC) is read from memory into the instruction register (IR). The PC is then incremented.
Decode: The control logic interprets the instruction bits (opcode, source registers, destination registers, immediate values, etc.).
Execute: The appropriate signals are sent to the ALU, registers, memory, etc., to perform the desired operation.

In a fully single-cycle CPU, these steps happen in one clock cycle (but usually at the cost of a longer clock period). In a multi-cycle CPU, each step might take a separate clock cycle, allowing a higher clock rate but requiring more complex control logic.
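
The three steps can be sketched as a software loop. This is a simplified model, not a hardware description: instructions are assumed to be Python tuples rather than encoded bits, registers are assumed to be 8 bits wide, and the function name run is hypothetical.

```python
def run(program, regs):
    """Run a list of (opcode, *operands) tuples until HLT; returns the registers."""
    pc = 0
    while True:
        ir = program[pc]          # Fetch: read the instruction at PC into IR
        pc += 1                   # ...then increment the PC
        opcode, *operands = ir    # Decode: split the opcode from its operands
        if opcode == "ADD":       # Execute
            rd, rs1, rs2 = operands
            regs[rd] = (regs[rs1] + regs[rs2]) & 0xFF  # assumed 8-bit registers
        elif opcode == "SUB":
            rd, rs1, rs2 = operands
            regs[rd] = (regs[rs1] - regs[rs2]) & 0xFF
        elif opcode == "JMP":
            (pc,) = operands      # a jump overrides the incremented PC
        elif opcode == "HLT":
            return regs
        else:
            raise ValueError(f"unknown opcode {opcode!r}")


regs = run([("ADD", 1, 2, 3), ("SUB", 0, 1, 2), ("HLT",)], [0, 0, 5, 7])
print(regs)  # [7, 12, 5, 7]
```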

5. Data Path in Detail

5.1 Registers

A typical register file might support:

Two read ports: So two register values can be read simultaneously (for ALU operations).
One write port: To store the result back into the destination register.
A write enable signal: Ensures data is only written when required.
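
A behavioral sketch of such a register file is shown here. The class and method names are illustrative, and the defaults (4 registers, 8 bits wide) are assumptions chosen to keep the example small.

```python
class RegisterFile:
    """Register file with two read ports and one write port (sketch)."""

    def __init__(self, count=4, width=8):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read(self, addr_a, addr_b):
        # Two read ports: both values are available at the same time.
        return self.regs[addr_a], self.regs[addr_b]

    def write(self, addr, value, write_enable):
        # One write port, gated by the write enable signal.
        if write_enable:
            self.regs[addr] = value & self.mask


rf = RegisterFile()
rf.write(2, 0x1FF, write_enable=True)    # stored value is truncated to 8 bits
rf.write(3, 0x42, write_enable=False)    # ignored: write enable is low
print(rf.read(2, 3))  # (255, 0)
```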

5.2 ALU

The ALU operations in a minimalistic CPU usually include:

Add, Subtract: Basic arithmetic.
Bitwise AND, OR, XOR: Core logical operations.
Shift operations: Left or right shifts, if included, can be done in the ALU or a separate shifter block.

Some designs may include a carry-out or overflow flag for addition/subtraction, though simpler versions might omit this.
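
The following sketch models an ALU with the operations listed above plus carry and overflow flags. It assumes an 8-bit width by default, and it treats the carry flag on subtraction as a borrow indicator; real designs differ on that convention.

```python
def alu(op, a, b, width=8):
    """Return (result, carry, overflow) for a `width`-bit ALU operation."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    carry = overflow = False
    if op == "ADD":
        full = a + b
        carry = full > mask
        result = full & mask
        # Signed overflow: operands share a sign that the result does not.
        overflow = bool(~(a ^ b) & (a ^ result) & sign)
    elif op == "SUB":
        full = a - b
        carry = full < 0  # treated here as a borrow flag (conventions vary)
        result = full & mask
        # Signed overflow: operands differ in sign and the result's sign differs from a's.
        overflow = bool((a ^ b) & (a ^ result) & sign)
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    else:
        raise ValueError(f"unknown ALU op {op!r}")
    return result, carry, overflow


print(alu("ADD", 200, 100))   # (44, True, False)
print(alu("ADD", 100, 100))   # (200, False, True)
print(alu("SUB", 5, 7))       # (254, True, False)
```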

5.3 Program Counter (PC)

The PC is usually implemented as a register that:

Increments each time an instruction is fetched.
Can be modified by a jump or branch instruction.

In many simple CPU designs, the PC increment logic is straightforward: PC <- PC + 1. For branching or jumping, the control logic overrides the increment and loads a new value into the PC.
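
As a small illustration, the PC update can be modeled as a two-way selection between the incremented value and a branch or jump target. The 8-bit address width and the wrap-around at the top of memory are assumptions of this sketch.

```python
def next_pc(pc, take_branch=False, target=0, addr_bits=8):
    """Compute the next PC: the branch/jump target if taken, else PC + 1."""
    mask = (1 << addr_bits) - 1
    return (target if take_branch else pc + 1) & mask


print(next_pc(0x10))                                  # 17
print(next_pc(0x10, take_branch=True, target=0x40))   # 64
print(next_pc(0xFF))                                  # 0 (wraps in an 8-bit address space)
```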

5.4 Instruction Register (IR)

When the CPU fetches an instruction, it is stored in the IR. In a multi-cycle approach, the control logic reads the IR’s contents during the Decode cycle to determine which signals to assert in subsequent cycles.

5.5 Memory Interface

In smaller hobby or educational CPU designs, you often see a single memory (for both instructions and data) or two separate memories. The address to memory typically comes from:

PC for instruction fetching.
A register (or the ALU output) for data loading and storing.

During the fetch cycle, memory address lines connect the PC to the memory. During a load/store cycle, memory address lines come from the ALU output or a designated register.
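
The address selection described above behaves like a multiplexer. A minimal sketch, assuming a single shared 256-byte memory and hypothetical state names ("FETCH", "LOAD", "STORE"):

```python
def memory_address(state, pc, alu_out):
    """Select the memory address source: PC during fetch, ALU output for load/store."""
    # The state names are assumptions made for this sketch.
    if state == "FETCH":
        return pc
    if state in ("LOAD", "STORE"):
        return alu_out
    raise ValueError(f"no memory access in state {state!r}")


memory = bytearray(256)
memory[0x00] = 0xA1        # stands in for an instruction byte at address 0
memory[0x80] = 0x2A        # a data byte at address 0x80

print(hex(memory[memory_address("FETCH", pc=0x00, alu_out=0x80)]))  # 0xa1
print(memory[memory_address("LOAD", pc=0x00, alu_out=0x80)])        # 42
```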

6. Control Logic Design

The control logic is what orchestrates the CPU’s behavior.

6.1 Hardwired Control

In a hardwired control implementation:

A finite state machine determines the step of the current instruction.
Instruction opcode bits feed into combinational logic to decide the operations at each clock.
The outputs are control signals to the register file, ALU, memory, and other components.

For a simple CPU, this might be done using a small state machine with states like Fetch, Decode, Execute, Memory Access, Write-Back. Each state’s behavior depends on the instruction type.
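
One way to picture such a state machine is as a next-state function. The sketch below is a software model under assumptions: state and opcode names are strings, and the choice of which instructions skip the Memory Access or Write-Back states is one plausible arrangement, not a fixed rule.

```python
def next_state(state, opcode):
    """Next control state for a multi-cycle CPU (sketch; opcodes as strings)."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return "EXECUTE"
    if state == "EXECUTE":
        if opcode in ("LD", "ST"):
            return "MEMORY"        # loads and stores need a memory access
        if opcode in ("JMP", "BEQ", "NOP"):
            return "FETCH"         # nothing to write back
        return "WRITE_BACK"        # ALU instructions write a register
    if state == "MEMORY":
        return "WRITE_BACK" if opcode == "LD" else "FETCH"
    if state == "WRITE_BACK":
        return "FETCH"
    raise ValueError(f"unknown state {state!r}")


def states_for(opcode):
    """List the states one instruction passes through, starting at FETCH."""
    state, path = "FETCH", []
    while True:
        path.append(state)
        state = next_state(state, opcode)
        if state == "FETCH":
            return path


print(states_for("ADD"))  # ['FETCH', 'DECODE', 'EXECUTE', 'WRITE_BACK']
print(states_for("LD"))   # ['FETCH', 'DECODE', 'EXECUTE', 'MEMORY', 'WRITE_BACK']
print(states_for("ST"))   # ['FETCH', 'DECODE', 'EXECUTE', 'MEMORY']
```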

6.2 Microcoded Control

In a microcoded design, each opcode corresponds to a sequence of micro-instructions stored in a microcode ROM. Each micro-instruction indicates signals such as:

Register source for ALU input A and B
ALU operation (ADD, AND, OR, etc.)
Register destination for the ALU result
Memory read/write enable

This approach simplifies modifications, as introducing a new instruction can be done by adding microcode instead of reorganizing complex combinational logic. However, it requires additional memory for the microcode store.
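
A microcode store can be modeled as a lookup table indexed by opcode and step. In this sketch the signal names (alu_op, reg_write, mem_read, and so on) and the step sequences are assumptions, and the common fetch steps are omitted for brevity.

```python
# Each micro-instruction is a dict of control signals; signals not listed are inactive.
MICROCODE = {
    "ADD": [
        {"alu_op": "ADD", "alu_a": "rs1", "alu_b": "rs2"},
        {"reg_write": 1, "dest": "rd"},
    ],
    "LD": [
        {"mem_read": 1},
        {"reg_write": 1, "dest": "rd"},
    ],
    "ST": [
        {"mem_write": 1},
    ],
}


def control_signals(opcode, step):
    """Look up the control word for one step of one instruction."""
    return MICROCODE[opcode][step]


# Adding an instruction means adding a table entry, not redesigning logic.
MICROCODE["XOR"] = [
    {"alu_op": "XOR", "alu_a": "rs1", "alu_b": "rs2"},
    {"reg_write": 1, "dest": "rd"},
]

print(control_signals("XOR", 0)["alu_op"])   # XOR
print(len(MICROCODE["ST"]))                  # 1
```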

7. Example Instruction Execution

To illustrate how the CPU operates, consider an ADD R1, R2, R3 instruction in a hardwired, multi-cycle CPU:

Fetch:
- The control logic sets memory address to PC, reads the instruction into IR, increments PC.
Decode:
- The control logic reads IR’s opcode bits, sees ADD instruction, and identifies registers R2 and R3 as sources, R1 as the destination.
Execute:
- ALU inputs are read from register R2 and R3.
- ALU operation is set to ADD.
- ALU result is produced.
Write Back:
- Control logic enables the register write signal for R1.
- The ALU output is written to R1.

For a single-cycle CPU, these steps happen in one clock cycle, but the hardware needs to be fast enough to handle all sub-steps within that period.
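
The four steps above can be traced in software. This is an illustrative model only: the instruction is represented as a tuple instead of an encoded word, registers are assumed to be 8 bits wide, and trace_add is a hypothetical helper name.

```python
def trace_add(regs, rd, rs1, rs2, pc=0):
    """Step an ADD through four cycles, returning (log, regs, pc)."""
    log = []
    # Fetch: IR <- memory[PC]; PC <- PC + 1
    ir = ("ADD", rd, rs1, rs2)   # stands in for the word read from memory
    pc += 1
    log.append(f"FETCH      IR={ir} PC={pc}")
    # Decode: identify the opcode and register fields
    opcode, d, s1, s2 = ir
    log.append(f"DECODE     op={opcode} dest=R{d} src=R{s1},R{s2}")
    # Execute: the ALU adds the two source registers
    alu_out = (regs[s1] + regs[s2]) & 0xFF   # 8-bit width is an assumption
    log.append(f"EXECUTE    ALU={alu_out}")
    # Write back: register write enable is asserted for the destination
    regs = list(regs)
    regs[d] = alu_out
    log.append(f"WRITE_BACK R{d}={alu_out}")
    return log, regs, pc


log, regs, pc = trace_add([0, 0, 20, 22], rd=1, rs1=2, rs2=3)
print("\n".join(log))
print(regs)  # [0, 42, 20, 22]
```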

8. Pipelining (Optional in a Simple CPU)

Basic CPUs may not implement pipelining, where multiple instructions overlap in their fetch/decode/execute phases. In more advanced (but still educational) designs, you might see a 2-stage or 5-stage pipeline, which improves performance but adds complexity (e.g., pipeline hazards, forwarding logic, stall mechanisms).

For an introductory CPU design, staying with a single-cycle or multi-cycle non-pipelined approach is common to reduce complexity. That said, if the design from simplecpudesign.com includes a pipeline, it would likely be a limited form (such as a 2-stage pipeline) that demonstrates the concept of overlapped instruction execution.
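
To see why pipelining helps, compare cycle counts under idealized assumptions: every instruction takes the same number of steps, and the pipeline never stalls. Real pipelines lose cycles to hazards, so the pipelined figure below is a best case. The function names are illustrative.

```python
def cycles_multicycle(n_instructions, steps_per_instruction):
    """Cycles when every instruction finishes all its steps before the next starts."""
    return n_instructions * steps_per_instruction


def cycles_pipelined(n_instructions, stages):
    """Ideal pipeline (no stalls): fill the pipe once, then finish one per cycle."""
    if n_instructions == 0:
        return 0
    return stages + n_instructions - 1


print(cycles_multicycle(100, 5))  # 500
print(cycles_pipelined(100, 5))   # 104
print(cycles_pipelined(100, 2))   # 101
```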

9. Implementation Technologies

9.1 FPGA

Many hobby CPU designs are implemented on an FPGA (Field-Programmable Gate Array) board:

Allows quick experimentation and debugging.
Provides easy access to internal signals via test pins or on-chip debugging features.
Can adapt to changes in the CPU design simply by reprogramming the FPGA bitstream.

9.2 Discrete Logic

Some educational or retro projects use discrete 7400-series logic (or similar) chips. This is more labor-intensive and requires a significant breadboard or PCB real estate, but is a great way to learn digital electronics at a fundamental level.

9.3 ASIC/Custom Silicon

Producing an actual chip is beyond the scope of most educational projects, but advanced hobbyists may design the CPU in an HDL, simulate it thoroughly, and theoretically convert it to an ASIC. This is typically cost-prohibitive for most enthusiasts.

10. Practical Considerations

10.1 Clock Speed

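A simple CPU typically runs at a modest clock frequency (e.g., a few MHz, or even hundreds of kHz when built from discrete logic). The clock period must be at least as long as the slowest signal path between registers, and simple designs rarely apply the timing optimizations that shorten that path.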
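
A rough upper bound on clock frequency comes from summing the delays along the slowest register-to-register path. The delay values below are hypothetical placeholders chosen for illustration, not measurements of any real part.

```python
def max_clock_hz(path_delays_ns):
    """Upper bound on clock frequency from the summed delays of the slowest path."""
    period_ns = sum(path_delays_ns)
    return 1e9 / period_ns


# Hypothetical delays (ns) along one register-to-register path:
# register clock-to-output, operand mux, ALU, result mux, register setup.
delays = [20, 15, 120, 15, 10]
print(f"{max_clock_hz(delays) / 1e6:.2f} MHz")  # 5.56 MHz
```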

10.2 Debugging and Verification

Common debugging approaches:

Simulation: Before putting the design on hardware, run it in a digital logic simulator.
LEDs / 7-segment displays: Some minimalist designs drive LEDs from internal signals to observe what’s happening on the CPU in real-time.
UART / Serial Output: More advanced approaches send debug messages over serial to a PC.

10.3 Extensibility

Once a core CPU is running a small instruction set, it can be extended with:

Additional instructions (e.g., multiply, divide).
More registers or a different register arrangement.
Optional pipeline stages for performance.
Peripheral interfaces (GPIO, timers, serial, etc.).

11. Example of an 8-Bit Simple CPU Specification

To consolidate everything, here is an example specification for a hypothetical 8-bit simple CPU:

Word Size: 8 bits
Memory: 256 bytes (addresses from 0x00 to 0xFF)
Registers: 4 general-purpose registers (R0, R1, R2, R3)
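Instruction Format:
- 8-bit instructions:
  - 4-bit opcode, 2-bit source register, 2-bit destination register (for register-to-register operations)
  - Alternatively, a 4-bit opcode, 4-bit immediate for immediate instructions
  - Note: a full 8-bit address does not fit in these fields, so LD, ST, JMP, and BEQ need additional encoding (for example, a second byte holding the address; this is an assumption, since the format above does not specify it)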
Supported Instructions (example set):
- ADD dest, src
- SUB dest, src
- AND dest, src
- OR dest, src
- XOR dest, src
- LD dest, [addr] (for load from memory)
- ST src, [addr] (for store to memory)
- JMP addr
- BEQ src1, src2, addr (branch if equal)
- HLT (halt)
Execution Model:
- Single-Cycle for each instruction in some minimal designs, or
- Multi-Cycle (Fetch -> Decode -> Execute -> Memory -> Write-Back)
Clock Frequency: e.g., 1 MHz to keep timing simple.
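
The register-to-register format in this specification can be packed and unpacked with a few shifts and masks. In the sketch below the opcode numbers are assumed values, since the specification does not assign them, and the field order follows the listing above (opcode, then source, then destination).

```python
OPCODES = {"ADD": 0x0, "SUB": 0x1, "AND": 0x2, "OR": 0x3, "XOR": 0x4}  # assumed values
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic, dest, src):
    """Pack a register-to-register instruction as oooo ssdd."""
    if not (0 <= dest <= 3 and 0 <= src <= 3):
        raise ValueError("register numbers must be 0..3")
    return (OPCODES[mnemonic] << 4) | (src << 2) | dest


def decode(byte):
    """Unpack an instruction byte into (mnemonic, dest, src)."""
    return MNEMONICS[byte >> 4], byte & 0b11, (byte >> 2) & 0b11


word = encode("XOR", dest=1, src=2)
print(f"{word:08b}")   # 01001001
print(decode(word))    # ('XOR', 1, 2)
```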

Conclusion

Designing a simple CPU from the ground up demystifies the core concepts underpinning every computer system. By exploring a straightforward instruction set, a minimal data path, and clear control logic, one gains an appreciation for how complex tasks can be constructed out of simpler building blocks. This foundational knowledge not only enhances one’s understanding of advanced architectures but also fosters the practical skills needed to design, test, and debug digital systems.

Whether you plan to implement your design on an FPGA or prototype it with discrete logic, the process will sharpen your insights into how memory, registers, the ALU, and the control unit all harmonize to fetch, decode, and execute instructions. Moreover, this understanding creates a gateway to more sophisticated topics like pipelining, caching, and out-of-order execution, should you choose to continue your exploration of CPU design and computer architecture.

References and Further Reading

Simple CPU Design: Offers thorough tutorials and examples focused on creating a minimal CPU from scratch.
Opencores.org: A repository of open-source hardware projects, including CPU cores you can study or adapt for your own needs.
FPGA4Fun: Provides projects and tutorials on FPGA development, which can be used to prototype and test custom CPU designs.
Hackaday.com: Features numerous community-driven hardware projects, including DIY CPU implementations and related digital logic explorations.
Harris, D. M., & Harris, S. L. (2012). Digital Design and Computer Architecture. 2nd ed. Morgan Kaufmann: A highly accessible textbook that covers everything from basic logic gates to CPU microarchitecture.

By continuing with these resources and experimenting on your own, you will deepen your expertise, foster your creativity, and build a strong foundation in computer architecture that can be applied to countless areas of technology.

André Machado | Blog