A few days ago, I was adding a Z80 disassembler to my tzxtools. I could not find one for Python, so I decided to write my own. The result fits into a single Python source file. This article is the making-of…
The Zilog Z80 is an 8-bit processor, and (almost) all of its instructions have an opcode that is just 1 byte long. For example, the instruction ret (return from subroutine) has C9 as its byte representation. Some instructions are followed by another byte (a constant to be used, or a relative jump displacement) or another two bytes (a 16-bit constant or absolute address). Some examples:
C9 | -- | -- | -- | ret | Return from subroutine |
3E | 23 | -- | -- | ld a,$23 | Load constant $23 into A register |
C3 | 34 | 12 | -- | jp $1234 | Jump to address $1234 |
Note that for 16-bit constants, the bytes seem to be reversed in memory. This is because the Z80 is a so-called little-endian CPU, where the lower byte comes first. Some other processor families (like the 68000) are big endian and store the higher byte first.
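To make the byte order concrete, here is a minimal sketch of reading a 16-bit constant from a byte stream. The helper name read_word_le is made up for this illustration and is not taken from the actual disassembler.

```python
def read_word_le(data: bytes, pos: int) -> int:
    """Return the 16-bit little-endian value stored at data[pos] and data[pos + 1]."""
    # The low byte comes first, the high byte second.
    return data[pos] | (data[pos + 1] << 8)


code = bytes.fromhex("C33412")    # jp $1234
target = read_word_le(code, 1)    # skip the opcode byte C3
print(f"jp ${target:04X}")        # prints "jp $1234"
```

Python's built-in int.from_bytes(data[pos:pos + 2], "little") gives the same result.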
So there are only 256 possible opcodes, which makes it pretty easy to disassemble them. I used an array of 256 entries, where each entry contains the instruction of the respective byte as a string. For constants, I have used placeholders like "##" or "$". If such a placeholder is found in the instruction string after decoding, the appropriate number of bytes is fetched, and the placeholder is replaced by the value that was found.
If we were to write a disassembler for the 8080 CPU, we would be done now. However, the Z80 has some extensions that need to be covered, namely two extended instruction sets and two index registers.
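Before moving on to those extensions, the table-driven decoding of the base instruction set can be sketched roughly as follows. This is not the real code: the table holds only five of the 256 entries, and the placeholder meanings are assumptions made for this example ("##" for a 16-bit constant, "#" for an 8-bit constant, "$" for a relative jump target). The code also assumes that the byte stream is loaded at address 0.

```python
# Illustrative excerpt only; a real table has one entry for each of the 256 opcodes.
OPCODES = {
    0x00: "nop",
    0x18: "jr $",     # "$"  = relative jump target (assumed placeholder meaning)
    0x3E: "ld a,#",   # "#"  = 8-bit constant (assumed placeholder meaning)
    0xC3: "jp ##",    # "##" = 16-bit constant (assumed placeholder meaning)
    0xC9: "ret",
}


def disassemble_one(data, pos):
    """Decode one unprefixed instruction; return (text, position of next instruction)."""
    opcode = data[pos]
    pos += 1
    if opcode not in OPCODES:
        # Not in this excerpt: emit the raw byte instead.
        return f"db ${opcode:02X}", pos
    text = OPCODES[opcode]
    if "##" in text:
        word = data[pos] | (data[pos + 1] << 8)   # little endian
        pos += 2
        text = text.replace("##", f"${word:04X}")
    elif "#" in text:
        text = text.replace("#", f"${data[pos]:02X}")
        pos += 1
    elif "$" in text:
        # Signed displacement, relative to the address after the instruction.
        disp = data[pos] - 256 if data[pos] >= 128 else data[pos]
        pos += 1
        text = text.replace("$", f"${(pos + disp) & 0xFFFF:04X}")
    return text, pos


print(disassemble_one(bytes.fromhex("3E23"), 0))     # ('ld a,$23', 2)
print(disassemble_one(bytes.fromhex("C33412"), 0))   # ('jp $1234', 3)
```

The returned position tells the caller where the next instruction starts, so disassembling a whole block is just a loop over this function.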
One set of extended instructions is selected by an $ED prefix, and contains rarely used instructions. The other instruction set is selected by a $CB prefix and has bit manipulation and some rotation instructions.
ED | B0 | -- | -- | ldir | Copy BC bytes from HL to DE |
ED | 4B | 78 | 56 | ld bc,($5678) | Loads value from address $5678 into BC register pair |
CB | C7 | -- | -- | set 0,a | Set bit 0 in A register |
For the $ED prefix, I used a separate array for decoding the instructions. The $CB instructions follow a simple bit scheme, so the instructions could be decoded by a few lines of Python code.
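The bit scheme of the $CB instructions looks like this: the upper two bits of the opcode select the group (rotate/shift, bit, res, set), the middle three bits give the bit number or the kind of rotation, and the lower three bits select the register. The following is a minimal sketch of such a decoder, with names chosen for this example only. Note that sll is one of the undocumented instructions, and some assemblers use a different name for it.

```python
REGS = ["b", "c", "d", "e", "h", "l", "(hl)", "a"]
ROTS = ["rlc", "rrc", "rl", "rr", "sla", "sra", "sll", "srl"]  # sll is undocumented


def decode_cb(opcode: int) -> str:
    """Decode the opcode byte that follows a $CB prefix."""
    group = opcode >> 6          # 0 = rotate/shift, 1 = bit, 2 = res, 3 = set
    y = (opcode >> 3) & 7        # bit number, or index into ROTS
    reg = REGS[opcode & 7]
    if group == 0:
        return f"{ROTS[y]} {reg}"
    return f"{('bit', 'res', 'set')[group - 1]} {y},{reg}"


print(decode_cb(0xC7))   # set 0,a
print(decode_cb(0xC6))   # set 0,(hl)
```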
The Z80 provides two index registers, called IX and IY. They are used when the instruction is prefixed with a $DD or $FD byte, respectively. These prefixes basically make the current instruction use the selected index register instead of the HL register pair. However, if the (HL) addressing mode is used, an additional byte-sized offset follows. The index registers can also be combined with the $CB prefix, which can make things complicated.
E5 | -- | -- | -- | push hl | Push HL to stack |
DD | E5 | -- | -- | push ix | Push IX to stack (same opcode E5, but now with DD prefix) |
FD | E5 | -- | -- | push iy | Push IY to stack (now with FD prefix) |
FD | 21 | 80 | FF | ld iy,$FF80 | Load $FF80 constant into IY register |
DD | 7E | 09 | -- | ld a,(ix+9) | Load value at address IX+9 to A register (offset is after opcode) |
CB | C6 | -- | -- | set 0,(hl) | Set bit 0 at address in HL |
FD | CB | 03 | C6 | set 0,(iy+3) | Set bit 0 at address IY+3 (offset is before opcode) |
When the disassembler detects a $DD or $FD prefix, it sets a respective ix or iy flag. Later, when the instruction is decoded, every occurrence of HL is replaced by either IX or IY. If (HL) was found, another byte is fetched from the byte stream and used as index offset for (IX+dd) or (IY+dd).
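The post-processing step can be sketched like this. It is a simplified illustration with a hypothetical helper name, not the code from the disassembler. It works on the instruction string before any constant placeholders are filled in, because the offset byte comes directly after the opcode. Special cases such as jp (hl), which becomes jp (ix) without an offset, or ex de,hl, which is not affected by the prefix, are deliberately left out.

```python
def apply_index(text, index, data, pos):
    """Rewrite an HL-based instruction string for index register 'ix' or 'iy'.

    pos points at the byte after the opcode.
    Returns (text, position of the next unread byte).
    """
    if "(hl)" in text:
        # Memory access: fetch a signed offset byte and build (ix+d) / (iy+d).
        raw = data[pos]
        offset = raw - 256 if raw >= 128 else raw
        pos += 1
        text = text.replace("(hl)", f"({index}{offset:+d})")
    else:
        # Plain register use: hl simply becomes ix or iy.
        text = text.replace("hl", index)
    return text, pos


print(apply_index("push hl", "ix", bytes.fromhex("DDE5"), 2))      # ('push ix', 2)
print(apply_index("ld a,(hl)", "ix", bytes.fromhex("DD7E09"), 2))  # ('ld a,(ix+9)', 3)
```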
There is one exception. The examples above show that the index offset is always found at the third byte. This means that when the index register is combined with a $CB prefix, the actual instruction is located after the index offset. This is a case that needed special treatment in my disassembler. If this combination is detected, then the index offset is fetched and stored before the instruction is decoded.
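Here is a rough sketch of that special case, again with names invented for this example. It reads the offset first and the actual opcode last. It only covers the documented forms, where the register field of the opcode selects (hl), so the lower three bits are simply ignored here.

```python
ROTS = ["rlc", "rrc", "rl", "rr", "sla", "sra", "sll", "srl"]  # sll is undocumented


def decode_indexed_cb(data, pos=0):
    """Decode 'DD CB d op' or 'FD CB d op', starting at the DD/FD prefix byte."""
    index = {0xDD: "ix", 0xFD: "iy"}[data[pos]]
    if data[pos + 1] != 0xCB:
        raise ValueError("not an indexed $CB instruction")
    raw = data[pos + 2]                     # the offset comes BEFORE the opcode
    offset = raw - 256 if raw >= 128 else raw
    opcode = data[pos + 3]                  # the actual instruction comes last
    operand = f"({index}{offset:+d})"
    group, y = opcode >> 6, (opcode >> 3) & 7
    if group == 0:
        text = f"{ROTS[y]} {operand}"
    else:
        text = f"{('bit', 'res', 'set')[group - 1]} {y},{operand}"
    return text, pos + 4


print(decode_indexed_cb(bytes.fromhex("FDCB03C6")))   # ('set 0,(iy+3)', 4)
```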
Phew, this was complicated. Now we’re able to disassemble the official instruction set of the Z80 CPU. But we’re not done yet. There are a number of undocumented instructions. The manufacturer Zilog never documented them, and most of them are not very useful, but they still work on almost any Z80 CPU and are actually in use.
Most of them are covered just by extending the instruction arrays. Additionally, the $DD or $FD prefixes do not only affect the HL register pair, but also the individual H and L registers, giving the IXH/IYH and IXL/IYL registers. This is covered by the instruction post-processing. A very special case is the $CB prefix in combination with index registers, giving a whole bunch of new instructions that store the result of a bit operation in another register. This actually needed special treatment by a separate $CB prefix instruction decoder.
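To illustrate the last point: in the indexed $CB instructions, the lower three bits of the opcode still name a register. If they select anything other than the (hl) slot, the result of a rotate, res or set operation is additionally copied to that register. The bit instructions only test a bit, so they have nothing to store. The sketch below uses a hypothetical function name, and the notation with the trailing register is only one of several conventions in use.

```python
REGS = ["b", "c", "d", "e", "h", "l", None, "a"]   # None marks the plain (hl) slot
ROTS = ["rlc", "rrc", "rl", "rr", "sla", "sra", "sll", "srl"]


def indexed_cb_text(opcode: int, operand: str) -> str:
    """Text for the opcode byte of an indexed $CB instruction, e.g. operand '(ix+1)'."""
    group, y, z = opcode >> 6, (opcode >> 3) & 7, opcode & 7
    if group == 0:
        text = f"{ROTS[y]} {operand}"
    else:
        text = f"{('bit', 'res', 'set')[group - 1]} {y},{operand}"
    # Undocumented variant: the result is also stored in a register.
    if group != 1 and REGS[z] is not None:
        text += f",{REGS[z]}"
    return text


print(indexed_cb_text(0xC6, "(iy+3)"))   # set 0,(iy+3)
print(indexed_cb_text(0xC7, "(ix+1)"))   # set 0,(ix+1),a
```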
Finally, the ZX Spectrum Next is going to bring some new instructions like multiplication or ZX Spectrum hardware related stuff. They were again covered by extending the instruction arrays. The only exceptions are the push [const] instruction, where the constant is stored as big endian, and the nextreg [reg],[val] instruction, which is the only instruction that is followed by two constants.
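The operand handling for these two exceptions can be sketched as follows. Both helpers are hypothetical and take only the operand bytes that follow the opcode, so the sketch makes no assumption about the opcode values themselves.

```python
def format_push_const(operands: bytes) -> str:
    """Format push [const]; unlike everywhere else, the HIGH byte comes first."""
    value = (operands[0] << 8) | operands[1]   # big endian
    return f"push ${value:04X}"


def format_nextreg(operands: bytes) -> str:
    """Format nextreg [reg],[val]; two separate 8-bit constants follow the opcode."""
    reg, val = operands[0], operands[1]
    return f"nextreg ${reg:02X},${val:02X}"


print(format_push_const(bytes([0x12, 0x34])))   # push $1234
print(format_nextreg(bytes([0x07, 0x03])))      # nextreg $07,$03
```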
And that’s it. 😄 This is how to write a Z80 disassembler in a single afternoon.