On a slow processor like the Z80, it is essential to think about execution time. Often a clean approach is too slow, and you need to optimize the code to make it a lot faster.
The ZX Spectrum screen bitmap is not linear. The 192 pixel rows are divided into three sections of 64 pixel rows each. Within each section, the first pixel rows of all 8 character rows come first, followed by the second pixel rows of all 8 character rows, and so on. The advantage is that when writing characters to the bitmap, you only need to increment the H register to reach the next bitmap row. The disadvantage is that a pixel-precise address calculation is hell.
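To make the interleaving concrete, here is a minimal Python sketch that computes the address of the first byte of each pixel row and lists the rows in memory order. The function name `row_address` is my own, not part of any Spectrum API; the bit layout is the standard one for the Spectrum bitmap starting at 0x4000.

```python
def row_address(y):
    """Address of the leftmost byte of pixel row y (0 <= y <= 191)."""
    section = y >> 6          # which third of the screen (0..2)
    char_row = (y >> 3) & 7   # character row inside that third (0..7)
    pixel_row = y & 7         # pixel row inside the character cell (0..7)
    return 0x4000 | (section << 11) | (pixel_row << 8) | (char_row << 5)

# Rows sorted by their position in memory show the interleaving.
rows_in_memory_order = sorted(range(192), key=row_address)
print(rows_in_memory_order[:10])  # [0, 8, 16, 24, 32, 40, 48, 56, 1, 9]

# Inside a character cell, the next pixel row is exactly 0x100 further on,
# which is why incrementing H is enough when drawing characters.
print(hex(row_address(1) - row_address(0)))  # 0x100
```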
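This is how the coordinates of a pixel are mapped to the address:

    Bit    15 14 13 12 11 10  9  8    7  6  5  4  3  2  1  0
    Value   0  1  0 Y7 Y6 Y2 Y1 Y0   Y5 Y4 Y3 X7 X6 X5 X4 X3

X2 to X0 give the bit number within the addressed byte, counted from the left (0 is the leftmost pixel, stored in bit 7). They can be used as a counter for right shift operations.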
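The mapping can also be written as a small reference function, which is handy for checking an assembly routine against. This is only a sketch: `pixel_address` and its return convention (address plus bit mask) are names I made up for illustration.

```python
def pixel_address(x, y):
    """Return (address, mask) for pixel (x, y), 0 <= x <= 255, 0 <= y <= 191."""
    # High byte: 0 1 0 Y7 Y6 Y2 Y1 Y0
    high = 0b01000000 | ((y & 0xC0) >> 3) | (y & 0x07)
    # Low byte: Y5 Y4 Y3 X7 X6 X5 X4 X3
    low = ((y & 0x38) << 2) | (x >> 3)
    # X2-X0 count right shifts, starting from the leftmost pixel (bit 7).
    mask = 0x80 >> (x & 7)
    return (high << 8) | low, mask

address, mask = pixel_address(3, 64)
print(hex(address), f"{mask:08b}")  # 0x4800 00010000
```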
My first attempt was straightforward code that shifted, masked, and moved the bit groups into the correct places. It took 117 cycles. That is decent, but we can do better.
We need a lot of rotation operations to shift the bits to the right position. Rotation is a rather expensive operation on a Z80, because there are no instructions that rotate by more than one bit at a time. My idea was to divide the X coordinate by 8 (by rotating it three times to the right) and simultaneously shift Y5 to Y3 into the L register through the carry flag. With a similar trick, I could set bit 14 while rotating, which saved me another OR operation with a constant.
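The carry-chaining idea can be modelled in a few lines of Python. This is a sketch of the principle only, with helper names of my own choosing: each rotate of L drops one Y bit into the carry, and the following rotate of A picks it up at the top while X moves one position to the right.

```python
def rotate_right_through_carry(value, carry):
    """Model of the Z80 rra / rr r instructions: returns (new_value, new_carry)."""
    return (carry << 7) | (value >> 1), value & 1

def divide_x_and_merge(x, l):
    """Three rr l / rra pairs: X is divided by 8, bits 2-0 of l land in bits 7-5."""
    a, carry = x, 0
    for _ in range(3):
        l, carry = rotate_right_through_carry(l, carry)  # rr l: bit 0 of l -> carry
        a, carry = rotate_right_through_carry(a, carry)  # rra: carry -> bit 7 of a
    return a

result = divide_x_and_merge(0xFF, 0b00000101)  # Y5 Y4 Y3 = 1 0 1
print(f"{result:08b}")  # 10111111
```

Three rotate pairs do the work of a divide, a shift, and a merge, so no separate mask and OR step is needed for the low byte.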
This is the final optimized code. It takes the X coordinate in the C register and the Y coordinate in the B register. The screen address is returned in the HL register pair.
DE is left unchanged, so the caller does not need expensive save and restore operations around the call.
    pixelAddress::
        ld   a, b
        and  %00000111
        ld   h, a           ; h contains Y2-Y0
        ld   a, b
        rra
        scf                 ; set bit 14
        rra
        rra
        ld   l, a           ; l contains Y5-Y3
        and  %01011000
        or   h
        ld   h, a           ; h is complete now
        ld   a, c           ; divide X by 8
        rr   l              ; and rotate Y5-Y3 in
        rra
        rr   l
        rra
        rr   l
        rra
        ld   l, a           ; l is complete now
        ret
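One way to gain confidence in a routine like this is to replay it step by step in a high-level language and compare it with the plain formula for every coordinate. The following Python sketch does that. It models only the registers and the carry flag that the routine uses; all function names are mine, and it assumes that `and` and `or` clear the carry, as they do on the Z80.

```python
def rr(value, carry):
    """Rotate an 8-bit value right through carry (models rra and rr r)."""
    return (carry << 7) | (value >> 1), value & 1

def pixel_address_emulated(x, y):
    b, c = y, x
    a = b & 0b00000111           # ld a, b / and %00000111
    carry = 0                    # and clears the carry flag
    h = a                        # ld h, a
    a = b                        # ld a, b
    a, carry = rr(a, carry)      # rra
    carry = 1                    # scf
    a, carry = rr(a, carry)      # rra
    a, carry = rr(a, carry)      # rra
    l = a                        # ld l, a
    a &= 0b01011000              # and %01011000
    a |= h                       # or h
    carry = 0                    # and / or clear the carry flag
    h = a                        # ld h, a
    a = c                        # ld a, c
    for _ in range(3):
        l, carry = rr(l, carry)  # rr l
        a, carry = rr(a, carry)  # rra
    l = a                        # ld l, a
    return (h << 8) | l

def pixel_address_reference(x, y):
    """The same mapping written directly from the bit layout."""
    return (0x4000 | ((y & 0xC0) << 5) | ((y & 0x07) << 8)
            | ((y & 0x38) << 2) | (x >> 3))

mismatches = [(x, y) for y in range(192) for x in range(256)
              if pixel_address_emulated(x, y) != pixel_address_reference(x, y)]
print(len(mismatches))  # 0
```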
The routine takes only 108 cycles, ret included. Optimizing saved me 9 cycles (or about 8%). This doesn't sound like much, but if the code is invoked in a loop, those 9 cycles are multiplied by the number of loop iterations.
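The cycle count can be checked by adding up the per-instruction timings. The sketch below assumes that "cycles" means Z80 T-states, uses the standard documented timings, and ignores the extra delays that contended memory can cause on a real Spectrum. The full-screen loop at the end is a hypothetical example of how the saving adds up.

```python
# (instruction, T-states) for the routine, in execution order
ROUTINE = [
    ("ld a, b", 4), ("and %00000111", 7), ("ld h, a", 4), ("ld a, b", 4),
    ("rra", 4), ("scf", 4), ("rra", 4), ("rra", 4), ("ld l, a", 4),
    ("and %01011000", 7), ("or h", 4), ("ld h, a", 4), ("ld a, c", 4),
    ("rr l", 8), ("rra", 4), ("rr l", 8), ("rra", 4), ("rr l", 8), ("rra", 4),
    ("ld l, a", 4), ("ret", 10),
]

total = sum(t_states for _, t_states in ROUTINE)
saved = 117 - total
saved_percent = 100 * saved / 117

print(total)                    # 108
print(saved)                    # 9
print(round(saved_percent, 1))  # 7.7

# Hypothetical loop: one call per pixel of the 256 x 192 screen.
saved_full_screen = saved * 256 * 192
print(saved_full_screen)        # 442368
```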
I claim this is the fastest solution without resorting to a lookup table. Try to beat me! 😁