ZX Spectrum System Tour: Text Mode (bumbershootsoft.wordpress.com)

Now that we’ve taken a look at the kind of control that the Spectrum’s BASIC gives us over the hardware, it’s time to dip down into machine language and see what is offered to us there. There’s quite a bit more to cover in this realm, and I expect it’s going to end up spread out over quite a few posts.

It is also quite a bit more fraught than on many other platforms, because Sinclair only paid minimal attention to exposing a consistent and structured firmware interface. Commodore’s 8-bits, from the original PET through to the C128, all had backwards-compatible jump tables at the end of their “KERNAL” ROMs which meant that quite a lot of machine language code could function identically even when doing things like file processing and disk access. MSX was not a single computer at all, but an industry standard that manufacturers wrote to, and a major part of that was a well-defined jump table up near the interrupt vectors. The IBM PC’s BIOS served a similar role, though less voluntarily.

The Spectrum, however, has only a handful of defined entry points and as a result developers tended to directly call deep within the system ROM itself in order to get the results they wanted. The Timex Sinclair 2068 had an incompatible ROM and as a result almost no Spectrum software ran on it; this seems to have been understood as a major contributor to its failure in the US market. I’m a bit skeptical of this, because that meteoric US crash was shared by the MSX, the Commodore 16, and the Coleco Adam, all of which were plausible direct competitors to the TS2068. However, given that the 128 was released after the 2068’s failure, and that the 128 kept the relevant parts of its ROM very carefully compatible with the 48K version, I suspect that Sinclair nevertheless learned an important lesson here.

For the purposes of this series I will be taking the 48K Spectrum as the target platform, while making sure that the techniques I describe also work on the 16K and 128K machines. There were a vast number of Spectrum clones and successors, of varying fidelity; if I encounter something interesting regarding them along the way I’ll bring it up, but it will all be purely sidebar territory.

For our first foray, I’m going to look at the facilities we have to command to get a display like the 100%-BASIC “Manic Mechanic” game I showed off last week:

In this first post, we’ll start by looking at basic system organization: how machine language programs exist on the system, how they are loaded and run, and how they coexist with BASIC and the BIOS. Then we’ll dig into the facilities for text-based displays on the system, including the “user-defined graphics” facility that Manic Mechanic itself relies on. We’ll wrap up by looking at keyboard and joystick input, which turns out to be quite close to what we had to do in BASIC.

First Principles: Running Code at All

When I first started exploring the ZX Spectrum, I was very pleasantly surprised at how easily the BASIC could coexist with machine code. Many of the standard BASIC commands have extra modes that lend themselves very well not only to pure machine code programs but to mixed BASIC/ML combinations.

Files on cassette identify themselves as BASIC code, BASIC data, or binary data. BASIC’s SAVE and LOAD commands have extensions to specify what exactly is to be loaded, similar to the distinct LOAD and BLOAD commands in other BASICs. To load machine code from tape, we put the keyword CODE after the filename.
BASIC can restrict its memory usage by passing a parameter to the CLEAR command. By lowering the top of BASIC’s memory to just below your machine code’s loading point, memory conflicts should be entirely avoidable.
BASIC calls into machine code subroutines with the USR function. This takes the address of the routine as its argument; the routine does its work, then returns to BASIC with a RET instruction. The value of the BC register pair is returned to BASIC as the value of the function as a 16-bit unsigned integer. Routines with no meaningful return value generally dispose of it by passing it to the random number generator: RANDOMIZE USR in Sinclair BASIC accomplishes much the same thing as SYS in C64 BASIC.
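
To make the return-value convention concrete, here is a minimal sketch of a routine that does nothing but hand a number back to BASIC. The origin address of $7000 (28672 in decimal) is just my usual choice; any address above the CLEAR limit would do.

        org $7000

        ld bc,1234               ; Whatever is in BC on return...
        ret                      ; ...becomes the value of USR

With this loaded, PRINT USR 28672 should display 1234, while RANDOMIZE USR 28672 runs the same code and quietly disposes of the result.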

In addition to saving out ranges of binary data, the SAVE function can also save a BASIC program such that it starts running immediately after LOADing, and it allows you to start execution from any line number you wish. Back in 2017 I created a somewhat more sophisticated BASIC stub for my Spectrum programs that I’ve been using ever since. The general strategy was this:

The BASIC program is saved so that it begins execution on line 30 instead of 10.
Code at line 30 restricts BASIC’s memory with a CLEAR command and then loads the next file on the tape (our actual program) as binary. Once the load finishes it then jumps up to the top of the program.
Lines 10 and 20 call the loaded machine code with RANDOMIZE USR and then jump past the loader logic to cause the program to exit normally.

The end result of all of this is a program that runs its machine code component when run, but which also, when loaded, carries out all necessary preparations exactly once. Turning this into something I could use as a turnkey packaging script is a bit out of scope for this tour, so I’ll just link to the original article about it and a copy of the script itself, if you want it.

Machine Code’s View of the Spectrum

The Z80 has a 64KB address space and on a 48K Spectrum all of it is used.

$0000–$3FFF is the system ROM.
$4000–$5AFF is the graphics RAM, for both the bitmap data and the color table.
Everything else is RAM for use by the system.

This is basically what it looks like on 16K and 128K Spectrums as well, though RAM on the 16K ends at $7FFF and both the ROM and RAM on the 128 are bankswitched.

When writing our assembly language code, we can’t use facilities that are owned by the system ROMs. That includes everything to do with interrupts, the shadow registers, and the IY register.

The IY register is special though, because while we are not allowed to edit it, we can and regularly will read it (it’s always $5C3A) and also index from it: it holds the base address of what Sinclair calls the system variables, a collection of data about the current system state that we will regularly find ourselves reading and writing. The locations of these variables are far more consistent across incompatible variants than the ROM; I leaned on this quite heavily when I made my first project.

The ROM itself turns out to be pretty casual about this, often referring to system variables directly by address, but then also indexing IY to access them as needed. Indexed operations are more expensive but more flexible, so a use case like this one where you know exactly what your base address is but also have an index register will cause you to swap back and forth between using absolute addresses and using indexed access. I’ll be writing my code here such that all system variable access goes through IY but this wasn’t generally done. (Indeed, I myself do not do it in the Spectrum platform guide.)
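
As a small illustration of the two styles, here is the same read of TVFLAG done both ways. The address $5C3C is simply $5C3A plus the offset of 2, and the cycle counts in the comments are the standard Z80 timings.

        ld a,($5c3c)             ; Absolute: 3 bytes, 13 cycles
        ld a,(iy+2)              ; Indexed: 3 bytes, 19 cycles
        bit 0,(iy+2)             ; Indexed access can also test a bit in place

That last form is where the index register earns its keep: absolute addressing can only load and store, so testing or changing a single bit in place would otherwise mean pointing HL at the variable first.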

Printing Out Text

As we saw in our BASIC tour, the Spectrum divides the screen into a 22-line upper window for its main display and a 2-line lower window for input and status messages. This is not an artifact of BASIC; this division is present at the machine-code level as well. In order to print out text to the upper window, we need to configure the system variables appropriately to tell its output routines to direct text output there; we can do this by writing a 0 to TVFLAG at IY+2. Once this has been done, we may call the ROM routine at address $10 to print the character in A. Spectrum uses ASCII so normal text will be quite familiar to us.

That makes Hello World pretty straightforward:

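        org $7000

        ld (iy+2),0              ; Write to upper screen
        ld hl,msg                ; HL points at the next character
lp:     ld a,(hl)                ; Fetch it and advance
        inc hl
        or a                     ; A zero byte ends the string
        ret z                    ; ... and returns us to BASIC
        rst $10                  ; Otherwise, print the character in A
        jr lp
msg:    db "Hello, world!",13,0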

I usually use $7000 as my origin address because it’s deep enough into RAM to leave BASIC some breathing room, while leaving enough space above it that my smaller projects will probably still fit on a 16K machine. I assemble this with a command like sjasm hello.asm hello.bin and package it with spectralink.py hello.bin 7000, then load up hello.bin.tzx into FUSE and…

We are off the ground!

Defining and Using Graphical Characters

When working with BASIC, we found that we had two kinds of graphical characters: built-in semigraphics and then a suite of reprogrammable characters it called “user-defined graphics.” We’d type them in by shifting into “graphics input mode” and then pressing keys 1-8 for the semigraphics (using inverse-video mode as necessary to get all 16 possible combinations) and A-U for the user-defined characters.

This is a place where things are slightly easier in machine language than BASIC. “Graphics Mode” is a quirk of the input system, and freed of the necessity of directly typing in the characters we want we can simply pass extended characters on to the character-printing routine at RST $10.

When using the predefined semigraphics, we get another nice bonus: the whole mess with inverse video was an awkward workaround we faced due to not having enough keys on our keyboard. Under the hood, we have all 16 possible semigraphics characters immediately available in the range $80–$8F. They are also laid out in a sensible order: the bits with values 1, 2, 4, and 8 (that is, bits 0 through 3) correspond to the upper-right, upper-left, lower-right, and lower-left quadrants of the character, respectively.
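
Since the quadrant bits are laid out so regularly, building a block character is just addition. The following sketch prints all 16 of them in code order; the blk label is only for this example, and I save A around the call since I would not count on the print routine preserving it.

        org $7000

        ld (iy+2),0              ; Write to upper screen
        ld a,$80                 ; $80 is the empty block
blk:    push af
        rst $10                  ; Print the current block character
        pop af
        inc a
        cp $90                   ; Stop once we reach the UDG range
        jr nz,blk
        ret

A single character works the same way: $80+1+8, or $89, is the block with the upper-right and lower-left quadrants filled.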

The user-defined graphics are similar; these occupy the range $90–$A4 and run from graphic-A through graphic-U. (The 128, with two fewer characters available, only offers up through $A2.) The rest of the character set through $FF is the abbreviated forms of BASIC keywords. (That’s why we lose a couple spots on the 128, incidentally; that’s where the SPECTRUM and PLAY commands show up.)

The precise location of the user-defined graphics varies from system to system, but there’s a 16-bit pointer to them named UDG at IY+65. This is precisely the value that is returned by the BASIC function USR "A", and since BASIC itself was then treating that as a buffer to POKE we get to do the same ourselves in machine code. Here’s a very simple loader routine, assuming there are num_chars worth of character data at gfx:
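
A minimal sketch of such a loader follows. I am treating num_chars as an assemble-time constant, which is an assumption on my part; each character is 8 bytes of bitmap data, one byte per row from the top down.

        ld hl,gfx                ; Source: our character data
        ld e,(iy+65)             ; Destination: wherever UDG points
        ld d,(iy+66)
        ld bc,num_chars*8        ; 8 bytes per character
        ldir                     ; Copy the whole block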

These actually work in BASIC, too; we can PRINT out all the graphics characters with CHR$ without having to dip into graphics or inverse mode.

Making Our Text Fancier

A lot of our time last week was dedicated to seven commands that we could issue on their own or even weave into PRINT and DRAW commands to control location, color, and behavior like blinking text. The character-printing routine handles most of this for us automatically, but the protocols are a bit different than in BASIC.

The basic insight is that we can recreate the inline changes to text attributes by printing out special control codes. Character codes $10 through $17 replicate the text control commands that we saw last week, representing, in order: INK, PAPER, FLASH, BRIGHT, INVERSE, OVER, AT, and TAB. (Well, OK. We didn’t go over TAB last week because it’s standard BASIC and isn’t really a Spectrum thing, but it turns out it’s implemented this way too.) The six codes from INK through OVER take a single byte afterwards as argument, while AT and TAB each take two: row and column for AT, and a 16-bit column value, low byte first, for TAB. These all work exactly the way they do in BASIC, including the special transparent/high contrast modes for colors 8 and 9 in PAPER and INK, and the AT arguments being in row, column order with (0,0) in the upper left corner.
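
Here is a sketch of what that looks like in practice, reusing the Hello World print loop. I have picked argument values that avoid zero bytes, since that loop treats a zero as the end of the string.

        org $7000

        ld (iy+2),0              ; Write to upper screen
        ld hl,msg
lp:     ld a,(hl)
        inc hl
        or a                     ; Zero terminator, as in Hello World
        ret z
        rst $10
        jr lp
msg:    db $16,5,10              ; AT 5,10
        db $10,2,$11,6           ; INK 2 (red), PAPER 6 (yellow)
        db $13,1                 ; BRIGHT 1
        db "Warning!",13,0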

What does not work exactly the way we saw it in BASIC is that they do not reset once our PRINT statement finishes. Not even printing a newline (which is a bare carriage return, $0D, on this system) restores the initial state. The system does have a distinction of sorts between “permanent” and “temporary” character attributes, but it’s much more tightly bound up with the internals of the ROM itself. To make proper use of those, we need to start looking more closely at things that Sinclair did not, at first, really expect us to be looking at.

Working More Closely With the System ROM

There are six bytes in the system variables that interact with text display:

ATTR-T at IY+85: the “temporary” text attributes. When we print out a character with the RST $10 instruction, the byte in this variable is copied into the corresponding cell in color memory to set its colors, its brightness, and its blinkiness.
MASK-T at IY+86: the “temporary” attribute mask. As it happens, the byte in ATTR-T isn’t copied exactly. Any bit set in this byte won’t be changed as part of the copy. This is how PAPER 8 and INK 8 work; they mask out the relevant parts of this variable and the printing routine leaves those parts of the color cell alone.
ATTR-P and MASK-P at IY+83 and IY+84: the “permanent” text attributes and mask. These get updated when commands like INK and PAPER are used as bare commands in BASIC, and the first thing a PRINT statement does is copy them over into the temporary slots. This is what makes the temporary color changes temporary. Screen-clearing instructions also consult ATTR-P to decide what color to clear the screen to.
P-FLAG at IY+87: non-color text controls. The OVER and INVERSE modifiers aren’t part of the text attributes, and the logic required by the special INK 9 and PAPER 9 modes isn’t directly captured by either ATTR or MASK. Those values are stored as single bits here, with the temporary values stored in the even-numbered bits and the permanent ones in the odd-numbered bits. This interleaving allows the system ROM to copy permanent to temporary colors with some compact bit-manipulation operations.
BORDCR at IY+14: the border color. Writing this doesn’t actually set the border color (that is controlled by writing the lower three bits of I/O port $FE) but the system ROM consults it when resetting the border itself and also, more importantly, when it is setting up the color attributes in the lower input window. The value in this byte is stored in bits 3-5; multiply your color value by 8 before storing it here.
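
As a sketch of how the border pieces fit together, this fragment sets the border to blue and records the choice for the ROM. Storing only the shifted color leaves the ink bits of that variable at zero, so the lower window would get black ink; that is a simplification of mine, and not what the ROM’s own BORDER command does.

        ld a,1                   ; 1 = blue
        out ($fe),a              ; Change the visible border right now
        add a,a
        add a,a
        add a,a                  ; Multiply by 8 to move the color into bits 3-5
        ld (iy+14),a             ; Record it at IY+14 for the ROM's later use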

Again, none of this was part of any formal BIOS interface or anything; use of the system ROM facilities kind of presumes that you have an annotated ROM disassembly to hand. Fortunately, we do: we’re using the hyperlinked version on skoolkid’s site. That lets us get our bearings a bit.

Nothing we access through the character-printing code ever touches the permanent data. However, there is an internal ROM routine that the published disassembly calls TEMPS at $0D4D that copies all permanent values into the temporary spaces, carrying out the attribute reset. This function takes no arguments and trashes A and HL, but otherwise can be called whenever you want.
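
A sketch of how that might be used: write the temporary variables directly to get a one-off color, print, and then call TEMPS to put things back. The attribute value $46 is bright yellow ink on black paper, and I am assuming that none of the INK 9 or PAPER 9 modes are active.

        ld (iy+85),$46           ; ATTR-T: bright yellow on black
        ld (iy+86),0             ; MASK-T: no bits protected
        ld a,$21                 ; "!"
        rst $10                  ; Printed in the temporary colors
        call $0d4d               ; TEMPS: back to the permanent colors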

The permanent attributes are mainly used by the screen-clearing function CLS, a ROM routine at $0D6B. These are what are used to fill the upper window’s attribute cells, with the lower window taking its values from BORDCR and some logic that basically replicates what INK 9 does. It also resets the temporary colors to be the permanent ones. Annoyingly, despite checking BORDCR here, it doesn’t actually set the border. Even more annoyingly, it resets our output location into the lower window.

We can deal with all this by writing a helper function of our own and relying on that. This clrto function takes an attribute byte in A and clears the entire screen to that color; upper window, lower window, and border. It also clears out any non-color modes that might be on, and restores the output to the upper window on the way out. We can do that with TVFLAG, but as long as we’re directly invoking system ROM routines, we may as well do it “right” and actually open the upper window’s output channel. The CHAN-OPEN routine at $1601 selects an output device to send characters to, and device 2 is the upper window. The complete clrto function runs like so:
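
A minimal sketch of a routine matching that description follows. Copying the whole attribute byte into the border variable at IY+14, so that the lower window matches the upper one, is my own simplification, and I rely on CLS to copy the permanent values into the temporary ones as described above.

clrto:  ld (iy+83),a             ; ATTR-P: the requested colors
        ld (iy+84),0             ; MASK-P: no bits protected
        ld (iy+87),0             ; IY+87: no OVER, INVERSE, or color-9 modes
        ld (iy+14),a             ; IY+14: lower window gets the same colors
        rrca
        rrca
        rrca                     ; Move the paper color down to bits 0-2
        and 7
        out ($fe),a              ; Set the border to match the paper
        call $0d6b               ; CLS
        ld a,2
        jp $1601                 ; CHAN-OPEN the upper window, then return

Ending on a JP rather than a CALL followed by RET lets CHAN-OPEN’s own return take us straight back to the caller.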

There’s one last quirk in the Spectrum’s text systems that we haven’t run into yet. If you print out enough text that the screen would end up scrolling, by default it won’t actually scroll down. Instead it will pause the output and wait for the user to confirm that they actually want the screen to scroll, or if they prefer to end the program. The system variable SCR-CT at IY+82 measures how many lines to print before prompting again; by regularly resetting it to some large positive number you can ensure a continuous scroll if that is what you want.
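
In code that is a one-liner. The value 255 is simply the largest a byte can hold, matching the familiar BASIC idiom POKE 23692,255; how often you need to repeat it depends on how much you print.

        ld (iy+82),255           ; SCR-CT: a long way to go before the next prompt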

Reading Keypresses and Doing I/O

BASIC gave us two options for reading input: the INKEY$ function that checked to see if there was a keypress waiting to be processed, and then direct conversation with keyboards and joysticks using the IN function. Both have their parallels at the machine code level.

An interrupt fires every frame, which the firmware intercepts. Its interrupt handler updates a timer (FRAMES, a 3-byte value at IY+62) and scans the keyboard ports. If a keypress is found, it decodes it into a character, places that character in LAST-K at IY-50, and then sets bit 5 of FLAGS at IY+1. Here is a simple routine that waits for a key to be pressed and then returns its value in A:
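
A minimal sketch of such a routine, assuming interrupts are enabled (as they are when BASIC calls us through USR). Clearing the flag before waiting, so that a stale keypress is ignored, is my own choice here, and the gklp label is just a local name.

getkey: res 5,(iy+1)             ; Forget any key already reported
gklp:   halt                     ; Sleep until the next keyboard scan
        bit 5,(iy+1)             ; Has a new key arrived?
        jr z,gklp
        res 5,(iy+1)             ; Acknowledge it
        ld a,(iy-50)             ; Fetch the character from LAST-K
        ret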

We put a HALT in our loop here because only the interrupt handler actually updates these flags; there’s no need to spinlock on them when we can simply wait for the next scan.

Direct input is even easier, because as assembly language programmers we get direct access to the I/O bus with IN and OUT. However, there is one little quirk: unlike BASIC, our instruction set suggests that we only have 8 bits of I/O address, not 16. This turns out to be a design oversight by Zilog that turned into a feature that Sinclair relied on: the instructions IN A,(C) and OUT (C),A put the entire BC register on the address bus, not just C. To check if the A key is pressed, we can write something like this:

        ld bc,$fdfe                     ; IN 65022 in BASIC
        in a,(c)
        rra
        jr nc,pressedA

The Kempston joystick only uses an 8-bit port address, so that code ends up even more straightforward. Here’s a complete reader routine that returns delta-x in L, delta-y in H, and the fire button in C:
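
A sketch of such a reader, assuming the usual Kempston layout on port $1F: bits 0 through 4 are right, left, down, up, and fire, and a set bit means active. Deltas come back as 1, 0, or $FF (that is, -1), with positive Y pointing down the screen; the labels are mine.

joy:    in a,($1f)               ; Read the Kempston port
        ld hl,0                  ; Start with no movement
        rra                      ; Bit 0: right
        jr nc,joy1
        inc l
joy1:   rra                      ; Bit 1: left
        jr nc,joy2
        dec l
joy2:   rra                      ; Bit 2: down
        jr nc,joy3
        inc h
joy3:   rra                      ; Bit 3: up
        jr nc,joy4
        dec h
joy4:   and 1                    ; Bit 4 has rotated down to bit 0: fire
        ld c,a
        ret

This works because INC and DEC on an 8-bit register leave both A and the carry flag alone, so each RRA picks up exactly where the last one left off.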

For full documentation on which ports correspond to which keys, and the various joystick layouts, Martin Korth’s ZX Docs have many nice tables. Joystick operations are basically identical to BASIC, so my various caveats from last week still apply, but with slightly less force. Emulators are still a little inconsistent about synthetic SHIFT keys, but we no longer have to worry about an accidental L BREAK INTO PROGRAM error when we move and shoot at the same time.

Putting It All Together

To wrap up all the things we’ve learned here, let’s knit them together into a program that produces the same banner display that we got out of BASIC last time:

The main program basically just calls each thing we’ve done here in turn, except for direct keyboard or joystick reads:

        org $7000

        ld a,$20                        ; Clear to black on green
        call clrto
        ld hl,gfx                       ; Load our 4 custom characters into $90-$93
        ld e,(iy+65)                    ; UDG
        ld d,(iy+66)
        ld bc,32
        ldir
        ld hl,msg                       ; Print the display
        call print
        call getkey                     ; Wait for keypress
        ld a,$38                        ; Clear back to black on white
        jp clrto                        ; and exit when we're done

The clrto and getkey functions are exactly as we listed them above. Our print function, however, needs to change a bit from the Hello World program. We were using a 0 byte as the string terminator up there, and that’s no longer a reasonable thing to do; we set both PAPER and BRIGHT to zero at various points in our display here and we need to do that by actually printing out actual CHR$ 0 characters. The solution isn’t difficult; we need merely replace our OR A instruction that implicitly compared A to zero with a CP $FF instruction that explicitly compares it with 255. That’s a safer code to use as a sentinel; it corresponds to the BASIC keyword COPY which we will never actually be printing out.
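
With that change, the print routine might look like the following sketch. It takes the string address in HL, as the main program above assumes.

print:  ld a,(hl)                ; Fetch the next byte and advance
        inc hl
        cp $ff                   ; $FF marks the end of the string
        ret z
        rst $10                  ; Anything else, zero included, gets printed
        jr print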

Then there’s the data itself: the message to print out and the graphics definitions for the umbrella. The message data makes heavy use of the text control codes above, and is broadly unreadable as a result. The graphics data is exactly the same as the DATA statements from the BASIC original, just back in hex this time the way they were in the TI BASIC original:

msg:    db $10,4,$11,0,$16,10,2,$8b,"                          ",$87
        db $13,1,$10,1,$16,11,2," ",$90,$91,$10,7,"  BUMBERSHOOT SOFTWARE   "
        db $10,1,$16,12,2," ",$92,$10,6,$13,0,$93,$13,1,$10,7,"Showing off the Spectrum "
        db $13,0,$10,4,$16,13,2,$8e,"                          ",$8d,$ff
gfx:    db $03,$0f,$1f,$3f,$7f,$7f,$ff,$ff
        db $c0,$f0,$f8,$f8,$f0,$e0,$c0,$80
        db $ff,$fe,$7c,$78,$30,$00,$00,$00
        db $80,$c0,$60,$30,$18,$08,$38,$00

This turns out to be a significantly smaller program than our BASIC original; even taking into account the BASIC stub and the extra tape header, this program is a bit under half the size.

Does Benny Get the Last Laugh?

The material we’ve covered today is enough that we could imagine a pure machine code version of the Manic Mechanic type-in game I used as an example last week. We’ve now seen enough to recreate the main game display and also manage input and animation. We’re still missing sound, though, and we can’t match the title screen yet:

Line graphics like what the DRAW command makes are inconvenient in machine language, but we’ve got some tricks up our sleeves yet that BASIC can’t match at all. Stay tuned.