Notes on a 1987 Sampler's Firmware

The Casio FZ-1's firmware is 64 KiB of V50 machine code, masked into two 32 KiB ROMs. It runs the keyboard, the LCD, the floppy drive, and an eight-voice sound engine scheduled on a single timer interrupt. I have been reverse engineering it. I had never tried reverse engineering before, and it turns out I really enjoy it.

Reverse engineering involves taking compiled bytes back to something you can read and reason about. There is no source code, no comments, no symbol table; just instructions, addresses, and the side effects they cause.

I downloaded Ghidra, the disassembler the NSA open-sourced in 2019. You open a binary and it gives you an instruction listing, a decompiled C view that approximates the original, and a graph of which functions call which.

The first thing the disassembler told me was nonsense. The FZ-1's ROM lives on two physical chips, wired so that one provides the low byte and the other the high byte of every 16-bit word the CPU fetches; neither half makes sense on its own. The two have to be zipped together first. This is the start of the last 16 bytes of the merged image, with the bytes each chip contributes:

fz1s.bin (even bytes):  EA 06 F0 49 4F 55 49 ...
fz2s.bin  (odd bytes):  50 00 48 52 59 4B FF ...
interleaved:            EA 50 06 00 F0 48 49 52 4F 59 55 4B 49 ...

The first five bytes shown, EA 50 06 00 F0, decode as JMP FAR F000:0650: the reset vector. (The operands are little-endian, so 50 06 is the offset 0650 and 00 F0 is the segment F000.) They sit 16 bytes below the top of the address space, which is where 8086-family CPUs start executing after reset. The next eight, 48 49 52 4F 59 55 4B 49, spell HIROYUKI in ASCII, apparently the firmware engineer's name. Nobody seems to have confirmed publicly who he was; the same signature was spotted on a Gearspace thread years ago and the trail goes cold there too. Someone else figured the interleave out on that thread; I rediscovered it independently and we both ended up staring at the same name.
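The zip-and-decode step is small enough to sketch in Python. This is a minimal illustration, not the tooling I actually used: the function names are mine, and the input is only the handful of bytes quoted above rather than the full dumps.

```python
def interleave(even: bytes, odd: bytes) -> bytes:
    """Zip two half-ROMs: even[i] becomes byte 2*i, odd[i] becomes byte 2*i + 1."""
    if len(even) != len(odd):
        raise ValueError("ROM halves must be the same length")
    merged = bytearray(len(even) * 2)
    merged[0::2] = even
    merged[1::2] = odd
    return bytes(merged)


def decode_far_jmp(code: bytes) -> tuple[int, int]:
    """Decode an 8086 direct far jump: opcode EA, then offset16 and segment16,
    both little-endian. Returns (segment, offset)."""
    if len(code) < 5 or code[0] != 0xEA:
        raise ValueError("not a direct far JMP")
    offset = int.from_bytes(code[1:3], "little")
    segment = int.from_bytes(code[3:5], "little")
    return segment, offset


# Only the bytes quoted in the listing above, not the whole dumps.
even_tail = bytes.fromhex("EA 06 F0 49 4F 55 49")
odd_tail = bytes.fromhex("50 00 48 52 59 4B FF")

tail = interleave(even_tail, odd_tail)
segment, offset = decode_far_jmp(tail)
signature = tail[5:13].decode("ascii")

print(f"JMP FAR {segment:04X}:{offset:04X}")  # JMP FAR F000:0650
print(signature)                              # HIROYUKI
```

With the real files, the same `interleave` would take the full contents of fz1s.bin and fz2s.bin, and the reset vector would be the last 16 bytes of the result.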

The first stretch of work was hand-rolled. I built a small pipeline that fed a language model pre-decoded instruction windows for one function at a time, with whatever annotation I already had for context. The model proposed an explanation. A separate gate then re-read the same bytes out of the ROM image and rejected any claim whose cited bytes did not match what was actually there. I ran this in batches, eight functions at a time, until every callable routine in the firmware had at least a one-line summary tied to specific instructions.
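The gate is the part worth sketching. What follows is a simplified illustration under my own assumptions, not the pipeline's real code: the `Claim` shape and the `gate` function are hypothetical, and the toy ROM is just the reset-vector bytes quoted earlier, padded with FF.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One model statement plus the bytes it cites as evidence (hypothetical shape)."""
    offset: int    # byte offset into the merged ROM image
    cited: bytes   # the bytes the model quoted
    summary: str   # the model's explanation of those bytes


def gate(rom: bytes, claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Split claims into (accepted, rejected) by re-reading the cited bytes from the ROM."""
    accepted, rejected = [], []
    for claim in claims:
        end = claim.offset + len(claim.cited)
        in_range = 0 <= claim.offset and end <= len(rom)
        # An empty citation proves nothing, so it is rejected as well.
        if claim.cited and in_range and rom[claim.offset:end] == claim.cited:
            accepted.append(claim)
        else:
            rejected.append(claim)
    return accepted, rejected


# Toy 16-byte "ROM": the quoted reset-vector bytes, padded with FF for illustration.
rom = bytes.fromhex("EA 50 06 00 F0 48 49 52 4F 59 55 4B 49 FF FF FF")
claims = [
    Claim(0, bytes.fromhex("EA 50 06 00 F0"), "far jump to F000:0650"),
    Claim(5, bytes.fromhex("48 49 52 4F"), "start of an ASCII signature"),
    Claim(0, bytes.fromhex("E9 50 06"), "near jump"),  # misquoted opcode
]
accepted, rejected = gate(rom, claims)
for claim in rejected:
    print(f"rejected at {claim.offset:#x}: {claim.summary}")
```

A check like this only proves that the quoted bytes exist at the quoted address. It says nothing about whether the explanation attached to them is right, which is why the summaries still needed reading.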

Then I switched to ghidra-mcp, which exposes Ghidra's API to a model directly. Instead of being fed pre-decoded windows, the model could ask Ghidra to decompile a function on demand, list its callers, follow a cross-reference, and write its annotations back into the Ghidra project rather than into a sidecar file. The C the model produced was closer to what the original probably looked like, and it was easier to spot where the model was confabulating versus where it was reading the bytes, because the bytes were one query away. To be honest about it: more than once the model wrote a plausible-sounding paragraph that the gate then rejected because the citations did not survive contact with the ROM.

What I keep coming back to about the firmware itself is how often one routine does the work of several. There is a routine called work_reset that the firmware uses for the cold boot and for every soft mode switch; one piece of code zeroes the sound engine and fills the voice table with free markers, called from seven different sites. Every LCD drawing primitive (pset, line, cls) takes a colour argument whose top bit, if set, suppresses the LCD write and composes into a 1-bit-per-pixel mirror in RAM instead. So pset(x, y, color) flushes to the display, and pset(x, y, color | 0x8000) accumulates into the mirror, which is then committed in one burst by a later call without the bit set. ORing one bit into a parameter turns immediate drawing into batched drawing. The voice allocator, when a MIDI note arrives, walks the eight voice slots up to three times: first for a free one, then for a slot already playing the same key on the same channel (replacement), then for the oldest live voice (stealing). And the data formats themselves are unified: LOAD and SAVE each go through one orchestrator that dispatches to three transports (floppy, the inter-unit data port, MIDI SysEx), with the same buffer composition and chunking. A voice saved to floppy and the same voice broadcast as MIDI SysEx are the same bytes in the same order. One engineer worked this out forty years ago on a machine with 64 KiB of code budget, and the taste is consistent from the cold boot to the LCD primitives to the voice scheduler to the way bytes leave the box.
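Two of these patterns are easy to model in a few lines of Python. This is a sketch of the behaviour as described above, not a transcription of the firmware: the names `Screen`, `Voice`, `pick_voice` and `note_on` are mine, the timestamp field is an assumed stand-in for however the firmware tracks voice age, and I am assuming that a call without the top bit commits the whole mirror.

```python
from dataclasses import dataclass

DEFER = 0x8000  # top bit of the 16-bit colour argument


class Screen:
    """Toy display: `lcd` stands in for the panel, `mirror` for the 1-bpp RAM copy."""

    def __init__(self, width: int, height: int) -> None:
        self.lcd = [[0] * width for _ in range(height)]
        self.mirror = [[0] * width for _ in range(height)]

    def pset(self, x: int, y: int, color: int) -> None:
        self.mirror[y][x] = color & 1      # always compose into the mirror
        if not color & DEFER:              # top bit clear: commit in one burst
            self.lcd = [row[:] for row in self.mirror]


@dataclass
class Voice:
    free: bool = True
    note: int = 0
    channel: int = 0
    started: int = 0  # note-on time; smaller means older (assumed bookkeeping)


def pick_voice(voices: list[Voice], note: int, channel: int) -> int:
    """Return the slot index to use, walking the table up to three times."""
    for i, v in enumerate(voices):         # pass 1: a free slot
        if v.free:
            return i
    for i, v in enumerate(voices):         # pass 2: same key, same channel
        if v.note == note and v.channel == channel:
            return i
    # pass 3: steal the oldest live voice
    return min(range(len(voices)), key=lambda i: voices[i].started)


def note_on(voices: list[Voice], note: int, channel: int, now: int) -> int:
    slot = pick_voice(voices, note, channel)
    voices[slot] = Voice(False, note, channel, now)
    return slot


screen = Screen(4, 2)
screen.pset(0, 0, 1 | DEFER)   # batched: the LCD is untouched
screen.pset(1, 0, 1 | DEFER)
screen.pset(2, 0, 1)           # no top bit: all three pixels reach the LCD

voices = [Voice() for _ in range(8)]
slots = [note_on(voices, 60 + n, 0, now=n) for n in range(8)]  # fills slots 0..7
replaced = note_on(voices, 62, 0, now=8)   # same key and channel: reuses slot 2
stolen = note_on(voices, 72, 0, now=9)     # nothing free or matching: steals slot 0
```

In both cases the caller's interface stays the same and only the outcome changes: one extra bit switches drawing modes, and one allocator call covers allocation, replacement and stealing.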

The practical reason for going to all this trouble: I needed a sharper model of what the firmware actually expects from a disk image before I could trust fizzle's output beyond the cases I had already tested by hand, and I wanted to write small Type-5 programs (Casio's term for user-loadable binaries that run on the sampler) without each one being a fresh exercise in trial and error. Both of those are closer now. The disk-format work is paying off because the firmware-side reader is no longer a black box. More to come soon.

Thank you, HIROYUKI.