Summers Pittman's Devsite

Syntax checking complete-ish

Progress update! Nachos—the Chip‑8 IDE I’m building—now has a feature‑complete tokenizer and syntax checker for the Octo language. It successfully parses every Octo example I’ve thrown at it except one. I’ll share that tiny exception later, but first, a quick tour of how the tokenizer and parser work, what they catch, and what I learned along the way.

Tokenization: turning text into tokens

Tokenization is the first step in compilation. The source text becomes a stream of tokens—objects that carry a type (Identifier, Number, Plus), position (line, column), and sometimes a value (identifier name, numeric literal, string contents).

For example, this Octo snippet, our first do-nothing function:

: noop
return

becomes this list of token objects:

Colon(line=0, column=0)
Identifier(name="noop", line=0, column=2)
Return(line=1, column=0)
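The sketch below makes the shape of these objects concrete. It is minimal Kotlin that models tokens as a sealed class with one data class per token kind. The class names mirror the printed output above. Everything else is a hypothetical illustration, not Nachos's actual code, including the `Token` base class and the `noopTokens` value.

```kotlin
// Hypothetical token model: every token carries its position,
// and some kinds also carry a value.
sealed class Token {
    abstract val line: Int
    abstract val column: Int
}

data class Colon(override val line: Int, override val column: Int) : Token()

data class Identifier(
    val name: String,
    override val line: Int,
    override val column: Int,
) : Token()

data class Return(override val line: Int, override val column: Int) : Token()

// The token stream for ": noop" followed by "return" on the next line.
val noopTokens: List<Token> = listOf(
    Colon(line = 0, column = 0),
    Identifier(name = "noop", line = 0, column = 2),
    Return(line = 1, column = 0),
)
```

Data classes also provide a `toString()` in this same `Kind(field=value, ...)` style, which is handy for dumping token streams while debugging. Note that it prints `name=noop` without quotes.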

The tokenizer only recognizes the pieces and records where each one starts. It does not validate logic or structure: it doesn’t know if you forgot a parameter or mismatched parentheses.

Tokenizer shape (pseudocode)

A tokenizer typically reads characters, decides what token kind could start at that position, consumes the rest, and emits a token:

// Pseudocode
while (canContinue()) {
    val ch = peekChar()
    when {
        ch == '#' -> consumeComment()
        ch.isWhitespace() -> consumeWhitespace()
        ch.isLetter() || ch == '_' || ch == ':' -> consumeIdentifierOrLabelOrDirective()
        ch.isDigit() -> consumeNumber() // e.g., 0x.., 0b.., decimal
        ch == '"' -> consumeStringOrError()
        else -> consumeSymbolOrError() // operators, punctuation, or unknown
    }
}
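The pseudocode above can be turned into a small runnable tokenizer. This sketch covers only a subset of Octo: labels, `:`-directives, `:=`, braces, identifiers, comments, and hex/binary/decimal numbers. It is an illustration under those assumptions, not Nachos's tokenizer, and the `Kind`, `Tok`, and `Tokenizer` names are hypothetical. It also shows another error a tokenizer can catch on its own: a malformed number becomes an Error token.

```kotlin
// Hypothetical token kinds for a small subset of Octo.
enum class Kind { Directive, Colon, Assign, LBrace, RBrace, Identifier, Number, Error }

data class Tok(val kind: Kind, val text: String, val line: Int, val column: Int)

// Single-use tokenizer: create one per source string and call tokenize() once.
class Tokenizer(private val src: String) {
    private var pos = 0
    private var line = 0
    private var column = 0
    private val out = mutableListOf<Tok>()

    private fun peekChar(offset: Int = 0): Char? = src.getOrNull(pos + offset)

    // Consume one character, keeping line/column in sync.
    private fun advance(): Char {
        val ch = src[pos++]
        if (ch == '\n') { line++; column = 0 } else column++
        return ch
    }

    // Consume characters while the predicate holds and return them.
    private fun readWhile(pred: (Char) -> Boolean): String {
        val sb = StringBuilder()
        while (pos < src.length && pred(src[pos])) sb.append(advance())
        return sb.toString()
    }

    fun tokenize(): List<Tok> {
        while (true) {
            val ch = peekChar() ?: break
            val l = line
            val c = column
            when {
                // Comments run to the end of the line and produce no token.
                ch == '#' -> readWhile { it != '\n' }
                ch.isWhitespace() -> advance()
                ch == ':' && peekChar(1) == '=' -> {
                    advance(); advance()
                    out += Tok(Kind.Assign, ":=", l, c)
                }
                // ":macro", ":const", ... are directives; a bare ":" starts a label.
                ch == ':' && peekChar(1)?.isLetter() == true -> {
                    advance()
                    out += Tok(Kind.Directive, ":" + readWhile { it.isLetterOrDigit() }, l, c)
                }
                ch == ':' -> { advance(); out += Tok(Kind.Colon, ":", l, c) }
                ch.isLetter() || ch == '_' ->
                    out += Tok(Kind.Identifier, readWhile { it.isLetterOrDigit() || it == '_' }, l, c)
                ch.isDigit() -> consumeNumber(l, c)
                ch == '{' -> { advance(); out += Tok(Kind.LBrace, "{", l, c) }
                ch == '}' -> { advance(); out += Tok(Kind.RBrace, "}", l, c) }
                else -> { advance(); out += Tok(Kind.Error, "Unexpected character '$ch'", l, c) }
            }
        }
        return out
    }

    // 0x.. is hex, 0b.. is binary, anything else decimal; bad digits become an Error token.
    private fun consumeNumber(l: Int, c: Int) {
        val text = readWhile { it.isLetterOrDigit() }
        val value = when {
            text.startsWith("0x") -> text.drop(2).toIntOrNull(16)
            text.startsWith("0b") -> text.drop(2).toIntOrNull(2)
            else -> text.toIntOrNull()
        }
        out += if (value != null) Tok(Kind.Number, value.toString(), l, c)
        else Tok(Kind.Error, "Malformed number '$text'", l, c)
    }
}
```

Tokenizing `foo 0x42` this way yields an Identifier followed by a Number whose text is `66`. That is the same decimal value that appears in the parse output later in this post.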

Some errors can be detected here. For instance, Nachos emits an Error token if it sees an opening quote without a matching closing quote:

// Example outcome
String(line=10, column=15, value="incomplete
Error(line=10, column=15, message="Unterminated string literal")
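The unterminated-string check can be sketched as a standalone function. Like the outcome above, it emits the partial String token followed by an Error at the literal's starting position. The names `StrTok` and `consumeString` are hypothetical. Treating the end of a line as the end of an unterminated literal is an assumption; the post does not specify it.

```kotlin
// Hypothetical token types for string literals.
sealed class StrTok {
    data class Str(val line: Int, val column: Int, val value: String) : StrTok()
    data class Error(val line: Int, val column: Int, val message: String) : StrTok()
}

// Reads a string literal that starts with the quote at src[start].
// Returns the tokens produced and the index just past what was consumed.
// Assumption: a literal may not cross a line break.
fun consumeString(src: String, start: Int, line: Int, column: Int): Pair<List<StrTok>, Int> {
    require(src[start] == '"') { "consumeString must start at a quote" }
    var i = start + 1
    val contents = StringBuilder()
    while (i < src.length && src[i] != '"' && src[i] != '\n') {
        contents.append(src[i])
        i++
    }
    val str = StrTok.Str(line, column, contents.toString())
    return if (i < src.length && src[i] == '"') {
        listOf(str) to i + 1 // skip the closing quote
    } else {
        // Keep the partial string, then flag it at the literal's start position.
        listOf(str, StrTok.Error(line, column, "Unterminated string literal")) to i
    }
}
```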

Parsing: building structure and catching mistakes

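Once we have tokens, the parser turns them into higher‑level constructs and performs syntax checks. Octo is a small, assembly-like language, so most statements correspond closely to individual Chip‑8 instructions.

This keeps the parse tree fairly linear and the parser straightforward. Unlike tokenization, parsing enforces rules and evaluates context. For example, it checks that each macro invocation supplies enough arguments, and arguments of a kind the macro body can use.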

The result is a list of ParseTokens. When something goes wrong, the parser emits Error nodes with clear, positional messages—but keeps going to surface as many issues as possible in one pass.
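This keep-going behavior is a form of error recovery. After a bad statement, the parser records an error and resynchronizes at a safe point instead of stopping. The sketch below is deliberately tiny. It handles only `vX := value` statements and uses the start of the next line as its resynchronization point. Both the statement subset and the line-based recovery are assumptions for illustration, and `ParseToken` here is a hypothetical type.

```kotlin
// Hypothetical parse results: a recognized statement or an error.
sealed class ParseToken {
    data class Assignment(val target: String, val value: String, val line: Int) : ParseToken()
    data class Error(val message: String, val line: Int) : ParseToken()
}

// Simplified operand patterns (assumed, not Octo's full grammar).
val registerPattern = Regex("v[0-9a-fA-F]")
val numberPattern = Regex("0x[0-9a-fA-F]+|0b[01]+|[0-9]+")

// Parses one "vX := operand" statement per line. A bad line becomes an Error,
// and parsing resumes on the next line instead of stopping.
fun parse(source: String): List<ParseToken> =
    source.lines().withIndex().mapNotNull { (index, raw) ->
        val words = raw.substringBefore('#').trim().split(Regex("\\s+")).filter { it.isNotEmpty() }
        when {
            words.isEmpty() -> null // blank or comment-only line
            words.size != 3 || words[1] != ":=" ->
                ParseToken.Error("Expected 'vX := value'", index)
            !registerPattern.matches(words[0]) ->
                ParseToken.Error("Expected Register, got '${words[0]}'", index)
            !registerPattern.matches(words[2]) && !numberPattern.matches(words[2]) ->
                ParseToken.Error("Expected Register or Number, got '${words[2]}'", index)
            else -> ParseToken.Assignment(words[0], words[2], index)
        }
    }
```

Resynchronizing at line boundaries is the simplest choice. A parser for the full language would likely want a token-based synchronization point instead, since it cannot assume one statement per line.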

Example: macro expansion

Input:

:macro foo SIZE {
v0 := SIZE
i := CALLS # CALLS is the number of times this macro has been expanded.
}
foo 0x42
foo 0x42

Parsed summary:

Macro, tokens size: 11, lines [1, 2, 3, 4]
MacroExpand, tokens size: 2, lines [5]
Assignment, tokens size: 3, lines [2, 5]
IAssign, tokens size: 3, lines [3, 0]
MacroExpand, tokens size: 2, lines [6]
Assignment, tokens size: 3, lines [2, 6]
IAssign, tokens size: 3, lines [3, 0]

Zooming into one expansion to show parameters and substitutions:

MacroExpand, tokens size: 2, lines [5]
Identifier(name=foo, line=5, column=12) Number "66" 5:16
Assignment, tokens size: 3, lines [2, 5]
Register(register=v0, line=2, column=16) Assignment(line=2, column=19) Number "66" 5:16
IAssign, tokens size: 3, lines [3, 0]
Register(register=i, line=3, column=16) Assignment(line=3, column=18) Number "1" 0:0
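As the zoomed-in output shows, macro expansion amounts to two substitutions: each parameter is replaced with its argument, and `CALLS` is replaced with an expansion counter. Here is a minimal sketch, assuming the macro body is stored as lists of words. The `Macro` class and its `expand` method are hypothetical names. The sketch increments the counter before substituting, which matches the `1` shown for the first expansion above. It does not establish whether that matches Octo's own numbering.

```kotlin
// A hypothetical macro: parameter names plus a body stored as lists of words.
class Macro(val name: String, val params: List<String>, val body: List<List<String>>) {
    private var calls = 0

    // Replaces each parameter with its argument and CALLS with the expansion count.
    // Assumption: the counter is incremented before substituting.
    fun expand(args: List<String>): List<List<String>> {
        require(args.size == params.size) {
            "Macro $name expects ${params.size} argument(s), got ${args.size}"
        }
        calls++
        val bindings = params.zip(args).toMap() + ("CALLS" to calls.toString())
        return body.map { statement -> statement.map { word -> bindings[word] ?: word } }
    }
}
```

A real expander would likely also carry each argument's source position through the substitution. That explains why the output above shows `Number "66" 5:16`, which is the call site's line and column rather than the macro body's.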

Example: catching errors (and continuing)

If we pass the wrong type and then forget an argument:

:macro foo SIZE {
    v0 := SIZE
    i := CALLS
}
foo "hello"
foo

The parser emits errors but continues analysis so the user gets multiple helpful messages in one run:

MacroExpand, tokens size: 2, lines [5]
Identifier(name=foo, line=5, column=0) String "hello" 5:4
Error, tokens size: 4, lines [2, 5]
Register(register=v0, line=2, column=4) Assignment(line=2, column=7) String "hello" 5:4 Error "Expected Register, Identifier, Number or Key" 5:4
IAssign, tokens size: 3, lines [3, 0]
Register(register=i, line=3, column=4) Assignment(line=3, column=6) Number "1" 0:0
Error, tokens size: 3, lines [6]
Identifier(name=foo, line=6, column=0) Error "Unexpected end of program" 6:0 Error "Error parsing macro foo" 6:0

The first macro expansion fails because a String can’t be assigned to a register. The second fails because the invocation is incomplete.
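Both failures can be reproduced in a few lines. This sketch is an assumption about how such checks might look, not Nachos's code. `checkAssignmentOperand` rejects operand kinds outside the set named in the error message above. `readMacroArgs` records errors instead of throwing when the input runs out, so the caller can keep parsing. The `ArgKind` and `ArgToken` types are hypothetical.

```kotlin
// Hypothetical token kinds; only what this sketch needs.
enum class ArgKind { Register, Identifier, Number, Key, Str }

data class ArgToken(val kind: ArgKind, val text: String, val line: Int, val column: Int)

// Operand kinds accepted on the right of ":=", taken from the error message above.
val assignable = setOf(ArgKind.Register, ArgKind.Identifier, ArgKind.Number, ArgKind.Key)

// Returns an error message, or null if the operand is acceptable.
fun checkAssignmentOperand(operand: ArgToken): String? =
    if (operand.kind in assignable) null
    else "Expected Register, Identifier, Number or Key"

// Reads one argument per macro parameter from a shared token stream.
// If the stream runs out, it records errors and returns null so the
// caller can move on instead of aborting the whole parse.
fun readMacroArgs(
    name: String,
    arity: Int,
    tokens: Iterator<ArgToken>,
    errors: MutableList<String>,
): List<ArgToken>? {
    val args = mutableListOf<ArgToken>()
    repeat(arity) {
        if (!tokens.hasNext()) {
            errors += "Unexpected end of program"
            errors += "Error parsing macro $name"
            return null
        }
        args += tokens.next()
    }
    return args
}
```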

The one exception

One example file still doesn’t parse: caveexplorer.8o, line 222. A label there uses the name exit, which is a SuperChip‑8 keyword. It’s a fun edge case, and also a nice sign that the project is far enough along to surface real, actionable issues in real programs.
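Why would this trip the checker? One plausible mechanism is sketched below as an assumption, not a description of Nachos's code. The tokenizer classifies `exit` as a keyword before the parser knows it sits in a label position. The keyword set is an illustrative subset of Octo's reserved words, and `classify` and `parseLabel` are hypothetical helpers.

```kotlin
// An illustrative subset of Octo reserved words (not the full list).
val keywords = setOf("return", "clear", "exit", "hires", "lores", "loop", "again", "if", "then")

sealed class Word {
    data class Keyword(val text: String) : Word()
    data class Identifier(val name: String) : Word()
}

// The tokenizer classifies a word without knowing that it follows ':'.
fun classify(word: String): Word =
    if (word in keywords) Word.Keyword(word) else Word.Identifier(word)

// A label declaration is ":" followed by an Identifier; a Keyword there is an error.
fun parseLabel(line: String): String {
    val parts = line.trim().split(Regex("\\s+"))
    require(parts.size == 2 && parts[0] == ":") { "Not a label declaration: $line" }
    return when (val w = classify(parts[1])) {
        is Word.Identifier -> "Label ${w.name}"
        is Word.Keyword -> "Error: '${w.text}' is a keyword and cannot be used as a label"
    }
}
```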

Lessons learned

What’s next

Still plenty to do, but Nachos is now producing real value—and that’s delicious.

#Emulation #Kotlin #Compose #Programming