Painting myself into a corner while bootstrapping a language

I was hoping to make more progress on self-hosting my scripting language (Kalos) this week, but I’m running out of steam because I think I coded myself into a corner. This is not a post where I’ve got everything figured out, but instead I’m taking a few moments to re-hash where things are at and figure out a plan for the future.

My original plan for this language was to offer a Python-like experience with minimal resource requirements: it should be able to run on an AVR, on bare metal, or even as part of a DOS executable. I had originally planned to support compilation on those devices, and even built a zero-allocation parser. The runtime is lightweight, integers are a configurable size, and strings are even optional. I believe it’ll be a great option on lower-end devices where Python is just too heavy.

Over the last week I started work to extract a small piece of the parser that deals with KIDL (example of KIDLE below), the part of the language that glues it to your C code. This is currently part of the existing parser, and the only piece that I could think of to carve off on the slow march to true self-hosting.

idl {
    prefix "kalos_module_idl_";
    dispatch name;

    module builtin {
        fn println(s: string) = kalos_idl_compiler_println;
        fn print(s: string) = kalos_idl_compiler_print;
        fn log(s: string) = kalos_idl_compiler_log;
    }
...

Where I’m stuck now is that the language is almost good enough to parse itself, but I’m finding lots of corner cases and papercut bugs that make it less-than-ideal. For example, I’m finding that the parser as-is is not quite good enough to handle function calls that are deeply nested in expressions.

I also ended up hacking in some dynamic-dispatch objects that help with not having classes, but that’s not a long-term thing that I want to support in lieu of a proper object/class system.

Eventually I’ll have to commit to rewriting the whole parser in Kalos, but as the title of the post suggests, I’m stuck in a potential energy well where the next steps are going to be difficult. The current parser is written in C and – while the code is pretty clean – it’s a lot of work to make changes to it. It’s going to take some time and effort to add support for things like classes, tear-off functions, etc, and being allocation-free doesn’t make any of this easy.

My mistake was being too ambitious and going right for C as the bootstrap, rather than something higher level. I should have started with Python as the bootstrap!

So I need to gather enough energy to choose and work on one of the following paths:

Commit to rewriting the parser in Kalos, maybe after adding support for hashtables/dicts to the language. The language spec is still small enough that I could port the current parser. Debugging is very difficult, and you need to ensure that you’re running the code while developing it to discover if you’ve accidentally stepped on one of the many landmines. Once I have the parser/compiler in a higher-level language like Kalos itself, these landmines will be much easier to fix!
Rewrite the parser in Python, knowing that I’ll have to rewrite it in Kalos later. This might not be terrible because the language is supposed to be python-like and there might be a mechanical translation route available. The thought of writing more code to throw away doesn’t fill me with a lot of joy, but it might just be what I have to do.
Scrap the hand-rolled parser and switch to something like Lemon. We’re already using the amazing re2c to write the lexer, so adding another tool isn’t a bad idea. Again, we’re putting in a bunch of effort knowing that this will be tossed away later, but maybe there’s a middle ground like just having Lemon build an AST, then have Kalos script generate the bytecode?

Read full post

16 December 2022

grack.com

Blog

Writing

Code

Painting myself into a corner while bootstrapping a language

Related Posts