I meant to write this up a bit ago, but I wanted to write something short (since it’s still in progress) about an older project I’ve gotten back to, called Muftec.
At it’s core, it’s a Forth-ish stack based programming language, shamelessly copied from MUF (Multi-User Forth), specifically from Fuzzball 6. However, this specific implementation intentionally does not have any language constructs from the MUCK itself, like dbrefs.
I decided to undertake this project since I had never taken a compiler course, and I figured Forth is a rather easy place to start, given its stack-based nature. I was already familiar with writing MUF (though it’s been a long while) so I thought I would throw something together.
Muftec is written in C#, and relies on reflection during the compilation stage to register opcodes (external functions). Right now, the application is capable of taking a file (or just a string) and compiling it into an “execution stack”. It is parsed to tokens at compile time, and then at runtime individual values (like strings or integers) are placed on the “runtime stack”, to be used by opcodes, or if it detects an opcode, the associated C# function is called out of a registered class library. The system is designed to be able to be expanded so an implementing application could take user Forth and the application’s opcodes would be callable.
At the time of this post, a majority of the important features have been implemented. Most of the relevent MUF opcodes are there, functions work, and if statements work. Still missing are library includes (and other preprocessor abilities), arrays and dictionaries, and their related opcodes. Global variables are implemented but not tested, and local variables and variable scope have not yet been implemented.
The compiler is built as a state machine (inheriting from a common interface). There is a main “compiling” state, which knows how to read the outer Forth syntax. When it detects a Forth function, the state machine begins to read the inner function code, which processes elements such as variables, if statements, opcodes, integers/strings/etc., and the like. An if statement invokes another instance of the inner function code reader, which then creates a special execution stack element so the if statement can be evaluated at runtime by C# code. It also registers Forth functions (as a dictionary) and declared variables, so the runtime can keep track of them to be called. Functions and the execution stack are built as Queues, which are popped as they are executed. From what I can tell of traditional execution environments, they should probably lists with a program counter, but I don’t feel the need to implement them that way unless I need to be able to read past execution code.
Running the compiled code is rather simple (see the Run function in this file), it takes the execution queue (the compiled code), a runtime stack (usually empty at start) and the registered callable functions and variables. Then each element on the execution queue is evaluated, which either means adding a static value to the runtime stack, or doing something like calling an opcode. Opcodes are registered at the start of the environment, and found at that time by reflecting for a given OpCodeAttribute attached to a function. Opcodes are added by name to a dictionary, with a delegate assigned to call when hit in the execution queue.
Individual opcodes are rather simple. They receive an instance of the OpcodeData class, which contains persistant data that can be passed through between opcodes. Right now this only really only contains the Random seeds. This object however also contains the current runtime stack. Let’s take the simplest example as a demonstration on both the execution environment, and how the code handles it:
The Muftec code:
1 2 + print
During execution, this first adds 1 to the stack, then 2. Then the opcode + is looked up in the lookup dictionary, which calls the Add function currently contained in the Base Class Library provided by the project. The Add function pops the top item off the stack (2), then the second (1), then adds them together. It takes this new value (3) and puts it back on the stack. In our example, the print opcode is then called, which would read the 3 off the stack and print it to the console.
Functions are defined as:
: functionname
<content>
;
There must be at least one function in a script, as the last defined function is called automatically. Functions are given names, and because function registration happens before/during code parsing, a function must be defined before it can be used, much like in C. This can be avoided if we just check for a function name at runtime before assuming an unknown name is an opcode instead.
And that’s all there is to it. This writeup only covers the absolute basics behind the concepts of the language. Because it is intended to mirror MUF more directly I don’t really plan on writing more about the semantics of the language, but I may write more in the future about challenges I have or will run into working on more parts of the implementation.