Compiling
An Empty Program
Let's start with something simple: an empty program. This demonstrates the bare-bones layout of the generated WebAssembly.
Every snax program is compiled into a WebAssembly module.
A WebAssembly module consists of a series of declarations for the things your program might need to run.
Let's look at each line in the output.
This declares a global variable called $g0:#SP. The SP stands for "Stack Pointer". We'll learn more about how this stack pointer variable is used when we get to functions.
This line declares a linear memory buffer with id $0, with an initial size of 100 pages and a maximum size of 100 pages. One page of memory is 64 KiB (65,536 bytes), so by default snax programs have 6,553,600 bytes (6.25 MiB) of memory to work with.
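The page arithmetic is easy to check by hand. Here is a small Python sketch of it (Python is only used as a calculator here; the helper name is made up for illustration and is not part of snax):

```python
# Size of one WebAssembly page, fixed by the WebAssembly specification.
WASM_PAGE_SIZE = 64 * 1024  # 65,536 bytes


def linear_memory_bytes(pages: int) -> int:
    """Hypothetical helper: size in bytes of a linear memory with `pages` pages."""
    return pages * WASM_PAGE_SIZE


total = linear_memory_bytes(100)
print(total)                  # 6553600
print(total / (1024 * 1024))  # 6.25
```

That 6553600 is the same number the stack pointer is initialized to later on.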
The next two lines make the stack pointer and the memory accessible to the host environment in which the WebAssembly is being executed. For example, when the compiled WebAssembly runs in a JavaScript environment, the JavaScript code can access the stack pointer and the raw linear memory used by snax. This is useful for inspecting the state of a snax program at runtime.
A Simple Program
Ok, now let's look at a program that actually does something:
Things get a lot more complicated!
The _start function
WebAssembly itself is unopinionated about how programs are executed. Any function in a module can be exported to the host environment, allowing the host environment to call it whenever it wants. A library compiled to WebAssembly will export a set of functions that can be called as needed. While snax can be used to make libraries, it can also be used like a script that just runs.
For the script scenario, there is a convention to export a function named _start which the host environment will call. This "convention" is actually part of the WASI (WebAssembly System Interface) standard. See here for documentation about _start.
This _start function initializes the stack pointer to 6553600, which is the end of our linear memory space. When functions are called, the stack grows downward from the end, toward lower addresses.
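To get a feel for a stack that starts at the end of memory and grows downward, here is a rough Python model. It is not what snax emits: the bytearray stands in for linear memory, and the reserve helper and the frame sizes are invented for illustration.

```python
PAGE_SIZE = 64 * 1024
memory = bytearray(100 * PAGE_SIZE)  # stand-in for linear memory $0

# What the text says _start does: point the stack pointer at the end of memory.
# Valid addresses run from 0 to len(memory) - 1, so this is one past the last byte.
stack_pointer = len(memory)  # 6553600


def reserve(size: int) -> int:
    """Hypothetical helper: reserve `size` bytes by moving the stack pointer down."""
    global stack_pointer
    stack_pointer -= size
    return stack_pointer


first = reserve(16)   # an outer function's stack space (size is made up)
second = reserve(32)  # a nested call's space lands at a lower address

print(first, second)  # 6553584 6553552
```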
The main function
Functions in snax get compiled directly to functions in WebAssembly, with name mangling to handle namespaces, which are not part of WebAssembly. The naming convention is <${snaxNamespace}${snaxFuncName}>f${funcOffset}. Since these examples are all being compiled in your web browser, where there is no concept of file paths (which would normally be part of the namespace), the namespace is <root>::. Since this is the 0th function that's been declared, the funcOffset is 0, resulting in a function name $<<root>::main>f0.
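The naming convention is just string interpolation. A minimal Python sketch of it, assuming the three parts are available as plain values (the mangle function is a made-up name, not the compiler's own code):

```python
def mangle(namespace: str, func_name: str, func_offset: int) -> str:
    """Hypothetical helper following <${snaxNamespace}${snaxFuncName}>f${funcOffset}."""
    return f"<{namespace}{func_name}>f{func_offset}"


name = mangle("<root>::", "main", 0)

# In the WebAssembly text format, identifiers are written with a leading "$".
print("$" + name)  # $<<root>::main>f0
```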
At the beginning of every snax function, you'll see this code:
Here we are creating a "local" variable in web assembly where we store the current value of the stack pointer. This will come in handy when we want to access snax variables that are stored on the stack.
Local Variable Allocation
Snax supports defining local variables in your function. Local variables come in two flavors: reg and let variables. reg variables are limited to data types that can fit into 64 bits (numbers, booleans), while let variables can store larger values (arrays and structs, for example).
reg Variables
Let's look at how reg variables get translated into WebAssembly. In the code below, we have 5 reg variables, two of which are inside blocks.
You'll notice that the function in WebAssembly gained 4 local declarations:
TODO: insert link to webassembly documentation regarding local declarations
These correspond to the reg declarations. Because blocks in snax create new scopes, it's impossible to access d and e outside of their respective blocks. The snax compiler keeps track of the lifetime of these reg variables and will map them onto the same (local) in WebAssembly if their types match and their lifetimes don't overlap. In this case, the lifetimes of d and e do not overlap, and they are the same type, so local $4 is reused for both. That is why five reg declarations need only four locals.
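One way to picture this reuse is a tiny allocator that frees a block's locals when the block ends. The Python below is only a sketch of the idea, not the snax compiler's algorithm. The class, the i32 type for every variable, and the assumption that numbering starts at $1 (with $0 holding the saved stack pointer) are all illustrative guesses.

```python
class LocalAllocator:
    """Hypothetical allocator: reuses a freed local when the type matches."""

    def __init__(self, first_index: int):
        self.next_index = first_index
        self.free = {}      # type name -> list of free local indices
        self.scopes = [[]]  # each scope lists the (index, type) pairs declared in it

    def enter_block(self) -> None:
        self.scopes.append([])

    def exit_block(self) -> None:
        # Lifetimes of the block's variables end here, so their locals become reusable.
        for index, type_name in self.scopes.pop():
            self.free.setdefault(type_name, []).append(index)

    def declare(self, type_name: str) -> int:
        free_list = self.free.get(type_name)
        if free_list:
            index = free_list.pop()
        else:
            index = self.next_index
            self.next_index += 1
        self.scopes[-1].append((index, type_name))
        return index


alloc = LocalAllocator(first_index=1)  # assumption: $0 is the saved stack pointer
a = alloc.declare("i32")
b = alloc.declare("i32")
c = alloc.declare("i32")
alloc.enter_block()
d = alloc.declare("i32")
alloc.exit_block()
alloc.enter_block()
e = alloc.declare("i32")
alloc.exit_block()

print(a, b, c, d, e)  # 1 2 3 4 4
```

If e had a different type than d, this sketch would hand it a fresh local instead, matching the "types match" condition described above.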
You'll also see a bunch of calls to local.set:
Every reg declaration initializes its corresponding local to 0. This prevents possible garbage results when the same local is used for multiple different reg declarations. So while d and e both use local $4, we can rest assured that e won't accidentally take on the last value of d.
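To see why the reset matters, here is a toy Python model where a list plays the role of the function's locals. The value 7 and the variable names are made up for illustration.

```python
locals_ = [0] * 5  # stand-in for locals $0..$4
D = E = 4          # d and e share local $4

# First block: d is declared (reset to 0) and then assigned.
locals_[D] = 0
locals_[D] = 7

# Second block: this is what e would read if its declaration did not reset the local.
stale = locals_[E]

# What the text describes: the declaration of e sets the local back to 0.
locals_[E] = 0
fresh = locals_[E]

print(stale, fresh)  # 7 0
```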
let Variables
let variables allow storing values that can't fit in a WebAssembly local, like arrays and structs. This is achieved by storing the values of let variables in linear memory. Using linear memory also means that all let variables have an address in memory, and can therefore be passed around with pointers.
Let's look at a simple example where we declare a couple of arrays.
There are two important bits that got added. The first is some code to update the stack pointer:
Remember that the stack grows from the end of linear memory. Before we do anything else in our function, we have to allocate all the space on the stack we'll need for the function's let variables and other values we might need to store in linear memory. We do this by decrementing the stack pointer. We decrement by 52 bytes because the a variable takes up 12 bytes and the b variable takes up 40 bytes, which totals 52 bytes.
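The 52-byte figure can be worked out mechanically. The Python sketch below sums the sizes and assigns each variable an offset from the new stack pointer. The layout_frame helper, the declaration-order layout, and the lack of any alignment padding are assumptions for illustration; the real compiler may lay things out differently.

```python
def layout_frame(sizes: dict) -> tuple:
    """Hypothetical helper: return (frame_size, offsets from the new stack pointer)."""
    offsets = {}
    offset = 0
    for name, size in sizes.items():
        offsets[name] = offset  # assumption: variables are packed in declaration order
        offset += size
    return offset, offsets


frame_size, offsets = layout_frame({"a": 12, "b": 40})

stack_pointer = 6553600      # value set by _start
stack_pointer -= frame_size  # reserve the whole frame up front

print(frame_size)     # 52
print(stack_pointer)  # 6553548
print(offsets)        # {'a': 0, 'b': 12}
```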
The second interesting bit of code is:
Just like when we initialized all of our reg variables with (local.set $1 (i64.const 0)), we also initialize all of our stack variables. In this case we use the memory.fill instruction to bulk write 0s to the range of bytes that we've set aside for each of our let variables.
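Here is a small Python model of what memory.fill does to a range of bytes. The memory is shrunk to 128 bytes and pre-filled with 0xAA so the effect is visible; the addresses for a and b assume they sit back to back in a 52-byte frame, which is a guess at the layout rather than something taken from the compiler.

```python
def memory_fill(memory: bytearray, dest: int, value: int, length: int) -> None:
    """Model of memory.fill: set `length` bytes starting at `dest` to `value`."""
    if dest < 0 or dest + length > len(memory):
        # Real WebAssembly traps on an out-of-bounds fill.
        raise IndexError("out of bounds memory access")
    memory[dest:dest + length] = bytes([value & 0xFF]) * length


memory = bytearray(b"\xAA" * 128)  # tiny stand-in memory full of "garbage"
stack_pointer = len(memory) - 52   # 76, after reserving the frame

memory_fill(memory, stack_pointer, 0, 12)       # zero the 12 bytes assumed for a
memory_fill(memory, stack_pointer + 12, 0, 40)  # zero the 40 bytes assumed for b

print(memory[stack_pointer:] == bytes(52))  # True
```

Everything below the frame is left untouched, which is the point of filling only the ranges that belong to the let variables.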
Function Calls And Function Arguments
WebAssembly provides the abstraction for passing arguments to functions, so long as those arguments fit into the limited numeric data types that WebAssembly supports (i32, i64, f32, f64). Let's look at a simple program with an add function that takes two arguments:
First let's look at the add function. Here is the WebAssembly:
Note that our function declaration has been augmented with (param $0 i32) (param $1 i32) (result i32). This specifies the two parameter types and the return type of the function. The parameters have ids $0 and $1, and it's important to note that these are in the same namespace as local variables. That's why the local we use to save the stack pointer has id $2 in this function.
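The shared numbering is easy to model: parameters take the first indices and declared locals continue from there. In this Python sketch the names x, y and saved_sp are placeholders, since the text only gives the numeric ids.

```python
def local_indices(params: list, extra_locals: list) -> dict:
    """Hypothetical helper: number params first, then locals, in one index space."""
    names = list(params) + list(extra_locals)
    return {name: index for index, name in enumerate(names)}


# add takes two parameters, so the saved stack pointer lands at index 2.
print(local_indices(["x", "y"], ["saved_sp"]))  # {'x': 0, 'y': 1, 'saved_sp': 2}

# A function with no parameters would get index 0 for the same local.
print(local_indices([], ["saved_sp"]))          # {'saved_sp': 0}
```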
Next let's look at the function call:
Every function call is encapsulated inside a (block) instruction because there are several steps, read from the inside out:
- call the add() function with our two parameters 1 and 2:
- store the function's return value in a temporary variable
- reset the stack pointer (remember that the stack pointer gets modified at the beginning of every function definition). This effectively reclaims stack space that was used by the add() function:
- do something with the returned result
It's important that the steps happen in this sequence, particularly resetting the stack pointer before the result is used, because the return value may be passed immediately to another function, which will modify the stack pointer again.
Passing Large Values as Function Arguments
So what happens if we want to pass a larger value to a function, such as a struct or an array? In this scenario, we'll still be using the simple integer arguments that web assembly provides, but we'll be passing a pointer to the larger value instead of the value itself.
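As a rough model of the idea in Python (not snax output), the caller below writes a two-field struct into a byte buffer and hands the callee only its address. The 64-byte memory, the struct of two i32 fields, and the function names are invented for illustration. The little-endian byte order matches WebAssembly's memory model.

```python
import struct

memory = bytearray(64)       # small stand-in for linear memory
stack_pointer = len(memory)

# Caller: reserve 8 bytes on the stack for a hypothetical struct of two i32 fields.
stack_pointer -= 8
point_ptr = stack_pointer
struct.pack_into("<ii", memory, point_ptr, 3, 4)


def sum_fields(ptr: int) -> int:
    """Callee: receives a plain integer address and loads the fields from memory."""
    x, y = struct.unpack_from("<ii", memory, ptr)
    return x + y


print(point_ptr)              # 56
print(sum_fields(point_ptr))  # 7
```

The argument that crosses the call boundary is a single integer, which fits the i32 parameter type no matter how large the value it points to is.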
Let's look at an example: