Urjasvi Suthar

Webassembly: Introduction

In short, WebAssembly is a binary instruction format for a stack-based VM. To understand it better, we will talk a little bit more about what a Binary Instruction Format means and what a Stack-based VM means.

It was designed for improving performance of web applications and allowing them to do computationally expensive tasks that were previously not possible with JS. It is a portable format, which can work in environments outside of browsers such as desktop/mobile apps, client/server applications, IoTs, and even embedded devices.

Before WASM, asm.js existed that compiled statically-typed manual memory management languages such as C to optimized versions of JS. It used a source-to-source compiler that targeted a specific subset of JS. This subset consisted of static types and virtually no garbage collection, which improved performance by magnitudes. It also optimized size by employing techniques like dead-code elimination, and also removed unnecessary whitespace, newlines, etc. and shortened long identifiers to few-character identifiers.

What is Garbage Collection

Basically garbage means allocated memory that isn’t used anymore, and collection means de-allocating this not-used-anymore memory. In manual memory management languages, you allocate and deallocate memory by yourself. But in Garbage Collected languages, the language does it for you. How GC does it and when depends upon the implemented technique, you wouldn’t want GC to do its work at inconvenient times in inconvenient ways.

Binary Instruction Format

Just like humans use languages to understand and communicate with each other, computers need something similar. Human languages consist of atomic symbols such as alphabets, just as machine language consists of 0s and 1s. These symbols randomly strung together, by themselves, don’t mean anything. Just like kdjhsk doesn’t mean anything, random 0s and 1s 01010101 don’t mean anything. Language defines rules for these symbols and strings to give meaning, like hello means greeting in the English language. Zeros and Ones can be given meaning with a language designed particularly for a machine.

For the sake of understanding, we can design our own machine language. Let’s say, our language is based on bit strings of length 8-bit (called a byte), which is analogous to a word in the English language (although a word can be of arbitrary length). To understand a sentence, we say that a string of bytes starts with a verb and then, if any, we have a noun. For example, a byte string "10100000 00100101", 10100000 is a verb and 00100101 is a noun. Similarly, "10010011 00100100 00010000 10000001" is verb, noun, verb and noun respectively. Then we can go and add rules for what a verb and what a noun is. For example,

ByteMeaning
00______Walk
01______Run
10______Sprint
11______Stop

Then we can define the next 2-bits for vocal cords, and then the next-next 2-bits and so on and on. For nouns, it could mean human names somehow mapped to a byte. This will effectively give us a way to command any (actually up to 2^8 = 255 here) humans in our machine language.

There already exist machine languages for computers, these languages are sometimes called Instruction Set Architecture (ISA), such as x86_64, RISC-V, ARM64, etc. WebAssembly is another one of these languages, but for a made-up hardware.

Stack-based VM

Virtual Machine (VM) as the name implies, is a machine that is virtual and not physical hardware. VMs are software that emulate a specific machine, typically they emulate ISAs. These work by decoding the binary files (executables), then they try to make sense of the structure, and then execute appropriate similar instructions that are available natively.

Stack-based means that it uses a stack as storage for immediate temporary values instead of registers like ISAs. For example, you want to execute the following line of code:

printf(1 + 2);

The order of instructions and their execution look something like this:

InstructionOperationStack
1push 11
2push 21, 2
3add3
4call printfnull

Binary Instruction Format for Stack-based VM

So, WebAssembly is just a portable specification developed for a machine that uses a stack, whose main goal is to be used on the Web. But since it is just a format, it can be used anywhere as long as the VM is compliant and provides an interface for the environment (just like how ISAs interface with OS environments through function calls and other stuff).

Use Cases

CPU Rasterizer From Scratch: Part 2
WebAssembly: WASM vs JS