Loft has three execution modes: the interpreter, native code (compiled to Rust via --native), and WebAssembly (--native-wasm). The table below shows wall-clock times for ten micro-benchmarks run on the same machine, with a hand-written Rust version and CPython 3 as reference points.
| Benchmark | Python (ms) | loft interp (ms) | loft native (ms) | loft wasm (ms) | Rust (ms) |
|---|---|---|---|---|---|
| 01 fibonacci (recursive, n=38) | | | | | |
| 02 sum loop (10 M integers) | | | | | |
| 03 prime sieve (trial division, n=100 000) | | | | | |
| 04 Collatz lengths (1 .. 1 M) | | | | | |
| 05 Mandelbrot (200×200, 256 iter) | | | | | |
| 06 Newton sqrt (1 M calls) | | | | | |
| 07 string build (500 K appends) | | | | | |
| 08 word frequency (hash map) | | | | | |
| 09 dot product (5 M floats) | | | | | |
| 10 insertion sort (3 000 integers) | | | | | |
Measured on a single core, Linux x86-64. Times are wall-clock, taken from a single warm run. Run bench/run_bench.sh from the project root to reproduce.
Key takeaways
The three loft execution modes have very different performance profiles depending on what the program does.
loft interpreter vs Python
The loft interpreter runs your program directly, without a separate compilation step. For tight numeric loops it is slower than CPython, typically 1.4–10×: CPython's core is highly optimised C, while loft's interpreter, written in Rust, must also determine the type of each value before acting on it. The gap is largest for integer-heavy workloads (sum loop: 9×) and narrows for float arithmetic (Mandelbrot: 2.5×). Notably, string building is faster in the loft interpreter (61 ms vs Python's 70 ms) because loft's format-string concatenation creates fewer temporary objects.
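To make that dispatch cost concrete, here is a minimal Rust sketch of the tagged-value arithmetic a dynamically typed interpreter performs on every operation. The Value enum and add_values function are illustrative only, not loft's actual internals.

```rust
// Minimal illustration of per-operation type dispatch in an interpreter.
// The names (Value, add_values) are hypothetical, not taken from loft's source.
#[derive(Clone, Debug)]
enum Value {
    Int(i64),
    Float(f64),
    Text(String),
}

// Every arithmetic opcode must first inspect the runtime tag of both operands
// before it can do the actual addition; the branch and the enum unpacking are
// exactly the overhead a compiled path avoids.
fn add_values(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
        (Value::Float(x), Value::Float(y)) => Ok(Value::Float(x + y)),
        (Value::Text(x), Value::Text(y)) => Ok(Value::Text(format!("{x}{y}"))),
        _ => Err("type error: '+' applied to incompatible values".into()),
    }
}

fn main() {
    // A tight loop pays the dispatch cost on every iteration.
    let mut acc = Value::Int(0);
    for i in 0..10 {
        acc = add_values(&acc, &Value::Int(i)).unwrap();
    }
    println!("{acc:?}");
}
```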
loft native vs Rust
The native pipeline compiles loft source to Rust via --native or --native-release, then invokes rustc -O. For pure floating-point workloads the generated Rust is essentially as fast as hand-written Rust (Newton sqrt: 159 ms vs 152 ms, Mandelbrot: 7 ms vs 6 ms). For integer arithmetic the gap is 1.8–2.5×, and for data-structure workloads (word count, matrix, sort) the gap is 7–16×. The bottleneck in those cases is the codegen_runtime layer: the generated code calls runtime helpers (hash lookup, text comparison, vector index) that carry more overhead than the idiomatic Rust equivalents.
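As a rough illustration of that gap, the sketch below contrasts a helper-mediated vector access (the shape the generated code takes today) with the direct slice iteration hand-written Rust would use. The rt_vec_index helper and its signature are hypothetical, not the real codegen_runtime API.

```rust
// Hypothetical illustration of the helper-call shape described above; the
// function rt_vec_index and its signature are assumptions, not loft's actual
// codegen_runtime API.

// Runtime-helper style: the generated code routes every element access
// through a function that re-checks bounds and unwraps an optional value.
fn rt_vec_index(v: &Vec<f64>, i: i64) -> Option<f64> {
    if i < 0 {
        return None;
    }
    v.get(i as usize).copied()
}

fn dot_via_helpers(a: &Vec<f64>, b: &Vec<f64>) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() as i64 {
        // Two helper calls and two Option unwraps per iteration.
        sum += rt_vec_index(a, i).unwrap_or(0.0) * rt_vec_index(b, i).unwrap_or(0.0);
    }
    sum
}

// Idiomatic Rust: the iterator form lets the compiler elide bounds checks and
// vectorise the loop, which is where the remaining gap comes from.
fn dot_idiomatic(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = vec![1.0, 2.0, 3.0];
    let b = vec![4.0, 5.0, 6.0];
    assert_eq!(dot_via_helpers(&a, &b), dot_idiomatic(&a, &b));
}
```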
loft wasm vs loft native
WebAssembly adds a modest 1.5–2× overhead over native for most workloads. The exception is floating-point throughput: Newton sqrt runs at identical speed in wasm and native (both 159 ms), because the bottleneck is the FPU, not the wasm runtime. String building is slower in wasm (68 ms) than native (33 ms) due to wasm's memory model for dynamic strings. Wasm is a good target when native compilation is unavailable; it runs anywhere wasmtime or a browser is available.
Current bottlenecks
long (i64) arithmetic overhead
The Collatz benchmark uses long for range safety. Loft's long arithmetic goes through a separate opcode path with additional null-sentinel checks, making it roughly 2× slower than int arithmetic in the interpreter. The native path narrows the gap (334 ms, vs 149 ms for hand-written Rust).
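A sketch of what that separate path costs: an i64 add that must first test both operands against a null sentinel, next to the unchecked int add. The LONG_NULL constant and function names are illustrative, not loft's actual opcode implementation.

```rust
// Hypothetical sketch of the sentinel checking described above; LONG_NULL and
// the helper names are illustrative, not loft's real long opcode path.
const LONG_NULL: i64 = i64::MIN; // assumed reserved value standing in for "no value"

// The plain integer path: one wrapping add, no branches.
fn int_add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

// The long path pays two extra comparisons per operation, which is roughly
// where a 2x interpreter slowdown can come from.
fn long_add(a: i64, b: i64) -> Option<i64> {
    if a == LONG_NULL || b == LONG_NULL {
        return None; // propagate the null sentinel instead of adding
    }
    Some(a.wrapping_add(b))
}

fn main() {
    assert_eq!(int_add(2, 3), 5);
    assert_eq!(long_add(2, 3), Some(5));
    assert_eq!(long_add(LONG_NULL, 3), None);
}
```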
codegen_runtime helper overhead for data structures
Hash lookup (word_count: 32 ms native vs 2 ms Rust = 16×), vector indexing (matrix_mul: 36 ms vs 3 ms = 12×), and sort (29 ms vs 4 ms = 7×) all go through src/codegen_runtime.rs helpers. These functions perform bounds checks, null-sentinel tests, and store-pointer indirections that hand-written Rust avoids. Eliminating these indirections for simple in-memory collections is a planned optimisation.
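For the word-frequency case specifically, the contrast looks roughly like this. The rt_hash_get / rt_hash_set helpers are stand-ins invented for the example; the point is that the helper style hashes each key twice per update, while the entry API hand-written Rust uses hashes it once and updates in place.

```rust
use std::collections::HashMap;

// Hypothetical contrast for the word-frequency case; rt_hash_get / rt_hash_set
// are illustrative stand-ins, not loft's real codegen_runtime helpers.

// Helper style: each counter update is a lookup, an Option test, and a second
// hash of the same key on the way back in.
fn rt_hash_get(map: &HashMap<String, i64>, key: &str) -> Option<i64> {
    map.get(key).copied()
}

fn rt_hash_set(map: &mut HashMap<String, i64>, key: &str, value: i64) {
    map.insert(key.to_string(), value);
}

fn count_via_helpers(words: &[&str]) -> HashMap<String, i64> {
    let mut counts = HashMap::new();
    for w in words {
        let current = rt_hash_get(&counts, w).unwrap_or(0);
        rt_hash_set(&mut counts, w, current + 1);
    }
    counts
}

// Idiomatic Rust: the entry API hashes the key once and updates in place.
fn count_idiomatic(words: &[&str]) -> HashMap<String, i64> {
    let mut counts = HashMap::new();
    for w in words {
        *counts.entry(w.to_string()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let words = ["the", "cat", "sat", "on", "the", "mat"];
    assert_eq!(count_via_helpers(&words), count_idiomatic(&words));
}
```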
wasm string throughput
String building is roughly 2× slower in wasm (68 ms) than native (33 ms). This is a structural limitation of the wasm target; the loft wasm build prioritises compatibility over raw string throughput.
What is planned
- Interpreter speedup — superinstruction merging for common opcode pairs (e.g. load + add + store in a loop); reduces dispatch count by 2–3× for arithmetic-heavy loops (see the sketch after this list).
- Native data-structure inlining — for vectors and hashes that do not cross function boundaries, emit direct Rust Vec/HashMap operations instead of codegen_runtime helper calls.
- Native function call reduction — inline small pure functions at the IR level before native codegen.
- wasm string optimisation — use a wasm-native string representation to close the 2× gap on string-building workloads.
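As a sketch of the superinstruction idea referenced in the first bullet, here is a toy dispatch loop in which a load + add + store sequence is fused into a single opcode. The opcode names and loop structure are hypothetical, not loft's real bytecode.

```rust
// Illustrative sketch of superinstruction merging; the opcodes and dispatch
// loop are hypothetical, not loft's actual interpreter.
enum Op {
    LoadLocal(usize),
    AddLocal(usize),
    StoreLocal(usize),
    // Fused form: one dispatch instead of three for the common
    // "load a, add b, store into dst" pattern inside a loop body.
    LoadAddStore { a: usize, b: usize, dst: usize },
    Halt,
}

fn run(code: &[Op], locals: &mut [i64]) {
    let mut acc = 0i64;
    let mut pc = 0;
    loop {
        match &code[pc] {
            Op::LoadLocal(i) => acc = locals[*i],
            Op::AddLocal(i) => acc += locals[*i],
            Op::StoreLocal(i) => locals[*i] = acc,
            Op::LoadAddStore { a, b, dst } => locals[*dst] = locals[*a] + locals[*b],
            Op::Halt => break,
        }
        pc += 1;
    }
}

fn main() {
    let mut locals = [2, 3, 0, 0];
    // Unfused: three dispatches per addition. Fused: one.
    run(&[Op::LoadLocal(0), Op::AddLocal(1), Op::StoreLocal(2), Op::Halt], &mut locals);
    run(&[Op::LoadAddStore { a: 0, b: 1, dst: 3 }, Op::Halt], &mut locals);
    assert_eq!(locals[2], locals[3]);
}
```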