Performance

Loft programs can run in three execution modes — the interpreter, native code (compiled to Rust via --native), and WebAssembly (--native-wasm). The table below shows wall-clock times for ten micro-benchmarks run on the same machine, with CPython 3 and a hand-written Rust reference as comparison baselines.

| Benchmark | Python 3 (CPython) | loft interpreter | loft native (rustc -O) | loft wasm (wasm32-wasip2) | Rust reference (rustc -O) |
|---|---:|---:|---:|---:|---:|
| 01 fibonacci (recursive, n=38) | 3395 ms | 4819 ms | 169 ms | 257 ms | 92 ms |
| 02 sum loop (10 M integers) | 66 ms | 584 ms | 15 ms | 21 ms | 8 ms |
| 03 prime sieve (trial division, n=100 000) | 49 ms | 141 ms | 4 ms | 6 ms | 4 ms |
| 04 Collatz lengths (1..1 M) | 7393 ms | 14379 ms | 334 ms | 599 ms | 149 ms |
| 05 Mandelbrot (200×200, 256 iter) | 135 ms | 344 ms | 7 ms | 10 ms | 6 ms |
| 06 Newton sqrt (1 M calls) | 1481 ms | 3437 ms | 159 ms | 159 ms | 152 ms |
| 07 string build (500 K appends) | 70 ms | 61 ms | 33 ms | 68 ms | 23 ms |
| 08 word frequency (hash map) | 46 ms | 169 ms | 32 ms | 60 ms | 2 ms |
| 09 dot product (5 M floats) | 158 ms | 428 ms | 36 ms | 86 ms | 3 ms |
| 10 insertion sort (3 000 integers) | 131 ms | 291 ms | 29 ms | 56 ms | 4 ms |

Measured on a single core, Linux x86-64. Times are wall clock from a single warm run (no averaging). Run bench/run_bench.sh from the project root to reproduce.

Key takeaways

The three loft execution modes have very different performance profiles depending on what the program does.

loft interpreter vs Python

The loft interpreter runs your program directly, with no separate compilation step. For tight numeric loops it is slower than CPython — typically 1.4–9× — because CPython's core is highly optimised C, while loft's interpreter is Rust that must also check the type of each value before acting on it. The gap is largest for integer-heavy workloads (sum loop: 9×) and narrows for float arithmetic (Mandelbrot: 2.5×). Notably, string building is faster in the loft interpreter (61 ms vs Python's 70 ms) because loft's format-string concatenation creates fewer temporary objects.
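
As a rough Rust analogy (this is not loft's implementation, just the shape of the difference), building a string through a fresh temporary per append costs far more than growing one buffer in place:

```rust
// Sketch only: one temporary object per append vs in-place growth.
fn build_with_temporaries(parts: &[&str]) -> String {
    let mut s = String::new();
    for p in parts {
        // format! allocates a brand-new String every iteration and copies
        // all of `s` into it: one temporary per append.
        s = format!("{s}{p}");
    }
    s
}

fn build_in_place(parts: &[&str]) -> String {
    let mut s = String::new();
    for p in parts {
        s.push_str(p); // grows the existing buffer; no temporary
    }
    s
}
```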

loft native vs Rust

The native pipeline compiles loft source to Rust via --native or --native-release, then invokes rustc -O. For pure floating-point workloads the generated Rust is essentially as fast as hand-written Rust (Newton sqrt: 159 ms vs 152 ms; Mandelbrot: 7 ms vs 6 ms). For integer arithmetic the gap is 1.8–2.5×, and for data-structure workloads (word frequency, dot product, insertion sort) it widens to 7–16×. The bottleneck in those cases is the codegen_runtime layer: the generated code calls runtime helpers (hash lookup, text comparison, vector indexing) that carry more overhead than the idiomatic Rust equivalents.
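
The shape of that overhead, sketched in Rust (the helper name and store layout here are invented for illustration; the real helpers live in src/codegen_runtime.rs):

```rust
use std::collections::HashMap;

// Hypothetical store-based helper in the style the generated code calls.
struct Store {
    maps: Vec<HashMap<String, i64>>,
}

// Each lookup pays a bounds-checked store indirection plus a null-sentinel
// encoding on top of the actual hash lookup.
fn rt_map_get(store: &Store, handle: usize, key: &str) -> i64 {
    let map = &store.maps[handle]; // indirection through the store
    match map.get(key) {
        Some(v) => *v,
        None => i64::MIN, // null sentinel instead of Option
    }
}

// What the hand-written Rust benchmark does instead: one direct call.
fn direct_get(map: &HashMap<String, i64>, key: &str) -> Option<i64> {
    map.get(key).copied()
}
```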

loft wasm vs loft native

WebAssembly adds a modest 1.5–2× overhead over native for most workloads. The exception is floating-point throughput: Newton sqrt runs at identical speed in wasm and native (both 159 ms) because the bottleneck is the FPU, not the wasm runtime. String building is slower in wasm (68 ms) than in native (33 ms) due to wasm's memory model for dynamic strings. Wasm is a good target when native compilation is unavailable — it runs anywhere wasmtime or a browser is available.

Current bottlenecks

Interpreter — bytecode dispatch overhead. The interpreter executes one instruction at a time and checks the type of each value before every operation; there is no just-in-time (JIT) compiler or other mechanism to speed up code that runs repeatedly. Each iteration of a tight loop pays this per-instruction cost, which is the dominant overhead for sum loop and Collatz.
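
A minimal sketch of that per-instruction cost (the opcode and value layout are invented for the example, not loft's actual bytecode):

```rust
// Invented value and opcode types, for illustration only.
enum Value { Int(i64), Float(f64) }
enum Op { Add }

fn step(op: &Op, a: Value, b: Value) -> Value {
    match op {                        // 1. dispatch on the opcode
        Op::Add => match (a, b) {     // 2. then check both operand types
            (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
            (Value::Float(x), Value::Float(y)) => Value::Float(x + y),
            _ => panic!("type error"),
        },
    }
}
// Compiled code reduces an i64 `x + y` to a single machine instruction;
// the interpreter runs all of `step` for every iteration of a hot loop.
```
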
Interpreter — long arithmetic always uses the i64 sentinel path. The Collatz benchmark uses long for range safety. Loft's long arithmetic goes through a separate opcode path with additional null-sentinel checks, making it roughly 2× slower than integer arithmetic in the interpreter. The native path narrows this gap considerably (334 ms vs 149 ms for hand-written Rust).
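
Sketched in Rust (the sentinel encoding here is an assumption for the example, not loft's actual representation):

```rust
const LONG_NULL: i64 = i64::MIN; // hypothetical null sentinel

// The long path: every operation tests both operands before the arithmetic.
fn long_add(a: i64, b: i64) -> i64 {
    if a == LONG_NULL || b == LONG_NULL {
        return LONG_NULL;
    }
    a.wrapping_add(b)
}

// The plain integer path: just the add.
fn int_add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}
```
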
Native — codegen_runtime helper overhead for data structures. Hash lookup (word frequency: 32 ms native vs 2 ms Rust, 16×), vector indexing (dot product: 36 ms vs 3 ms, 12×), and sorting (insertion sort: 29 ms vs 4 ms, 7×) all go through helpers in src/codegen_runtime.rs. These helpers perform bounds checks, null-sentinel tests, and store-pointer indirections that hand-written Rust avoids. Eliminating these indirections for simple in-memory collections is a planned optimisation.

Native — function call overhead for recursive code. The recursive Fibonacci benchmark is 1.8× slower in native loft (169 ms) than in hand-written Rust (92 ms) because the generated code passes stores and additional runtime context through every call frame. Reducing per-call overhead for pure functions that do not touch the heap is a planned optimisation.
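
A hypothetical before/after of the call shape (names and context layout invented for illustration):

```rust
// Roughly what the generated code does: a runtime context is threaded
// through every frame even though this function never touches the heap.
struct RuntimeCtx { /* stores, heap handles, ... */ }

fn fib_generated(ctx: &mut RuntimeCtx, n: i64) -> i64 {
    if n < 2 {
        return n;
    }
    // Every recursive call also passes and spills the context pointer.
    fib_generated(ctx, n - 1) + fib_generated(ctx, n - 2)
}

// What a human writes: only the argument crosses each call boundary.
fn fib(n: i64) -> i64 {
    if n < 2 { n } else { fib(n - 1) + fib(n - 2) }
}
```
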
wasm — dynamic string memory model. String building in wasm (68 ms) is roughly 2× slower than in native (33 ms): dynamic strings in the wasm build are heap-allocated inside wasm's linear memory with an extra layer of indirection compared to a native Rust String. This is a structural limitation of the wasm target; the loft wasm build prioritises compatibility over raw string throughput.

What is planned