pretext.lab

Methodology

This page documents how Experiment 01 was run. The test machine was a [MACHINE SPEC — e.g. MacBook Pro M4 Pro, N GB unified memory] running:

  • macOS Tahoe 26.3
  • Chrome 148.0.7778.97 (arm64 native)
  • Firefox 150.0.1
  • Safari (system, build current at the time of measurement)

Runs are performed with no foreground applications open other than the browser displaying the benchmark page.

Run-loop construction

Each measurement adds two layers on top of a naïve for loop.

The first layer is warm-up: 500 untimed iterations of the operation under test, run before any timing begins. These exist to push the JIT into steady state, populate caches, and trigger any library-internal lazy initialisation. The 500 figure isn’t arbitrary: for the simplest operations (Pretext arithmetic) about 50 iterations are enough, while branchy paths (DOM measurement) need 200–500 before results stabilise from run to run. 500 is the conservative default.
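A minimal sketch of the warm-up layer, assuming the operation under test is a function that returns a number. The names warmUp and measure here are hypothetical stand-ins, not the lab's actual harness code.

```javascript
// Hypothetical helper: call `fn` `count` times without timing anything.
function warmUp(fn, count = 500) {
  let sink = 0
  for (let i = 0; i < count; i++) sink += fn()
  return sink // returned so the warm-up calls have an observable result
}

// Stand-in for the operation under test (assumption: it returns a number).
let calls = 0
const measure = () => {
  calls++
  return calls % 7
}

warmUp(measure)
console.log(calls) // 500
```

Keeping the count as a parameter makes it cheap to try a smaller value (for example 50 for simple arithmetic) and compare the resulting numbers against the default.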

The second layer is batched timing. Browsers reduce performance.now() precision (a Spectre mitigation) to ~100 µs in Chrome and ~1 ms in Firefox and Safari, so sub-microsecond operations are unmeasurable per call. Instead, batches of 1000 calls are timed collectively, and the batch time is divided by 1000:

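const BATCH = 1000
let sink = 0 // accumulates return values so the calls are not dead code
const t0 = performance.now()
for (let j = 0; j < BATCH; j++) sink += measure() // measure() is the operation under test
const perCall = (performance.now() - t0) / BATCH // mean milliseconds per call for this batch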

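The sink variable exists to stop the JIT from eliminating the calls as dead code when their return value is unused. This matters in V8 and equally in the engines behind Firefox and Safari. After the loop, sink is compared against a sentinel value it should never hold (0xDEADBEEF), so the optimiser has to treat the accumulated result as observable and keep the calls.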
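The sink check described above might look like the following sketch. The function name timeBatch and the choice to log when the sentinel is hit are assumptions for illustration; the source only states that sink is compared against 0xDEADBEEF after the loop.

```javascript
const SENTINEL = 0xDEADBEEF

// Hypothetical helper: time one batch of calls and return milliseconds per call.
function timeBatch(fn, batchSize = 1000) {
  let sink = 0
  const t0 = performance.now()
  for (let j = 0; j < batchSize; j++) sink += fn()
  const elapsed = performance.now() - t0
  // Reading sink here makes the accumulated result observable,
  // so the calls above cannot be dropped as unused.
  if (sink === SENTINEL) console.log('unexpected sink value', sink)
  return elapsed / batchSize
}

// Stand-in operation (assumption: the real one returns a number).
const perCall = timeBatch(() => Math.sqrt(2))
console.log(typeof perCall, perCall >= 0)
```

The comparison sits outside the timed region, so its cost does not leak into the per-call figure.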

Default config: warmup: 500, iterations: 1000, batchSize: 1000. Per-experiment overrides are documented at the top of each experiment page.
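As a rough illustration of how the defaults and per-experiment overrides could fit together, the sketch below merges an override object over the defaults and works out the resulting call budget. It assumes that iterations means the number of timed batches, each producing one per-call sample; the source does not spell this out. The names resolveConfig and callBudget are hypothetical.

```javascript
const DEFAULT_CONFIG = Object.freeze({ warmup: 500, iterations: 1000, batchSize: 1000 })

// Hypothetical helper: per-experiment overrides win over the defaults.
function resolveConfig(overrides = {}) {
  return { ...DEFAULT_CONFIG, ...overrides }
}

// Assumption: each iteration is one timed batch of `batchSize` calls.
function callBudget({ warmup, iterations, batchSize }) {
  return {
    samples: iterations, // one per-call figure per batch
    totalCalls: warmup + iterations * batchSize,
  }
}

console.log(callBudget(resolveConfig()))
// { samples: 1000, totalCalls: 1000500 }
console.log(callBudget(resolveConfig({ warmup: 50, iterations: 200 })))
// { samples: 200, totalCalls: 200050 }
```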

What gets reported

Each measurement reports three percentiles: p50 (median), p95, p99. Means are not reported — they hide tail behaviour, which for some operations is where the interesting story lives.
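The sketch below shows one way to derive the three percentiles from a list of per-call samples, and why a mean can hide the tail. It uses the nearest-rank method, which is an assumption: the source does not say which percentile definition the lab uses. The names percentile and summarise are hypothetical.

```javascript
// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted, p) {
  if (sorted.length === 0) throw new RangeError('no samples')
  const rank = Math.ceil((p * sorted.length) / 100)
  return sorted[Math.max(rank, 1) - 1]
}

function summarise(samples) {
  const sorted = [...samples].sort((a, b) => a - b) // numeric sort, input left untouched
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  }
}

// Synthetic data: 98 samples at 1 and two slow outliers at 50.
const samples = Array.from({ length: 100 }, (_, i) => (i < 98 ? 1 : 50))
const mean = samples.reduce((a, b) => a + b, 0) / samples.length

console.log(summarise(samples)) // { p50: 1, p95: 1, p99: 50 }
console.log(mean) // 1.98
```

In this synthetic case the mean of 1.98 matches neither the typical call nor the slow one, while the percentiles show both.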

Cross-browser benchmarks always report three columns — Chrome (Blink), Firefox (Gecko), Safari (WebKit) — never an average. Browser engines have different cost models — averaging across them produces numbers that describe no real browser.

Build conditions

Every published number comes from a production build (pnpm build && pnpm preview). Dev-mode numbers are useful only for confirming the benchmark works at all; framework dev middleware adds non-uniform overhead that can double measurements on small inputs while disappearing on large ones (see Experiment 01 for the asymmetric case).

DevTools are kept fully closed during measurement. DevTools-open p99s can be 50–70× worse than DevTools-closed p99s for sub-microsecond operations (also documented in Experiment 01). When this matters, it’s called out per measurement.

Incognito mode is optional: on the test machine, results differed from a regular profile by no more than ±2%. With heavier extension loads this could differ; if a measurement looks suspicious, an incognito re-run is the first sanity check.

Out of scope (for now)

This lab measures single-call costs of layout-related operations across browsers. It does not currently measure:

  • Bundle-size impact of the libraries under test
  • Memory pressure under sustained load
  • Real-world rendering performance with frame-budget constraints
  • Server-side rendering and hydration paths
  • Mobile devices (any kind)
  • Accessibility-tree integrity

These boundaries are explicit so readers can decide whether the numbers here are load-bearing for their use case. Some of them may come into scope for later experiments — when they do, this page will note it.