Performance

Benchmarked on an Intel Core i5-10400F with Go 1.24.

Benchmarks

Operation	Latency	Allocs
Lex (simple query)	304 ns/op	2
Lex (full query)	945 ns/op	2
Parse (simple query)	477 ns/op	4
Parse (full query)	1,470 ns/op	8

Design Decisions

Lexer — O(1) Keyword Lookup

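The lexer uses a stack-allocated buffer and an O(1) keyword table lookup. It never heap-allocates to identify keywords, so per-keyword cost stays constant and the allocation count does not grow with input length (2 allocs for both the simple and the full query in the benchmarks above). Total lexing time still scales with the length of the query.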
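The following is a minimal sketch of that idea, not the project's actual lexer. The TokenKind type, the keyword set, and the lookupKeyword helper are hypothetical names chosen for illustration. It upper-cases a candidate word into a fixed-size array and indexes a map with it, relying on the Go compiler's optimization that skips the string copy when a byte slice is converted directly inside a map index expression.

```go
package main

import "fmt"

// TokenKind and the keyword set below are hypothetical, for illustration only.
type TokenKind int

const (
	Ident TokenKind = iota
	KwQuery
	KwFrom
	KwLimit
)

// keywords maps upper-case keyword text to its token kind.
var keywords = map[string]TokenKind{
	"QUERY": KwQuery,
	"FROM":  KwFrom,
	"LIMIT": KwLimit,
}

// maxKeywordLen bounds the scratch buffer; longer words cannot be keywords.
const maxKeywordLen = 8

// lookupKeyword classifies word as a keyword or a plain identifier.
func lookupKeyword(word []byte) TokenKind {
	if len(word) > maxKeywordLen {
		return Ident
	}
	var buf [maxKeywordLen]byte // fixed-size array, so it can stay on the stack
	for i, c := range word {
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A' // ASCII upper-case
		}
		buf[i] = c
	}
	// The []byte-to-string conversion used directly as a map index
	// does not copy the bytes to the heap.
	if kind, ok := keywords[string(buf[:len(word)])]; ok {
		return kind
	}
	return Ident
}

func main() {
	fmt.Println(lookupKeyword([]byte("from")) == KwFrom) // true
	fmt.Println(lookupKeyword([]byte("docs")) == Ident)  // true
}
```

The map lookup is O(1) on average, and the length check rejects long identifiers before any work is done.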

Parser — Zero-Allocation Comparison

The Pratt parser uses byte-level asciiEqual / asciiEqualLower comparisons instead of strings.EqualFold or reflect-based logic. This keeps the hot path free of allocations and skips the Unicode case-folding work that ASCII-only input does not need.
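A byte-level, case-insensitive comparison of this kind can be sketched as below. The helper name is taken from the text above, but its signature and its contract (the second argument must already be lower-case ASCII) are assumptions made for this example and may differ from the real implementation.

```go
package main

import "fmt"

// asciiEqualLower reports whether s equals lower when ASCII letters in s
// are folded to lower case. The caller must pass lower as lower-case ASCII.
// Signature and contract are assumptions for this sketch.
func asciiEqualLower(s, lower string) bool {
	if len(s) != len(lower) {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A' // fold ASCII upper case to lower case
		}
		if c != lower[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(asciiEqualLower("LIMIT", "limit")) // true
	fmt.Println(asciiEqualLower("Limit", "from"))  // false
}
```

The loop indexes bytes directly and builds no intermediate string, so it performs no allocation.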

Filters — Type-Switch, No Reflect

Filter predicate evaluation uses explicit type-switch dispatch. The reflect package is never called in the filter conversion path, which keeps the per-query filter cost predictable.
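As an illustration of type-switch dispatch, the sketch below converts a dynamically typed filter value without importing reflect. The describeValue function and the set of supported types are hypothetical; the real filter conversion targets Qdrant filter structures that are not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errUnsupported is returned for value types the switch does not handle.
var errUnsupported = errors.New("unsupported filter value type")

// describeValue is a hypothetical stand-in for filter conversion.
// Each case is resolved by a type switch, with no call into reflect.
func describeValue(v any) (string, error) {
	switch x := v.(type) {
	case string:
		return "keyword:" + x, nil
	case int64:
		return "integer:" + strconv.FormatInt(x, 10), nil
	case float64:
		return "float:" + strconv.FormatFloat(x, 'g', -1, 64), nil
	case bool:
		return "bool:" + strconv.FormatBool(x), nil
	default:
		return "", errUnsupported
	}
}

func main() {
	for _, v := range []any{"triage", int64(5), 0.5, true, []int{1}} {
		s, err := describeValue(v)
		fmt.Println(s, err)
	}
}
```

Every supported type has an explicit branch, so the cost per value is a single type comparison and unsupported types fail with a clear error.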

Sparse BM25 — Atomic Cache

BM25 parameters (k1, b, avgdl) are cached behind an atomic.Pointer, so concurrent queries read them with a single atomic load and never block on a mutex.
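One way to build such a cache is sketched below, assuming Go 1.19 or later for the generic atomic.Pointer type. The bm25Params struct, the function names, and the numeric values are illustrative assumptions, not the project's actual code. Writers publish a fresh immutable snapshot; readers load the current pointer without locking.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// bm25Params is a hypothetical snapshot of the scoring parameters.
// Snapshots are treated as immutable once stored.
type bm25Params struct {
	K1, B, AvgDL float64
}

var cachedParams atomic.Pointer[bm25Params]

// storeParams publishes a new snapshot for all future readers.
func storeParams(p bm25Params) {
	cachedParams.Store(&p)
}

// loadParams returns the current snapshot, or nil if none was stored.
func loadParams() *bm25Params {
	return cachedParams.Load()
}

func main() {
	// Illustrative values only.
	storeParams(bm25Params{K1: 1.2, B: 0.75, AvgDL: 120})

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p := loadParams() // lock-free read
			_ = p.K1 + p.B + p.AvgDL
		}()
	}
	wg.Wait()

	p := loadParams()
	fmt.Println(p.K1, p.B, p.AvgDL)
}
```

Because each update replaces the whole struct, a reader always sees a consistent set of three values rather than a partially updated one.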

Pipeline — Cached Options

The pipeline caches buildDocumentOptions across requests. The embedding client uses http.Client{Timeout: 30 * time.Second} instead of http.DefaultClient, which has no timeout, so a stalled embedding request cannot hang indefinitely.

Network Efficiency

BatchQuery — Single Round-Trip

When running multiple QUERY statements, use BatchQuery to send them all to Qdrant in a single QueryBatchPoints request:

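// Assumes the second return value of BatchQuery is an error.
results, err := qql.BatchQuery(ctx, client, []string{
    "QUERY 'emergency triage' FROM docs LIMIT 5",
    "QUERY 'cardiac arrest' FROM docs LIMIT 5",
    "QUERY 'neurological assessment' FROM docs LIMIT 5",
})
if err != nil {
    log.Fatalf("batch query failed: %v", err)
}
// results holds the output of all 3 queries, fetched in one round-trip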

This is 3–5× faster than sequential execution for pure QUERY batches.

Gateway — Auto-Detect Batch

The gateway detects when every statement in an ExecBatch call is a pure QUERY statement and routes the whole batch through Qdrant's native QueryBatch API.
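The detection step could look roughly like the sketch below. It is an assumption for illustration: the real gateway may inspect parsed statements rather than raw text, and the isQueryStatement and isPureQueryBatch names are hypothetical. The "DELETE" statement in the usage code is only a placeholder for any non-QUERY statement.

```go
package main

import (
	"fmt"
	"strings"
)

// isQueryStatement reports whether stmt begins with the QUERY keyword,
// ignoring case and leading whitespace.
func isQueryStatement(stmt string) bool {
	const kw = "QUERY"
	s := strings.TrimLeft(stmt, " \t\r\n")
	if len(s) < len(kw) || !strings.EqualFold(s[:len(kw)], kw) {
		return false
	}
	if len(s) == len(kw) {
		return true
	}
	// The keyword must end at a whitespace boundary ("QUERYX" is rejected).
	switch s[len(kw)] {
	case ' ', '\t', '\r', '\n':
		return true
	}
	return false
}

// isPureQueryBatch reports whether a non-empty batch contains only
// QUERY statements and is therefore eligible for the batch route.
func isPureQueryBatch(stmts []string) bool {
	if len(stmts) == 0 {
		return false
	}
	for _, s := range stmts {
		if !isQueryStatement(s) {
			return false
		}
	}
	return true
}

func main() {
	pure := []string{
		"QUERY 'emergency triage' FROM docs LIMIT 5",
		"query 'cardiac arrest' FROM docs LIMIT 5",
	}
	// Placeholder non-QUERY statement, not necessarily valid QQL.
	mixed := append([]string{"DELETE placeholder"}, pure...)

	fmt.Println(isPureQueryBatch(pure))  // true
	fmt.Println(isPureQueryBatch(mixed)) // false
}
```

A batch that fails the check would fall back to executing its statements one by one.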