Performance

Benchmarked on i5-10400F, Go 1.24:

| Operation | Latency | Allocs |
| --- | --- | --- |
| Lex (simple query) | 304 ns/op | 2 |
| Lex (full query) | 945 ns/op | 2 |
| Parse (simple query) | 477 ns/op | 4 |
| Parse (full query) | 1,470 ns/op | 8 |

The lexer uses a stack-allocated buffer and O(1) keyword table lookup. It never heap-allocates to identify keywords, so the allocation count stays flat regardless of input length: both the simple and the full query above lex with 2 allocations.
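As a rough illustration of that idea, the sketch below classifies a word by upper-casing it into a fixed-size array and indexing a keyword map. The names `TokenKind`, `keywords`, `maxKeywordLen`, and `lookupKeyword` are hypothetical and not taken from the project; the sketch relies on the Go compiler not allocating for a map index of the form `m[string(byteSlice)]`.

```go
package main

import "fmt"

// TokenKind is a hypothetical token type used only for this sketch.
type TokenKind int

const (
	TokIdent TokenKind = iota
	TokQuery
	TokFrom
	TokLimit
)

// maxKeywordLen bounds the scratch buffer; longer words cannot be
// keywords in this sketch and skip the lookup entirely.
const maxKeywordLen = 8

// keywords maps upper-cased keyword text to its token kind.
var keywords = map[string]TokenKind{
	"QUERY": TokQuery,
	"FROM":  TokFrom,
	"LIMIT": TokLimit,
}

// lookupKeyword classifies a word. The word is upper-cased into a
// fixed-size array, and the map is indexed with string(buf[:n]),
// a form the Go compiler handles without allocating a new string.
func lookupKeyword(word []byte) TokenKind {
	if len(word) > maxKeywordLen {
		return TokIdent
	}
	var buf [maxKeywordLen]byte
	for i, c := range word {
		if 'a' <= c && c <= 'z' {
			c -= 'a' - 'A'
		}
		buf[i] = c
	}
	if kind, ok := keywords[string(buf[:len(word)])]; ok {
		return kind
	}
	return TokIdent
}

func main() {
	fmt.Println(lookupKeyword([]byte("limit")) == TokLimit) // true
	fmt.Println(lookupKeyword([]byte("docs")) == TokIdent)  // true
}
```

The length check up front means arbitrarily long identifiers never touch the buffer, so the lookup cost per word stays bounded.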

The Pratt parser uses byte-level asciiEqual / asciiEqualLower comparisons instead of strings.EqualFold or reflect-based logic. This skips Unicode case folding and keeps the hot path free of allocations.
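A byte-level, case-insensitive comparison of this kind might look like the sketch below. The function name matches the one mentioned above, but its signature and body are assumptions made for illustration, not the project's actual code.

```go
package main

import "fmt"

// asciiEqualLower reports whether s equals lower when ASCII letters in s
// are folded to lower case. The caller must pass lower already in lower
// case. Only bytes are compared; no strings are created or copied.
// (Assumed signature; the real helper may differ.)
func asciiEqualLower(s, lower string) bool {
	if len(s) != len(lower) {
		return false
	}
	for i := 0; i < len(s); i++ {
		c := s[i]
		if 'A' <= c && c <= 'Z' {
			c += 'a' - 'A'
		}
		if c != lower[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(asciiEqualLower("LIMIT", "limit")) // true
	fmt.Println(asciiEqualLower("Limit", "from"))  // false
}
```

Because query keywords are plain ASCII, folding only the range A to Z is enough, and the comparison can bail out on the first mismatching byte.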

Filter predicate evaluation uses explicit type-switch dispatch. The reflect package is never called in the filter conversion path, which keeps the per-query filter cost predictable.
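To show what type-switch dispatch looks like in practice, here is a minimal sketch that converts a parsed literal into a filter condition. The `Match` struct and `toMatch` function are hypothetical stand-ins; the real filter types and the set of supported value types are not described on this page.

```go
package main

import (
	"errors"
	"fmt"
)

// Match is a hypothetical, simplified filter condition: exactly one of
// the pointer fields is set, depending on the literal's Go type.
type Match struct {
	Field   string
	Keyword *string
	Integer *int64
	Boolean *bool
}

var errUnsupportedValue = errors.New("unsupported filter value type")

// toMatch dispatches on the concrete type of value with a type switch,
// so each supported type takes a fixed, explicit branch and the reflect
// package is not needed.
func toMatch(field string, value any) (Match, error) {
	switch v := value.(type) {
	case string:
		return Match{Field: field, Keyword: &v}, nil
	case int:
		n := int64(v)
		return Match{Field: field, Integer: &n}, nil
	case int64:
		return Match{Field: field, Integer: &v}, nil
	case bool:
		return Match{Field: field, Boolean: &v}, nil
	default:
		return Match{}, errUnsupportedValue
	}
}

func main() {
	m, err := toMatch("age", 42)
	fmt.Println(*m.Integer, err) // 42 <nil>

	_, err = toMatch("age", 4.2)
	fmt.Println(err) // unsupported filter value type
}
```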

BM25 parameters (k1, b, avgdl) are cached with atomic.Pointer so concurrent queries never block on a mutex to read embedding configuration.
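The pattern can be sketched with `sync/atomic`'s generic `atomic.Pointer` (available since Go 1.19). The `bm25Params` and `paramCache` types below are hypothetical; only the use of `atomic.Pointer` comes from the description above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// bm25Params is a hypothetical immutable snapshot of the BM25 settings.
type bm25Params struct {
	K1    float64
	B     float64
	AvgDL float64
}

// paramCache publishes snapshots through an atomic pointer. Readers
// load the current snapshot without taking a lock; writers replace the
// whole snapshot instead of mutating it in place.
type paramCache struct {
	p atomic.Pointer[bm25Params]
}

// Load returns the current snapshot, or false if nothing is cached yet.
func (c *paramCache) Load() (bm25Params, bool) {
	p := c.p.Load()
	if p == nil {
		return bm25Params{}, false
	}
	return *p, true
}

// Store publishes a new snapshot for all subsequent readers.
func (c *paramCache) Store(params bm25Params) {
	c.p.Store(&params)
}

func main() {
	var cache paramCache
	_, ok := cache.Load()
	fmt.Println(ok) // false

	cache.Store(bm25Params{K1: 1.2, B: 0.75, AvgDL: 120})
	params, ok := cache.Load()
	fmt.Println(params.K1, params.B, ok) // 1.2 0.75 true
}
```

Treating each snapshot as immutable is what makes this safe: a reader that loaded the old pointer keeps a consistent set of values even while a writer publishes a new one.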

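The pipeline caches buildDocumentOptions across requests. The embedding client uses its own http.Client with a 30-second timeout (http.Client{Timeout: 30 * time.Second}) instead of http.DefaultClient, which has no timeout, so a stalled embedding request cannot hang indefinitely.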

When running multiple QUERY statements, use BatchQuery to send them all in a single QueryBatchPoints call to Qdrant:

// All 3 queries are sent in one round-trip.
results, err := qql.BatchQuery(ctx, client, []string{
	"QUERY 'emergency triage' FROM docs LIMIT 5",
	"QUERY 'cardiac arrest' FROM docs LIMIT 5",
	"QUERY 'neurological assessment' FROM docs LIMIT 5",
})
if err != nil {
	log.Fatalf("batch query failed: %v", err)
}

This is 3–5× faster than sequential execution for pure QUERY batches.

The gateway detects when every statement in an ExecBatch call is a pure QUERY statement and routes the whole batch through Qdrant's native QueryBatch API, so no change to the calling code is needed.
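One simple way to make that kind of routing decision is to check the leading keyword of each statement, as in the sketch below. This is an assumption about how such detection could work, not the gateway's actual logic, and `isPureQueryBatch` is a hypothetical name; a real implementation would more likely inspect the parsed statements.

```go
package main

import (
	"fmt"
	"strings"
)

// isPureQueryBatch reports whether every statement begins with the
// QUERY keyword (case-insensitive). An empty batch is not considered
// pure, so it falls back to the ordinary execution path.
func isPureQueryBatch(stmts []string) bool {
	if len(stmts) == 0 {
		return false
	}
	for _, s := range stmts {
		fields := strings.Fields(s)
		if len(fields) == 0 || !strings.EqualFold(fields[0], "QUERY") {
			return false
		}
	}
	return true
}

func main() {
	pure := []string{
		"QUERY 'emergency triage' FROM docs LIMIT 5",
		"query 'cardiac arrest' FROM docs LIMIT 5",
	}
	mixed := append([]string{"DELETE FROM docs"}, pure...)

	fmt.Println(isPureQueryBatch(pure))  // true
	fmt.Println(isPureQueryBatch(mixed)) // false
}
```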