KNN Search Complexity

Note

Dominant cost (exact): O(q * n * d) per aspect.

For the numpy and FAISS Flat backends the search is an exact brute-force nearest-neighbour retrieval. With q query vectors, n reference vectors of dimension d, the dot-product matrix is (q, n) and each entry requires d multiply-adds: O(q * n * d) total. For PROTEA’s v226 corpus (n ~ 530k, d = 480 for ESM-2 150M, q ~ 5k per aspect) this is roughly 1.3 * 10^12 FLOPs per KNN call, handled in chunks to cap RAM.

PROTEA’s KNN is a thin shim over protea_method.knn_search (extracted in the F2C.5 refactor). The shim lives at protea/core/knn_search.py; new code should import protea_method.knn_search.search_knn directly.

Backend selection

Three backends are available via the backend parameter of search_knn:

"numpy" (default for export)

Exact, chunked cosine or L2 via NumPy matrix multiplication. No extra dependencies. Query chunk size is controlled by PROTEA_METHOD_NUMPY_QUERY_CHUNK (default 500). A 500-query chunk against 530k references at float32 costs ~1 GB RAM, staying comfortably inside the export worker’s budget.

"faiss"

Wraps FAISS. "Flat" is exact (same asymptotic as numpy, ~3-10x faster via BLAS). "IVFFlat" and "HNSW" are approximate; not used in PROTEA production runs (the accuracy loss is unacceptable for the temporal-holdout evaluation).

"torch"

Chunked GPU/CPU KNN via torch.cdist + torch.topk. Recommended for production runs where PyTorch is already loaded in the process (e.g., LAFA containers). Device selection: PROTEA_KNN_DEVICE (default "auto"); chunk size: PROTEA_KNN_CHUNK_SIZE (default 4096).

Top-k selection optimisation

The numpy backend avoids a full O(n log n) argsort by using np.argpartition (O(n) per query) to isolate the top-k candidates, then sorting only that k-slice:

# Chunk-invariant work computed once, reused across all query chunks
R_T = R_ready.T
for start in range(0, n_queries, query_chunk):
    Q_chunk = Q[start : start + query_chunk]
    dist = 1.0 - (Q_n @ R_T)          # (chunk, n_refs) matrix
    # argpartition is O(n_refs) per row; ~30x faster than full argsort for k << n_refs
    part = np.argpartition(dist, k - 1, axis=1)[:, :k]
    # sort the k-slice only
    del dist

For k = 5 and n = 530k, argpartition is approximately 30x faster than a full sort on 500k-element arrays.

RAM usage

The (q, n) distance matrix is float32. For the default chunk of 500 queries and n = 530k: 500 * 530000 * 4 B = 1.06 GB. This is the dominant allocation in the export worker and is freed at the end of each chunk loop via del dist.

Design constraint

pgvector is explicitly excluded from KNN search in PROTEA (see docs/source/adr/001-knn-without-pgvector.rst). The hard rule is: for reference pools larger than 500k vectors, use numpy or FAISS only.

Cross-reference

Thesis Ch. 5.2 contains the measured throughput curve (queries/s vs n) comparing numpy chunked, FAISS Flat, and FAISS IVFFlat on the v226 corpus, and justifies the numpy-default decision.