
Open-sourcing NVIDIA cuVS Node.js bindings

We built Node.js bindings for NVIDIA cuVS. Today we open-source them.

NVIDIA cuVS is the GPU-accelerated vector search library at the center of NVIDIA's enterprise AI strategy. At GTC 2026, Jensen Huang called structured data “the foundation of trustworthy AI.”

cuVS is being integrated into Elasticsearch, Weaviate, Milvus, Oracle, Apache Lucene, OpenSearch, and FAISS.

cuVS has official bindings for C, C++, Python, Rust, Java, and Go.

The problem: cuVS had zero presence in the Node.js ecosystem. We fixed that.

What we built

cuvs-node gives Node.js developers direct access to GPU-accelerated vector search. Native C++ bindings to the cuVS C API via N-API. No Python subprocess, no microservice, no managed vector database. In-process, on GPU.

Five algorithms, covering every major vector search strategy:

  • CAGRA - GPU-native graph-based ANN. The flagship algorithm in cuVS. Best general-purpose approximate nearest neighbor search on GPU.
  • IVF-Flat - Inverted file index with uncompressed lists. Fast to build, exact distances within probed lists.
  • IVF-PQ - Inverted file with product quantization. Lower memory footprint for very large datasets.
  • Brute-force - Exact nearest neighbor search. Ground truth baseline.
  • HNSW - CPU-side graph search, built by converting a GPU CAGRA index. Build on GPU for speed, serve on CPU for cost.
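Since the brute-force index is the ground-truth baseline for the other four, it helps to see exactly what that computation is. Here is a dependency-free sketch in plain JavaScript (our own illustration of the math, not part of the cuvs-node API, which runs this on GPU):

```javascript
// Brute-force k-NN over a flat Float32Array of row-major vectors:
// compute the squared L2 distance from the query to every vector,
// then return the k closest. This is the exact result that the
// approximate indexes (CAGRA, IVF, HNSW) are measured against.
function bruteForceKnn(dataset, query, dims, k) {
  const n = dataset.length / dims
  const results = []
  for (let i = 0; i < n; i++) {
    let dist = 0
    for (let d = 0; d < dims; d++) {
      const diff = dataset[i * dims + d] - query[d]
      dist += diff * diff
    }
    results.push({ index: i, distance: dist })
  }
  results.sort((a, b) => a.distance - b.distance)
  return results.slice(0, k)
}

// Three 2-dimensional vectors; the query matches the second one exactly.
const data = new Float32Array([0, 0, 1, 1, 5, 5])
const top2 = bruteForceKnn(data, [1, 1], 2, 2)
console.log(top2.map(r => r.index)) // [ 1, 0 ]
```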

Show me the code

const { Resources, CagraIndex } = require('cuvs-node')
const res = new Resources()
// Build an index from 10K vectors, 128 dimensions
const dataset = new Float32Array(10000 * 128)
for (let i = 0; i < dataset.length; i++) dataset[i] = Math.random()
const index = CagraIndex.build(res, dataset, { rows: 10000, cols: 128 })
// Search for 10 nearest neighbors
const queries = new Float32Array(3 * 128)
for (let i = 0; i < queries.length; i++) queries[i] = Math.random()
const { indices, distances } = index.search(res, queries, { rows: 3, cols: 128, k: 10 })
// Save and reload
index.serialize(res, './my-index.bin')
const loaded = CagraIndex.deserialize(res, './my-index.bin')
res.dispose()

That’s it. Build an index, search it, save it, load it. About a dozen lines.
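A natural follow-up to a search like the one above is checking how close the approximate results come to exact search. A common metric is recall@k: the fraction of the true top-k neighbors that the approximate index also returned. A minimal sketch in plain JavaScript (`recallAtK` is our own illustrative helper, not part of the cuvs-node API):

```javascript
// recall@k: what fraction of the exact top-k neighbors did the
// approximate search find? Useful when tuning CAGRA or IVF parameters
// against a brute-force baseline.
function recallAtK(approxIndices, exactIndices) {
  const exact = new Set(exactIndices)
  let hits = 0
  for (const idx of approxIndices) if (exact.has(idx)) hits++
  return hits / exactIndices.length
}

// Example: the approximate search missed one of the five true neighbors.
const recall = recallAtK([3, 7, 12, 9, 40], [3, 7, 12, 9, 21])
console.log(recall) // 0.8
```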

Performance

All benchmarks on Lambda.ai infrastructure, 128-dimensional float32 vectors, CAGRA algorithm.

Index build (100,000 vectors)

GPU       VRAM  Time     Throughput
A10       24GB  1,225ms  81,700 vectors/sec
A100 SXM  40GB  541ms    184,700 vectors/sec
GH200     96GB  211ms    474,200 vectors/sec

Search (100 queries, k=10, 100K vector index)

GPU       VRAM  Latency  Throughput
A10       24GB  1.4ms    71,900 queries/sec
A100 SXM  40GB  1.3ms    77,000 queries/sec
GH200     96GB  0.8ms    121,600 queries/sec

Sub-millisecond search on GH200. Under 1.5ms even on a budget A10.

GPU vs CPU: 733x faster index builds

We benchmarked cuvs-node against hnswlib-node, the most popular CPU vector search library for Node.js. Same machine, same data, same Node.js runtime. The only difference: GPU (CAGRA) vs CPU (HNSW).

Hardware: NVIDIA A100 SXM 40GB + AMD EPYC 7J13 30 vCPU, on Lambda.ai.
(We ran on many other GPUs and infrastructure providers; Lambda.ai is the one we selected for baseline numbers.)

Index build time

Vectors  Dimensions  GPU (cuvs-node)  CPU (hnswlib-node)  Speedup
100K     128         0.6s             60s                 100x
250K     128         1.1s             3.3min              183x
500K     128         1.8s             8.0min              263x
1M       128         3.4s             17.4min             303x
5M       128         17.3s            107.5min            373x
10M      128         35.5s            232.5min            393x
100K     768         1.1s             4.9min              267x
250K     768         2.0s             14.0min             431x
500K     768         3.1s             30.6min             600x
1M       768         5.3s             65.2min             733x

The gap widens with scale and dimensionality. At 1M vectors with 768 dimensions (a common embedding size for production workloads), the GPU builds the index in 5.3 seconds. The CPU takes over an hour.

Search latency

Vectors  Dimensions  GPU (cuvs-node)  CPU (hnswlib-node)
1M       128         1.5ms            28.1ms
1M       768         2.1ms            88.6ms
5M       128         1.5ms            33.7ms

GPU search stays under 2.5ms regardless of scale. CPU search degrades as the index grows.

99 tests, five GPU types

The full test suite covers all five algorithms: build correctness, search result validation (index ranges, distance ordering, self-search accuracy), serialize/deserialize round-trips, input rejection, and benchmark stability across scales from 10K to 100K vectors. 99 tests. All passing.
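Two of the validation checks above, distance ordering and self-search accuracy, are simple to state precisely. A minimal sketch in plain JavaScript (helper names are our own; the actual suite runs these checks against cuvs-node results):

```javascript
// Distance ordering: a k-NN result must return distances in
// non-decreasing order, regardless of algorithm.
function distancesOrdered(distances) {
  for (let i = 1; i < distances.length; i++) {
    if (distances[i] < distances[i - 1]) return false
  }
  return true
}

// Self-search: querying with a vector that is already in the index
// should return that vector's own index first, at distance ~0.
function selfSearchOk(indices, distances, expectedIndex, eps = 1e-6) {
  return indices[0] === expectedIndex && distances[0] <= eps
}

console.log(distancesOrdered([0, 0.4, 1.2, 3.5])) // true
console.log(selfSearchOk([42, 7, 3], [0, 0.9, 1.1], 42)) // true
```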

Verified on five NVIDIA GPU types: A10, A100, H100, GH200, and B200.

Why this matters

The Node.js ecosystem has over 2 million packages and millions of active developers. GPU-accelerated vector search was locked behind Python, Rust, or managed services like Pinecone and Weaviate. If your backend was Node.js, your options were:

  1. Add Python to your stack (complexity, deployment overhead)
  2. Call a managed vector database over the network (latency, cost, vendor lock-in)
  3. Use a JS-only ANN library like hnswlib-node (CPU-bound, orders of magnitude slower)

Now there is a fourth option: in-process GPU vector search, native to Node.js. Build indexes at 474K vectors/sec. Search in under a millisecond. No Python. No network hop. No managed service.

Build on GPU, serve on CPU. The CAGRA-to-HNSW conversion means you can build your index on a GPU instance and serve queries from a CPU-only deployment. GPU for the heavy lifting, CPU for the serving cost.

What’s next

cuvs-index - a schema-driven query engine built on top of cuvs-node. Define entities and fields, plug in a storage adapter (DynamoDB, MongoDB), and run hybrid queries that combine structured filters with vector similarity. Already on npm and GitHub.
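To illustrate the hybrid-query idea (this is not the actual cuvs-index API; the entity shape and helper names are our own), the core pattern is filter-then-rank: apply the structured filter first, then order the survivors by vector distance:

```javascript
// Hybrid query sketch: structured filter + vector similarity ranking.
// Entities carry both regular fields and an embedding vector; the filter
// narrows the candidate set, then squared L2 distance ranks the rest.
function hybridQuery(entities, filter, queryVec, k) {
  const sqDist = (a, b) => a.reduce((s, v, i) => s + (v - b[i]) ** 2, 0)
  return entities
    .filter(filter)
    .map(e => ({ ...e, distance: sqDist(e.vector, queryVec) }))
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k)
}

const products = [
  { id: 'a', category: 'shoes', vector: [0.1, 0.9] },
  { id: 'b', category: 'shoes', vector: [0.8, 0.2] },
  { id: 'c', category: 'hats',  vector: [0.2, 0.8] },
]
// "shoes most similar to [0.2, 0.8]" — 'c' is closer but filtered out.
const top = hybridQuery(products, e => e.category === 'shoes', [0.2, 0.8], 1)
console.log(top[0].id) // top hit: 'a'
```

At scale the ranking step is where the GPU index takes over; the sketch only shows how the two query halves compose.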

TypeScript types - full type definitions for IDE autocomplete and type checking.

Prebuilt binaries - so users can npm install without compiling from source.

Get started

Built by 638Labs.