Open-sourcing the NVIDIA cuVS Node.js bindings
We built Node.js bindings for NVIDIA cuVS. Today we open-source them.
NVIDIA cuVS is the GPU-accelerated vector search library at the center of NVIDIA's enterprise AI strategy. At GTC 2026, Jensen Huang called structured data “the foundation of trustworthy AI.”
cuVS is being integrated into Elasticsearch, Weaviate, Milvus, Oracle, Apache Lucene, OpenSearch, and FAISS.
cuVS has official bindings for C, C++, Python, Rust, Java, and Go.
The problem: cuVS had no presence in the Node.js ecosystem. We fixed that.
What we built
cuvs-node gives Node.js developers direct access to GPU-accelerated vector search. Native C++ bindings to the cuVS C API via N-API. No Python subprocess, no microservice, no managed vector database. In-process, on GPU.
Five algorithms, covering every major vector search strategy:
- CAGRA - GPU-native graph-based ANN. The flagship algorithm in cuVS. Best general-purpose approximate nearest neighbor search on GPU.
- IVF-Flat - Inverted file index with uncompressed lists. Fast to build, exact distances within probed lists.
- IVF-PQ - Inverted file with product quantization. Lower memory footprint for very large datasets.
- Brute-force - Exact nearest neighbor search. Ground truth baseline.
- HNSW - CPU-side graph search, built by converting a GPU CAGRA index. Build on GPU for speed, serve on CPU for cost.
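To make the brute-force baseline concrete: this is what "exact nearest neighbor search" computes, shown here as a minimal pure-JavaScript linear scan. It uses no cuvs-node API at all and is orders of magnitude slower than the GPU version, but it is handy for validating ANN results on small datasets.

```javascript
// Exact k-NN by linear scan: the ground truth that ANN indexes approximate.
// dataset: Float32Array of rows*cols values, row-major; query: Float32Array of length cols.
function bruteForceKnn(dataset, rows, cols, query, k) {
  const hits = [];
  for (let i = 0; i < rows; i++) {
    let d = 0;
    for (let j = 0; j < cols; j++) {
      const diff = dataset[i * cols + j] - query[j];
      d += diff * diff; // squared L2 distance
    }
    hits.push({ index: i, distance: d });
  }
  hits.sort((a, b) => a.distance - b.distance);
  return hits.slice(0, k);
}

// Tiny sanity check: 4 vectors in 2 dimensions, query at the origin.
const data = new Float32Array([0, 0, 1, 0, 0, 2, 3, 3]);
const top2 = bruteForceKnn(data, 4, 2, new Float32Array([0, 0]), 2);
console.log(top2.map(h => h.index)); // [ 0, 1 ]
```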
Show me the code
```javascript
const { Resources, CagraIndex } = require('cuvs-node')

const res = new Resources()

// Build an index from 10K vectors, 128 dimensions
const dataset = new Float32Array(10000 * 128)
for (let i = 0; i < dataset.length; i++) dataset[i] = Math.random()
const index = CagraIndex.build(res, dataset, { rows: 10000, cols: 128 })

// Search for 10 nearest neighbors
const queries = new Float32Array(3 * 128)
for (let i = 0; i < queries.length; i++) queries[i] = Math.random()
const { indices, distances } = index.search(res, queries, { rows: 3, cols: 128, k: 10 })

// Save and reload
index.serialize(res, './my-index.bin')
const loaded = CagraIndex.deserialize(res, './my-index.bin')

res.dispose()
```

That’s it. Build an index, search it, save it, load it. About a dozen lines.
Performance
All benchmarks on Lambda.ai infrastructure, 128-dimensional float32 vectors, CAGRA algorithm.
Index build (100,000 vectors)
| GPU | VRAM | Time | Throughput |
|---|---|---|---|
| A10 | 24GB | 1,225ms | 81,700 vectors/sec |
| A100 SXM | 40GB | 541ms | 184,700 vectors/sec |
| GH200 | 96GB | 211ms | 474,200 vectors/sec |
Search (100 queries, k=10, 100K vector index)
| GPU | VRAM | Latency | Throughput |
|---|---|---|---|
| A10 | 24GB | 1.4ms | 71,900 queries/sec |
| A100 SXM | 40GB | 1.3ms | 77,000 queries/sec |
| GH200 | 96GB | 0.8ms | 121,600 queries/sec |
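For readers sanity-checking the tables: build throughput is rows divided by build time, and batched search throughput is roughly batch size divided by batch latency. The published figures were computed from unrounded timings, so recomputing from the rounded table values lands close to, but not exactly on, the table numbers:

```javascript
// Derive throughput from the rounded table values (approximate by design).
const buildThroughput = (rows, buildMs) => Math.round(rows / (buildMs / 1000));
const searchThroughput = (batchSize, latencyMs) => Math.round(batchSize / (latencyMs / 1000));

console.log(buildThroughput(100000, 541)); // 184843 vectors/sec (table: 184,700)
console.log(searchThroughput(100, 1.3));   // 76923 queries/sec (table: 77,000)
```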
Sub-millisecond search on GH200. Under 1.5ms even on a budget A10.
GPU vs CPU: up to 733x faster index builds
We benchmarked cuvs-node against hnswlib-node, the most popular CPU vector search library for Node.js. Same machine, same data, same Node.js runtime. The only difference: GPU (CAGRA) vs CPU (HNSW).
Hardware: NVIDIA A100 SXM 40GB + AMD EPYC 7J13 30 vCPU, on Lambda.ai.
(We ran on several other GPUs and infrastructure providers; Lambda.ai is what we selected for the baseline numbers.)
Index build time
| Vectors | Dimensions | GPU (cuvs-node) | CPU (hnswlib-node) | Speedup |
|---|---|---|---|---|
| 100K | 128 | 0.6s | 60s | 100x |
| 250K | 128 | 1.1s | 3.3min | 183x |
| 500K | 128 | 1.8s | 8.0min | 263x |
| 1M | 128 | 3.4s | 17.4min | 303x |
| 5M | 128 | 17.3s | 107.5min | 373x |
| 10M | 128 | 35.5s | 232.5min | 393x |
| 100K | 768 | 1.1s | 4.9min | 267x |
| 250K | 768 | 2.0s | 14.0min | 431x |
| 500K | 768 | 3.1s | 30.6min | 600x |
| 1M | 768 | 5.3s | 65.2min | 733x |
The gap widens with scale and dimensionality. At 1M vectors with 768 dimensions (a common embedding size for production workloads), the GPU builds the index in 5.3 seconds. The CPU takes over an hour.
Search latency
| Vectors | Dimensions | GPU (cuvs-node) | CPU (hnswlib-node) |
|---|---|---|---|
| 1M | 128 | 1.5ms | 28.1ms |
| 1M | 768 | 2.1ms | 88.6ms |
| 5M | 128 | 1.5ms | 33.7ms |
GPU search stays under 2.5ms regardless of scale. CPU search degrades as the index grows.
99 tests, five GPU types
The full test suite covers all five algorithms: build correctness, search result validation (index ranges, distance ordering, self-search accuracy), serialize/deserialize round-trips, input rejection, and benchmark stability across scales from 10K to 100K vectors. 99 tests. All passing.
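The "search result validation" checks mentioned above boil down to simple invariants. As an illustration (plain JS, not the actual test code from the repo), recall@k against a ground-truth neighbor set and non-decreasing distance ordering can be verified like this:

```javascript
// recall@k: fraction of ground-truth neighbor ids recovered by the ANN result.
function recallAtK(annIds, truthIds) {
  const truth = new Set(truthIds);
  const hits = annIds.filter(id => truth.has(id)).length;
  return hits / truthIds.length;
}

// Distances returned for a query must be sorted in ascending order.
function isDistanceOrdered(distances) {
  return distances.every((d, i) => i === 0 || distances[i - 1] <= d);
}

console.log(recallAtK([3, 1, 7, 9], [1, 3, 5, 9]));   // 0.75
console.log(isDistanceOrdered([0.0, 0.2, 0.2, 1.5])); // true
```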
Verified on five NVIDIA GPU types: A10, A100, H100, GH200, and B200.
Why this matters
The Node.js ecosystem has over 2 million packages and millions of active developers. GPU-accelerated vector search was locked behind Python, Rust, or managed services like Pinecone and Weaviate. If your backend was Node.js, your options were:
- Add Python to your stack (complexity, deployment overhead)
- Call a managed vector database over the network (latency, cost, vendor lock-in)
- Use a JS-only ANN library like hnswlib-node (CPU-bound, orders of magnitude slower)
Now there is a fourth option: in-process GPU vector search, native to Node.js. Build indexes at 474K vectors/sec. Search in under a millisecond. No Python. No network hop. No managed service.
Build on GPU, serve on CPU. The CAGRA-to-HNSW conversion means you can build your index on a GPU instance and serve queries from a CPU-only deployment. GPU for the heavy lifting, CPU to keep serving costs down.
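A sketch of that CAGRA-to-HNSW workflow, with a heavy caveat: the conversion entry point shown here (`HnswIndex.fromCagra`) and its signature are illustrative placeholders, not confirmed cuvs-node API; check the repo's README for the actual call. The shape of the pipeline is what matters.

```javascript
// HYPOTHETICAL API SKETCH - the HnswIndex names below are placeholders.
// Step 1 (GPU box): build CAGRA fast, convert to HNSW, persist the graph.
const { Resources, CagraIndex, HnswIndex } = require('cuvs-node')
const res = new Resources()
const cagra = CagraIndex.build(res, dataset, { rows, cols })
const hnsw = HnswIndex.fromCagra(res, cagra) // placeholder conversion call
hnsw.serialize(res, './index.hnsw')

// Step 2 (CPU box): load the HNSW file and serve queries without a GPU.
const served = HnswIndex.deserialize(res, './index.hnsw')
const { indices, distances } = served.search(res, query, { rows: 1, cols, k: 10 })
```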
What’s next
- cuvs-index - a schema-driven query engine built on top of cuvs-node. Define entities and fields, plug in a storage adapter (DynamoDB, MongoDB), and run hybrid queries that combine structured filters with vector similarity. Already on npm and GitHub.
- TypeScript types - full type definitions for IDE autocomplete and type checking.
- Prebuilt binaries - so users can npm install without compiling from source.
Get started
- GitHub: github.com/638Labs/cuvs-node
- npm: `npm install cuvs-node`
- License: Apache-2.0
Built by 638Labs.