Open-sourcing the NVIDIA cuVS Node.js bindings
We built Node.js bindings for NVIDIA cuVS. Today we open-source them.
NVIDIA cuVS is the GPU-accelerated vector search library at the center of NVIDIA's enterprise AI strategy. At GTC 2026, Jensen Huang called structured data “the foundation of trustworthy AI.”
cuVS is being integrated into Elasticsearch, Weaviate, Milvus, Oracle, Apache Lucene, OpenSearch, and FAISS.
cuVS has official bindings for C, C++, Python, Rust, Java, and Go.
The problem: cuVS had no presence in the Node.js ecosystem. We fixed that.
What we built
cuvs-node gives Node.js developers direct access to GPU-accelerated vector search. Native C++ bindings to the cuVS C API via N-API. No Python subprocess, no microservice, no managed vector database. In-process, on GPU.
Five algorithms, covering every major vector search strategy:
- CAGRA - GPU-native graph-based ANN. The flagship algorithm in cuVS. Best general-purpose approximate nearest neighbor search on GPU.
- IVF-Flat - Inverted file index with uncompressed lists. Fast to build, exact distances within probed lists.
- IVF-PQ - Inverted file with product quantization. Lower memory footprint for very large datasets.
- Brute-force - Exact nearest neighbor search. Ground truth baseline.
- HNSW - CPU-side graph search, built by converting a GPU CAGRA index. Build on GPU for speed, serve on CPU for cost.
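To make the brute-force baseline concrete: this is what "exact nearest neighbor search" computes, shown here as a minimal pure-JavaScript linear scan. It uses no cuvs-node API at all and is orders of magnitude slower than the GPU version, but it is handy for validating ANN results on small datasets.

```javascript
// Exact k-NN by linear scan: the ground truth that ANN indexes approximate.
// dataset: Float32Array of rows*cols values, row-major; query: Float32Array of length cols.
function bruteForceKnn(dataset, rows, cols, query, k) {
  const hits = [];
  for (let i = 0; i < rows; i++) {
    let d = 0;
    for (let j = 0; j < cols; j++) {
      const diff = dataset[i * cols + j] - query[j];
      d += diff * diff; // squared L2 distance
    }
    hits.push({ index: i, distance: d });
  }
  hits.sort((a, b) => a.distance - b.distance);
  return hits.slice(0, k);
}

// Tiny sanity check: 4 vectors in 2 dimensions, query at the origin.
const data = new Float32Array([0, 0, 1, 0, 0, 2, 3, 3]);
const top2 = bruteForceKnn(data, 4, 2, new Float32Array([0, 0]), 2);
console.log(top2.map(h => h.index)); // [ 0, 1 ]
```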
Show me the code
```javascript
const { Resources, CagraIndex } = require('cuvs-node')

const res = new Resources()

// Build an index from 10K vectors, 128 dimensions
const dataset = new Float32Array(10000 * 128)
for (let i = 0; i < dataset.length; i++) dataset[i] = Math.random()
const index = CagraIndex.build(res, dataset, { rows: 10000, cols: 128 })

// Search for 10 nearest neighbors
const queries = new Float32Array(3 * 128)
for (let i = 0; i < queries.length; i++) queries[i] = Math.random()
const { indices, distances } = index.search(res, queries, { rows: 3, cols: 128, k: 10 })

// Save and reload
index.serialize(res, './my-index.bin')
const loaded = CagraIndex.deserialize(res, './my-index.bin')

res.dispose()
```

That’s it. Build an index, search it, save it, load it. About a dozen lines.
Performance
All benchmarks on Lambda.ai infrastructure, 128-dimensional float32 vectors, CAGRA algorithm.
Index build (100,000 vectors)
| GPU | VRAM | Time | Throughput |
|---|---|---|---|
| A10 | 24GB | 1,225ms | 81,700 vectors/sec |
| A100 SXM | 40GB | 541ms | 184,700 vectors/sec |
| GH200 | 96GB | 211ms | 474,200 vectors/sec |
Search (100 queries, k=10, 100K vector index)
| GPU | VRAM | Latency | Throughput |
|---|---|---|---|
| A10 | 24GB | 1.4ms | 71,900 queries/sec |
| A100 SXM | 40GB | 1.3ms | 77,000 queries/sec |
| GH200 | 96GB | 0.8ms | 121,600 queries/sec |
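For readers sanity-checking the tables: build throughput is rows divided by build time, and batched search throughput is roughly batch size divided by batch latency. The published figures were computed from unrounded timings, so recomputing from the rounded table values lands close to, but not exactly on, the table numbers:

```javascript
// Derive throughput from the rounded table values (approximate by design).
const buildThroughput = (rows, buildMs) => Math.round(rows / (buildMs / 1000));
const searchThroughput = (batchSize, latencyMs) => Math.round(batchSize / (latencyMs / 1000));

console.log(buildThroughput(100000, 541)); // 184843 vectors/sec (table: 184,700)
console.log(searchThroughput(100, 1.3));   // 76923 queries/sec (table: 77,000)
```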
Sub-millisecond search on GH200. Under 1.5ms even on a budget A10.
GPU vs CPU: up to 733x faster index builds
We benchmarked cuvs-node against hnswlib-node, the most popular CPU vector search library for Node.js. Same machine, same data, same Node.js runtime. The only difference: GPU (CAGRA) vs CPU (HNSW).
Hardware: NVIDIA A100 SXM 40GB + AMD EPYC 7J13 30 vCPU, on Lambda.ai.
(We ran on several other GPUs and infrastructure providers; Lambda.ai is what we selected for the baseline numbers.)
Index build time
| Vectors | Dimensions | GPU (cuvs-node) | CPU (hnswlib-node) | Speedup |
|---|---|---|---|---|
| 100K | 128 | 0.6s | 60s | 100x |
| 250K | 128 | 1.1s | 3.3min | 183x |
| 500K | 128 | 1.8s | 8.0min | 263x |
| 1M | 128 | 3.4s | 17.4min | 303x |
| 5M | 128 | 17.3s | 107.5min | 373x |
| 10M | 128 | 35.5s | 232.5min | 393x |
| 100K | 768 | 1.1s | 4.9min | 267x |
| 250K | 768 | 2.0s | 14.0min | 431x |
| 500K | 768 | 3.1s | 30.6min | 600x |
| 1M | 768 | 5.3s | 65.2min | 733x |
The gap widens with scale and dimensionality. At 1M vectors with 768 dimensions (a common embedding size for production workloads), the GPU builds the index in 5.3 seconds. The CPU takes over an hour.
Search latency
| Vectors | Dimensions | GPU (cuvs-node) | CPU (hnswlib-node) |
|---|---|---|---|
| 1M | 128 | 1.5ms | 28.1ms |
| 1M | 768 | 2.1ms | 88.6ms |
| 5M | 128 | 1.5ms | 33.7ms |
GPU search stays under 2.5ms regardless of scale. CPU search degrades as the index grows.
99 tests, five GPU types
The full test suite covers all five algorithms: build correctness, search result validation (index ranges, distance ordering, self-search accuracy), serialize/deserialize round-trips, input rejection, and benchmark stability across scales from 10K to 100K vectors. 99 tests. All passing.
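The "search result validation" checks mentioned above boil down to simple invariants. As an illustration (plain JS, not the actual test code from the repo), recall@k against a ground-truth neighbor set and non-decreasing distance ordering can be verified like this:

```javascript
// recall@k: fraction of ground-truth neighbor ids recovered by the ANN result.
function recallAtK(annIds, truthIds) {
  const truth = new Set(truthIds);
  const hits = annIds.filter(id => truth.has(id)).length;
  return hits / truthIds.length;
}

// Distances returned for a query must be sorted in ascending order.
function isDistanceOrdered(distances) {
  return distances.every((d, i) => i === 0 || distances[i - 1] <= d);
}

console.log(recallAtK([3, 1, 7, 9], [1, 3, 5, 9]));   // 0.75
console.log(isDistanceOrdered([0.0, 0.2, 0.2, 1.5])); // true
```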
Verified on five NVIDIA GPU types: A10, A100, H100, GH200, and B200.
Why this matters
The Node.js ecosystem has over 2 million packages and millions of active developers. GPU-accelerated vector search was locked behind Python, Rust, or managed services like Pinecone and Weaviate. If your backend was Node.js, your options were:
- Add Python to your stack (complexity, deployment overhead)
- Call a managed vector database over the network (latency, cost, vendor lock-in)
- Use a JS-only ANN library like hnswlib-node (CPU-bound, orders of magnitude slower)
Now there is a fourth option: in-process GPU vector search, native to Node.js. Build indexes at 474K vectors/sec. Search in under a millisecond. No Python. No network hop. No managed service.
Build on GPU, serve on CPU. The CAGRA-to-HNSW conversion means you can build your index on a GPU instance and serve queries from a CPU-only deployment. GPU for the heavy lifting, CPU to keep serving costs down.
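A sketch of that CAGRA-to-HNSW workflow, with a heavy caveat: the conversion entry point shown here (`HnswIndex.fromCagra`) and its signature are illustrative placeholders, not confirmed cuvs-node API; check the repo's README for the actual call. The shape of the pipeline is what matters.

```javascript
// HYPOTHETICAL API SKETCH - the HnswIndex names below are placeholders.
// Step 1 (GPU box): build CAGRA fast, convert to HNSW, persist the graph.
const { Resources, CagraIndex, HnswIndex } = require('cuvs-node')
const res = new Resources()
const cagra = CagraIndex.build(res, dataset, { rows, cols })
const hnsw = HnswIndex.fromCagra(res, cagra) // placeholder conversion call
hnsw.serialize(res, './index.hnsw')

// Step 2 (CPU box): load the HNSW file and serve queries without a GPU.
const served = HnswIndex.deserialize(res, './index.hnsw')
const { indices, distances } = served.search(res, query, { rows: 1, cols, k: 10 })
```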
What’s next
- cuvs-index - a schema-driven query engine built on top of cuvs-node. Define entities and fields, plug in a storage adapter (DynamoDB, MongoDB), and run hybrid queries that combine structured filters with vector similarity. Already on npm and GitHub.
- TypeScript types - full type definitions for IDE autocomplete and type checking.
- Prebuilt binaries - so users can npm install without compiling from source.
Get started
- GitHub: github.com/638Labs/cuvs-node
- npm: `npm install cuvs-node`
- License: Apache-2.0
Built by 638Labs.