
Streaming Nanopublications with Jelly: the hidden costs of serialization


10 Feb 2026

Karolina Bogacka

CEO & Co-Founder

The “datacenter tax”

One of the recurring messages in Martin Kleppmann’s “Designing Data-Intensive Applications” – the seminal book on how to design large-scale systems – is that the hard part isn’t picking a technology. It’s understanding its trade-offs well enough to make deliberate choices, because negligible overheads have a habit of turning into big costs once you scale.


Serialization is a perfect example of this sort of trade-off that is very easy to underestimate. Every time data crosses a boundary – between processes, over the network, onto disk, into logs – you pay to translate in-memory objects into bytes and back. In small systems, that cost can be hard to notice. At scale, it adds up into a persistent “datacenter tax”, consuming CPU and time across the entire system.


That "tax” isn’t just a catchy metaphor. Google created Protocol Buffers in part because obvious encodings like XML were too expensive at datacenter scale. And even with a compact, schema-driven binary format, serialization and deserialization can still be large enough to matter. In Google’s fleet-wide profiling, Protobuf operations constituted 9.6% of all CPU cycles in their infrastructure, motivating Google to research dedicated hardware accelerators for Protobuf.


Communication always has a cost. The only real choice is whether that cost stays hidden and compounding, or becomes something you control. In the system I’ll describe, serialization overhead became the limiting factor for replication in a decentralized, knowledge graph-based system. The fix was not a single parser tweak, but a change in the shape of the whole workload: batching what used to be thousands of tiny transfers and moving to a streaming-friendly communication format.

The shape of the problem: tiny items, huge volume

Let us first set the scene and introduce the system in question.


The Nanopublication Network, an initiative led by Knowledge Pixels, is an ecosystem for publishing scientific results as FAIR, machine-actionable research artifacts. Each nanopublication is a small, self-contained knowledge-graph “micro-publication” that can represent virtually any kind of scientific statement – say, a link between a gene and a disease – along with the context needed to trace and reuse it. In practice, nanopublications are expressed as RDF datasets with a stable internal structure: an assertion (the claim), provenance (how that claim was produced), and publication information (metadata about the nanopublication itself), tied together by a head graph. That design makes nanopubs easy to cite, combine, and query, and it lets downstream tools work directly with granular, structured claims instead of forcing humans to extract them from long narrative papers.

An example nanopublication.

Operationally, the network is intentionally decentralized. Instead of a single central database, it relies on a set of cooperating services. Nanopublications are published and retrieved via the Nanopub Registry, queried through Nanopub Query (which exposes SPARQL endpoints over different subsets of the corpus), and explored with client applications such as Nanodash. And because the corpus is fundamentally composed of “many tiny items,” the largest communication overhead isn’t caused by interactive browsing, but by bulk ingestion and synchronization (services pulling and exchanging large batches of nanopublications to keep replicas and query indexes up to date).


That’s where representation choices start to matter: the community’s recommended TriG syntax is excellent for readability, but unexpectedly expensive when placed directly in the replication hot path.

Where throughput goes to die: per-item overhead

Before NeverBlink began working on the problem, nanopublications were listed on HTML/JSON pages that linked to individual entries. Each nanopub was served as a separate TriG file, and replicating 60,000 nanopubs meant making over 60,000 separate HTTP requests. This is the kind of design that is simple and reasonable until you scale it: per-item overhead is invisible in small tests but becomes dominant once multiplied by huge cardinality.


Nanopublications encoded as HTML/JSON list pages with an individual nanopub shown as TriG.


In practice, you end up paying twice. First, there’s the per-request tax (TLS/session handling, headers, routing, queueing, framework overhead, and application-level framing), which is repeated tens of thousands of times. Then there’s the per-document parsing tax: every TriG file triggers parser initialization, prefix map setup, UTF-8 decoding, allocations, and object churn. Again, all of these actions get repeated tens of thousands of times. Because each nanopub is small, those fixed costs don’t amortize across a large payload; they are the payload, from the system’s perspective.
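A back-of-envelope model makes the amortization argument visible. All of the constants below are invented, illustrative assumptions (not measurements from the real system); the point is only how fixed per-item costs dominate when they are paid 60,000 times instead of once.

```python
# Toy cost model of per-item overhead. All constants are assumed,
# illustrative values, not measurements from the Nanopublication Network.
N_ITEMS = 60_000
PER_REQUEST_OVERHEAD_S = 0.18    # TLS, headers, routing, framing (assumed)
PER_PARSE_OVERHEAD_S = 0.002     # parser init, prefix map setup (assumed)
PAYLOAD_TIME_S = 0.00005         # actually moving one tiny nanopub (assumed)

# One request and one parser setup per item: fixed costs paid N times.
one_by_one = N_ITEMS * (PER_REQUEST_OVERHEAD_S
                        + PER_PARSE_OVERHEAD_S
                        + PAYLOAD_TIME_S)

# One request, one parser: fixed costs paid once, payload still N times.
batched = PER_REQUEST_OVERHEAD_S + PER_PARSE_OVERHEAD_S \
          + N_ITEMS * PAYLOAD_TIME_S

print(f"one-by-one: {one_by_one / 3600:.1f} h")
print(f"batched:    {batched:.1f} s")
```

Even with these made-up numbers, the one-by-one path lands in the hours while the batched path lands in the seconds – the same order-of-magnitude gap the real deployment saw, because the per-item payload is tiny compared to the fixed costs wrapped around it.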

And TriG isn’t slow here simply because it’s “text.” It’s slow because its expressive, compact syntax comes with parsing overhead: the richer grammar requires more parser state and branching, abbreviations and syntactic sugar need prefix resolution and expansion, and both tend to generate extra allocations and string handling. For a few large files, this overhead fades into the background, but for 60,000 tiny ones you get 60,000 times the overhead. This is an insidious trap for a developer who expects to be bottlenecked by the network or storage, not by the CPU.


Jelly’s real trick: from files to messages

The solution presented by NeverBlink was Jelly: a binary RDF format built on Protobuf, published as an open specification with an open-source implementation. It comes with eye-catching performance claims – roughly 2 times faster writing and 12 times faster reading, with payloads around 6 times smaller than N-Triples in Jena. Those numbers matter, but they’re not the whole story. Jelly’s real differentiator isn’t just that it’s “binary” and efficiently implemented, but that it treats all data as a stream.


Instead of treating RDF as a collection of standalone files, Jelly encodes it as a framed stream – a sequence of messages that can carry triples, quads, graphs, or entire datasets – so receivers can start processing immediately, without waiting for the whole payload. And because you’re shipping one continuous stream rather than thousands of tiny files, the encoder can take advantage of repetition across messages, improving both compactness and compression. That’s especially valuable for nanopublications, where the same IRIs, prefixes, and structural patterns appear again and again. In short, Jelly is built to reduce per-item overhead as the stream progresses.
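The idea of exploiting repetition across messages can be shown with a toy model. The sketch below is not Jelly’s actual wire format – it is a minimal dictionary-style encoder, invented for illustration, where each term is spelled out in full on first use and referenced by a small integer afterwards. Jelly’s real lookup tables are more sophisticated, but the effect is the same: per-item cost shrinks as the stream progresses.

```python
# Toy model of a streaming encoder with a term lookup table. Each IRI
# is sent in full once, then referenced by a small integer id. This
# mimics the idea behind dictionary encoding, not Jelly's wire format.

def encode_stream(quads):
    """Yield one 'message' (a list of encoded terms) per quad."""
    table = {}
    for quad in quads:
        message = []
        for term in quad:
            if term in table:
                message.append(table[term])   # cheap back-reference
            else:
                table[term] = len(table)
                message.append(term)          # full IRI, first use only
        yield message

# Hypothetical quads sharing the same predicate, as nanopubs often do.
quads = [
    ("ex:np1", "np:hasAssertion", "ex:a1", "ex:head1"),
    ("ex:np2", "np:hasAssertion", "ex:a2", "ex:head2"),
    ("ex:np3", "np:hasAssertion", "ex:a3", "ex:head3"),
]
messages = list(encode_stream(quads))
for msg in messages:
    print(msg)
```

The repeated predicate is spelled out only in the first message; every later message carries a one-integer reference instead, which is why the same IRIs, prefixes, and structural patterns recurring across 60,000 nanopubs compress so well in one continuous stream.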


Diagram illustrating the HTTP request pattern used to download nanopublications.

   

And that’s what really changed the system. The breakthrough did not come from simply using a binary format, but from eliminating the 60,000-request pattern altogether. With Jelly, the client was able to request a subset once and get back a single HTTP response.


In full-scale deployment, the results spoke for themselves: fetching more than 60,000 nanopublications (about 1.4 million quads) dropped from more than 3 hours to under 4 seconds. That change effectively removed communication as a limiting factor – service-to-service transfer now accounts for less than 1% of total system time – and with that bottleneck gone, the network has far more headroom to grow toward the scale you’d expect from a global scientific knowledge graph.

Benchmarking results. Platform: Oracle GraalVM 24+36.1, RDF4J 5.1.4, Jelly-JVM 2.10.3, Ryzen 9 7900 5.0 GHz, 64 GB RAM. Dataset: 10M nanopublications (RiverBench: nanopubs).


To separate protocol overhead from pure encoding/decoding performance, we also benchmarked raw serialization and deserialization throughput. In that setup, Jelly serialization reached 7.33 million triples per second (more than 11 times faster than TriG), while deserialization climbed as high as 15.23 million triples per second (more than 33 times faster than TriG).


The takeaway: change the workload, not just the code

It’s fair to ask: couldn’t we just pipeline the HTTP requests, add parallelism, or lean on response caching? Sometimes that helps, but in a decentralized ecosystem those fixes tend to come with real drawbacks. They increase system complexity, they rely on client behavior you can’t reliably standardize across the network, and more often than not they simply shift the overhead to a different layer instead of removing it. Jelly’s streaming changes the default approach. It makes “many tiny things” into a first-class API shape, lets consumers process incrementally as bytes arrive, and enables reuse and compression across items – advantages you simply don’t get when you fetch nanopubs one by one.


TriG still has an important place: it’s readable, debuggable, and friendly for tooling and human workflows. The problem starts when a human-friendly format becomes the hot path for a workload defined by massive cardinality. In distributed systems, the most expensive work is often the work you repeat at boundaries. If you want to pay less tax, cross fewer boundaries – and when your system naturally produces lots of small items, design around a stream of items as the primitive rather than treating each one as its own transaction.


You can find more details on streaming nanopublications with Jelly in our success story.

