May 14, 2020

Sending JSON across the Wire

Does knowing how an int is stored actually matter?

Sending JSON Across the Wire: How It Works and Why Size Matters

JSON. Curly braces, key-value pairs. Easy stuff, right? But when you send it over the network, it doesn’t just magically teleport as an object. There’s an easy part, and a tricky part.

Serialization: Turning Objects into Strings

So you’ve got an object in JavaScript:

const user = {
  name: "cooliver",
  age: 100,
  isAdmin: true
};

Serialization walks through each property and value, turning it into a string that represents the object:

{"name":"cooliver","age":100,"isAdmin":true}

Cool. Now it’s a string that can be sent across the wire. This is the easy part. JSON.org will show you the full spec, but the point is, it’s just taking your object and converting it into a text format that represents the same data. Most developers know this.
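Here's a minimal sketch of that round trip using the built-in JSON.stringify and JSON.parse. The variable names text and copy are just for illustration.

```javascript
const user = { name: "cooliver", age: 100, isAdmin: true };

// Serialize: walk the object and produce its JSON text.
const text = JSON.stringify(user);
console.log(text); // {"name":"cooliver","age":100,"isAdmin":true}

// Deserialize: parse the text back into a brand new object.
const copy = JSON.parse(text);
console.log(copy.age + 1);  // 101, so age came back as a number
console.log(copy === user); // false, same data but a different object
```

The receiver never gets your object, only text that describes it, which it then rebuilds on its side.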

I just want something reliable

Sending data over the network isn’t just about slapping your JSON string on a carrier pigeon. We’ve got TCP* handling the heavy lifting.

*Yes, a lot of the internet uses UDP too, but it’s not reliable: it’s connectionless, whereas TCP is connection-oriented. UDP has less overhead, but it makes no guarantee of delivery or ordering and never retransmits, so anything lost along the way stays lost unless the application deals with it.

TCP is reliable: if you send a payload, the bytes get there in order, with nothing lost. But there’s a catch. TCP makes sure every byte arrives correctly, but how do you know when you’ve got the whole JSON object? How do you know where it ends?

Now, TCP itself doesn’t understand JSON or care about your curly braces {}. It’s just moving binary data—1s and 0s—across the network. It doesn’t inherently understand where your JSON string begins or ends.

One fundamental challenge when sending data like JSON is knowing how much data to expect. Suppose you send a JSON object that spans multiple packets. How does the receiving system know when it has received the entire object? Relying on something like curly braces {} to determine the start and end of the JSON is not practical: braces can be nested or appear inside string values, so the receiver would have to parse as it reads, and TCP gives it no help because it only sees bytes. Instead, protocols that sit on top of TCP, like HTTP, use headers to declare the content length. A typical approach puts a small header before the actual data, which indicates how many bytes of data follow. For example:

[version, length, { ... }]

Version: Indicates the protocol version.

Length: Specifies the size of the following JSON data in bytes. Knowing the length ensures the receiver knows how much data to expect and when to stop reading.
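To make the [version, length, { ... }] idea concrete, here's a rough sketch of one way to encode it in Node.js. The layout is my own assumption for illustration, not a standard: 1 byte for the version, 4 bytes for the length (big-endian), then the JSON bytes. The name encodeFrame is hypothetical too.

```javascript
// Assumed layout: [1 byte version][4 byte big-endian length][JSON bytes]
const VERSION = 1;      // hypothetical protocol version
const HEADER_BYTES = 5; // 1 (version) + 4 (length)

function encodeFrame(value) {
  // Serialize first, so we know exactly how many bytes the body takes.
  const body = Buffer.from(JSON.stringify(value), "utf8");
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeUInt8(VERSION, 0);
  header.writeUInt32BE(body.length, 1);
  return Buffer.concat([header, body]);
}

const frame = encodeFrame({ name: "cooliver", age: 100, isAdmin: true });
console.log(frame.readUInt8(0));    // 1  (version)
console.log(frame.readUInt32BE(1)); // 44 (bytes of JSON that follow)
console.log(frame.length);          // 49 (5 header bytes + 44 body bytes)
```

A fixed-size header is the simple choice here: the receiver always knows it needs exactly 5 bytes before it can learn how much more to read.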

Length matters

The size of the JSON string directly impacts how efficiently data can be transmitted. Larger JSON objects mean more packets, which can lead to increased transmission time and potential bottlenecks. Knowing the exact size of your data allows the receiving end to handle the data correctly, especially when dealing with large, complex objects.
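One gotcha worth a quick sketch: the size that matters on the wire is bytes, not characters. In Node.js, string.length counts UTF-16 code units, while Buffer.byteLength counts the encoded bytes. The sample names below are made up for the comparison.

```javascript
// Same number of characters, different number of bytes once encoded as UTF-8.
const plain = JSON.stringify({ name: "cooliver" });
const accented = JSON.stringify({ name: "c\u00f6oliver" }); // "cöoliver"

console.log(plain.length, Buffer.byteLength(plain, "utf8"));       // 19 19
console.log(accented.length, Buffer.byteLength(accented, "utf8")); // 19 20
```

If a length header is filled in from string.length, any non-ASCII character makes it too small, and the receiver stops reading before the JSON is complete.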

Moreover, TCP hands the application an ordered stream of bytes, not discrete messages. A single read can return half of your JSON, or the tail of one message glued to the start of the next. The length information tells the receiving application where each message ends, so it can buffer until the whole thing has arrived and only then deserialize the JSON back into its original object form. We need a way to signal the start and end of our data.
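Here's a sketch of what the receiving side could look like: keep appending whatever bytes show up, and only parse once the length says a whole message is there. It assumes the same made-up layout as before (1 byte version, 4 byte big-endian length, then JSON), and createDecoder and encodeFrame are hypothetical helper names.

```javascript
const HEADER_BYTES = 5; // assumed: 1 byte version + 4 byte length

function encodeFrame(value) {
  const body = Buffer.from(JSON.stringify(value), "utf8");
  const header = Buffer.alloc(HEADER_BYTES);
  header.writeUInt8(1, 0);
  header.writeUInt32BE(body.length, 1);
  return Buffer.concat([header, body]);
}

// Returns a push(chunk) function that calls onMessage once per complete frame.
function createDecoder(onMessage) {
  let pending = Buffer.alloc(0);
  return function push(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= HEADER_BYTES) {
      const length = pending.readUInt32BE(1);
      const end = HEADER_BYTES + length;
      if (pending.length < end) break; // body not fully here yet, wait for more
      const body = pending.subarray(HEADER_BYTES, end).toString("utf8");
      onMessage(JSON.parse(body));
      pending = pending.subarray(end); // keep any bytes of the next frame
    }
  };
}

const received = [];
const push = createDecoder((message) => received.push(message));
const bytes = Buffer.concat([encodeFrame({ id: 1 }), encodeFrame({ id: 2 })]);

// Simulate the network delivering the stream in awkward pieces.
push(bytes.subarray(0, 3)); // not even a full header
push(bytes.subarray(3, 9)); // header plus part of the first body
push(bytes.subarray(9));    // rest of frame one and all of frame two
console.log(received); // [ { id: 1 }, { id: 2 } ]
```

In a real program the chunks would come from something like a socket's data event. The point is that chunk boundaries are arbitrary, so the decoder never assumes one chunk equals one message.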

Let’s look at how HTTP handles it:

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 44

{"name":"cooliver","age":100,"isAdmin":true}

Each header line ends with \r\n (carriage return, line feed), and an empty line, a \r\n on its own, tells the receiver, “This is where the body starts.” The Content-Length header says exactly how many bytes are coming in the body, so the receiver knows when to stop reading. It’s a nice little system that keeps things clean and ordered.
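As a rough sketch of that logic, here's a tiny parser that finds the blank line, reads Content-Length, and takes exactly that many bytes as the body. parseResponse is a hypothetical helper, and it ignores plenty of real HTTP (chunked encoding, repeated headers, and so on), so treat it as an illustration only.

```javascript
const json = '{"name":"cooliver","age":100,"isAdmin":true}';
const raw = Buffer.from(
  "HTTP/1.1 200 OK\r\n" +
  "Content-Type: application/json\r\n" +
  `Content-Length: ${Buffer.byteLength(json, "utf8")}\r\n` +
  "\r\n" + // the empty line: headers are done, body starts next
  json,
  "utf8"
);

function parseResponse(buf) {
  const headerEnd = buf.indexOf("\r\n\r\n");
  if (headerEnd === -1) return null; // headers not complete yet

  const lines = buf.subarray(0, headerEnd).toString("latin1").split("\r\n");
  const headers = {};
  for (const line of lines.slice(1)) { // lines[0] is the status line
    const colon = line.indexOf(":");
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }

  const length = Number(headers["content-length"]);
  const bodyStart = headerEnd + 4; // skip past \r\n\r\n
  if (buf.length < bodyStart + length) return null; // body not complete yet

  const body = buf.subarray(bodyStart, bodyStart + length).toString("utf8");
  return { statusLine: lines[0], headers, body: JSON.parse(body) };
}

const response = parseResponse(raw);
console.log(response.statusLine); // HTTP/1.1 200 OK
console.log(response.body.name);  // cooliver
```

Returning null for "not enough bytes yet" is just one way to handle partial data. The caller would append the next chunk and try again.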

So, the larger the JSON, the more packets needed to send it, and the longer it takes to transmit and reassemble on the other end. Knowing the size upfront means you can parse it correctly without guessing. Reliability.

Does knowing how an int is stored actually matter?

If you know how an integer is stored in memory, then learning how data is sent over TCP is neither unfamiliar nor trivial to you. Technologies are getting abstracted away at an increasingly rapid pace. Glue engineering is becoming much more common practice (even wise if you’re looking to ship fast). But understanding things at a low level is super useful. If you know the basics, like how memory works or how data is structured, all this higher-level stuff becomes way easier. It’s like learning how to learn. So I’m gonna go build my own transfer protocol now. Should be fun.