Douglas Crockford wrote RFC 4627, which specifies JSON, a “text format for serialization of structured data.” As a language-agnostic, human-readable open format with native encoding and decoding support in browsers, JSON has become the de facto standard for data serialization on the web. It has drawbacks, though, and they became evident when we started to write a networked game using WebSockets. (Check out our pre-alpha teaser if you didn’t get a chance to see us at PAX!)
The Setup
In our 3D game, players have a position and rotation that are updated every simulation step. I will refer to the pair of position and rotation together as a transform. The server notifies clients of these updated transforms over WebSockets. Here is an example of a transform we might have at a given moment:
Here, we represent the rotation using a quaternion, rather than a 3×3 matrix, because it has only four values versus the nine values in the matrix. When using socket.io’s emit functionality, the data arguments you provide are serialized using JSON.stringify and then sent out. Here is the previous transform serialized to JSON:
The resulting string is 190 characters long. So what does that mean for us?
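As a rough illustration of the setup above, here is a sketch of a transform and its JSON form. The field names and the numbers are my own assumptions for this example, so the string it prints will not be the 190-character one quoted above.

```javascript
// Hypothetical transform: field names and values are assumptions for illustration.
const transform = {
  position: { x: 10.5, y: 0.25, z: -3.125 },
  // Unit quaternion for a 90-degree rotation about the y axis.
  rotation: { x: 0, y: Math.SQRT1_2, z: 0, w: Math.SQRT1_2 }
};

// We call JSON.stringify directly here so we can inspect the string
// that would otherwise be produced for us before sending.
const json = JSON.stringify(transform);

console.log(json);
console.log(json.length + ' characters');
```

Values with many significant digits, such as the two `Math.SQRT1_2` components, dominate the length of the string.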
The Problem
This is a multiplayer game, so let’s assume we have the bare minimum: two players. The resulting message has 380 characters, plus the 3 used by JSON for the array brackets and comma, giving us a total of 383 characters. Let us assume that our game runs at 30 steps per second on the server. We are then transferring 11,490 bytes every second to each player. If we assume a rate of 18¢/GB of data transfer from our server provider (counting a GB as 2^30 bytes), that comes to roughly 0.693¢ per player per hour of gameplay.
Unfortunately, we didn’t set out to make a two-player game. If we assume a maximum of 12 players per game, we see a cost of roughly 4.15¢ per player per hour. The per-player cost grows linearly with the player cap, because every extra player adds one more transform to every message. This may prove prohibitive, so what can we do about this?
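The arithmetic above can be sketched as a small calculator. The function names are my own, and the sketch assumes one byte per character and a GB of 2^30 bytes, which is the assumption under which the 12-player figure comes out to 4.15¢.

```javascript
const BYTES_PER_GB = Math.pow(2, 30); // assumption: "GB" means 2^30 bytes
const CENTS_PER_GB = 18;
const STEPS_PER_SECOND = 30;

// Characters in a JSON array holding `players` transforms:
// two brackets plus (players - 1) commas add players + 1 characters.
function messageLength(players, charsPerTransform) {
  return players * charsPerTransform + players + 1;
}

// Cost of sending one message per step to a single player for an hour.
function centsPerPlayerHour(messageBytes) {
  const bytesPerHour = messageBytes * STEPS_PER_SECOND * 3600;
  return (bytesPerHour / BYTES_PER_GB) * CENTS_PER_GB;
}

console.log(messageLength(2, 190));                                 // 383
console.log(centsPerPlayerHour(messageLength(12, 190)).toFixed(2)); // "4.15"
```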
Observations
The data that the client receives from the server is not truly arbitrary. We know to expect as many transform values as there are players. We also know that a transform value consists of exactly seven floating-point values. We could represent a transform with an array of seven numbers that we would then process on the client to recreate the transform object. This slims each transform down to 138 characters, and our costs for 12 players down to 3.02¢ per player per hour.
We immediately see that this is much less readable. Without context, this JSON string has very little meaning, because the data is no longer structured. This is the tradeoff we begin to see when optimizing network traffic.
We also observe that the double-precision floating-point numbers in our data take up to 19 characters each as human-readable strings. In binary, each is only 8 bytes. How do we use this information to our advantage?
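A minimal sketch of that packing step might look like the following. The field names and the ordering of the seven values are assumptions; any fixed order works as long as the client and server agree on it.

```javascript
// Flatten a transform into [px, py, pz, qx, qy, qz, qw].
// The field names and this ordering are assumptions for illustration.
function packTransform(t) {
  return [
    t.position.x, t.position.y, t.position.z,
    t.rotation.x, t.rotation.y, t.rotation.z, t.rotation.w
  ];
}

// Rebuild the structured object on the receiving side.
function unpackTransform(a) {
  return {
    position: { x: a[0], y: a[1], z: a[2] },
    rotation: { x: a[3], y: a[4], z: a[5], w: a[6] }
  };
}

const packed = packTransform({
  position: { x: 10.5, y: 0.25, z: -3.125 },
  rotation: { x: 0, y: Math.SQRT1_2, z: 0, w: Math.SQRT1_2 }
});
console.log(JSON.stringify(packed));
```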
A Solution
We can start to drill down into the binary representation of a floating-point number using typed arrays. Here is what node.js shows on a little-endian machine:
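As a sketch of what that looks like, the snippet below views a single number through a `Float64Array` and a `Uint8Array` that share one buffer. The helper name `doubleToBytes` is my own; the bytes come out in the machine's native byte order.

```javascript
// Return the 8 bytes of a double in this machine's native byte order.
function doubleToBytes(value) {
  const floats = new Float64Array(1);
  floats[0] = value;
  // A Uint8Array over the same ArrayBuffer exposes the raw bytes.
  return new Uint8Array(floats.buffer);
}

// 1.5 is 0x3FF8000000000000, so a little-endian machine shows
// the bytes 0, 0, 0, 0, 0, 0, 248, 63.
console.log(doubleToBytes(1.5));
```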
In the underlying ArrayBuffer, you can see the 1-byte chunks that make up the JavaScript number, for a total of 8 bytes. We can then take each of these bytes and convert it into a single character whose character code is the byte's value.
This gives us a string of 8 characters. If you are following along, you will see that the serialized string is extremely unreadable. In fact, chances are that there are control characters in the string. Hilarity may ensue if you have not disabled your terminal bell and attempt to print out your network messages (which may be arriving 30 times a second).
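One way to sketch this conversion, and its inverse, is with `String.fromCharCode` and `charCodeAt`. The function names are my own, and the sketch assumes both ends of the connection use the same byte order.

```javascript
// Encode a double as 8 characters, one per byte (native byte order).
function encodeDouble(value) {
  const bytes = new Uint8Array(new Float64Array([value]).buffer);
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Decode the first 8 characters of a string back into a double.
function decodeDouble(str) {
  const bytes = new Uint8Array(8);
  for (let i = 0; i < 8; i++) {
    bytes[i] = str.charCodeAt(i);
  }
  return new Float64Array(bytes.buffer)[0];
}

const encoded = encodeDouble(Math.PI);
console.log(encoded.length);                   // 8
console.log(decodeDouble(encoded) === Math.PI); // true
```

Printing `encoded` itself is where the control characters show up, so the sketch only prints its length.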
Caveats
I have been using “characters” and “bytes” almost interchangeably so far. However, there is a very clear difference, and it comes down to character encoding. The key point here is that while JavaScript strings use UTF-16, WebSockets use UTF-8.[1] UTF-8 is a variable-width encoding. Examination of the specification shows that code points less than 128 use a single byte, while code points from 128 to 2047 use two bytes. For numeric data like ours, JSON produces only ASCII characters, so it uses one byte per character. However, when encoding arbitrary bytes, we also use characters in the 128 to 255 range, which take up two bytes each.
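The size rule above can be sketched as a small counting function. The name `utf8ByteLength` is my own, and the sketch only handles strings whose character codes are all below 2048, which covers both our JSON text and our byte-valued characters.

```javascript
// UTF-8 size of a string whose character codes are all below 2048:
// codes under 128 take one byte, codes from 128 to 2047 take two.
function utf8ByteLength(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    bytes += str.charCodeAt(i) < 128 ? 1 : 2;
  }
  return bytes;
}

console.log(utf8ByteLength('[1.5]'));                         // 5
console.log(utf8ByteLength(String.fromCharCode(0, 63, 248))); // 4
```

In the second call, three characters cost four bytes because 248 falls in the two-byte range.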
Conclusion
Encoding the binary representation of each value into a string turns the transform given at the beginning of this post into 56 characters (seven values of 8 bytes each), which come to 82 bytes once the two-byte characters are counted. We can also lose the JSON array structure, and simply encode a single byte up front with the number of players to expect in the rest of the string. This brings us down to roughly 1.78¢ per player per hour for 12 players.
In my next post, I’ll be covering alternate ways of serializing the data and discussing how to send binary data over the WebSockets protocol.
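Putting the pieces together, a whole message could be sketched as below. The function names, the seven-number layout per transform, and the use of native byte order are all my assumptions, and the player count must stay below 256 to fit in one character.

```javascript
const FLOATS_PER_TRANSFORM = 7;
const BYTES_PER_FLOAT = 8;

// transforms: array of 7-number arrays [px, py, pz, qx, qy, qz, qw] (assumed layout).
function encodeMessage(transforms) {
  const floats = new Float64Array(transforms.length * FLOATS_PER_TRANSFORM);
  transforms.forEach(function (t, i) {
    floats.set(t, i * FLOATS_PER_TRANSFORM);
  });
  const bytes = new Uint8Array(floats.buffer);
  // First character carries the player count.
  let out = String.fromCharCode(transforms.length);
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

function decodeMessage(str) {
  const count = str.charCodeAt(0);
  const bytes = new Uint8Array(count * FLOATS_PER_TRANSFORM * BYTES_PER_FLOAT);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = str.charCodeAt(i + 1);
  }
  const floats = new Float64Array(bytes.buffer);
  const transforms = [];
  for (let p = 0; p < count; p++) {
    const start = p * FLOATS_PER_TRANSFORM;
    transforms.push(Array.from(floats.subarray(start, start + FLOATS_PER_TRANSFORM)));
  }
  return transforms;
}

const message = encodeMessage([
  [10.5, 0.25, -3.125, 0, Math.SQRT1_2, 0, Math.SQRT1_2],
  [-1, 2, 3.5, 0, 0, 0, 1]
]);
console.log(message.length); // 113: one count character plus 2 * 56
```

The string length is fixed by the player count, but the number of bytes on the wire still depends on how many of those characters land in the two-byte range.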
[1] This is not actually true, as the hybi-07 draft of the WebSockets protocol introduced the option for binary data transfer, but I am going to ignore that fact in this post.