What would be the best compression algorithm to use to compress packets before sending them over the wire? The packets are encoded using JSON. Would LZW be a good one for this or is there something better?
- I don't know the algorithm, but maybe this project is similar to what you need: https://github.com/rgcl/jsonpack – Aditya Kresna Permana Jan 29 '19 at 08:05
7 Answers
I think two questions will affect your answer:
1) How well can you predict the composition of the data without knowing what will happen on any particular run of the program? For instance, if your packets look like this:
{
    "vector": {
        "latitude": 16,
        "longitude": 18,
        "altitude": 20
    },
    "vector": {
        "latitude": -8,
        "longitude": 13,
        "altitude": -5
    },
    [... et cetera ...]
}
-- then you would probably get your best compression by creating a hard-coded dictionary of the text strings that keep showing up in your data and replace each occurrence of one of the text strings with the appropriate dictionary index. (Actually, if your data was this regular, you'd probably want to send just the values over the wire and simply write a function into the client to construct a JSON object from the values if a JSON object is needed.)
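As a toy illustration of that values-only approach (the field order and helper names here are assumptions for the sketch, not part of the original question):

```python
import json

# Field order agreed upon and hard-coded on both ends of the wire.
FIELDS = ("latitude", "longitude", "altitude")

def encode_vectors(vectors):
    """Sender: drop the repeated key strings and send only the values."""
    return json.dumps([[v[f] for f in FIELDS] for v in vectors])

def decode_vectors(payload):
    """Receiver: rebuild the full objects from the bare values."""
    return [dict(zip(FIELDS, row)) for row in json.loads(payload)]

vectors = [
    {"latitude": 16, "longitude": 18, "altitude": 20},
    {"latitude": -8, "longitude": 13, "altitude": -5},
]
wire = encode_vectors(vectors)            # '[[16, 18, 20], [-8, 13, -5]]'
assert decode_vectors(wire) == vectors
```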
If you cannot predict which headers will be used, you may need to use LZW, LZ77, or another method that looks at the data that has already gone through and finds the parts it can express in an especially compact form. However...
2) Do the packets need to be compressed separately from each other? If so then LZW is definitely not the method you want; it will not have time to build its dictionary up to a size that will give substantial compression results by the end of a single packet. The only chance of getting really substantial compression in this scenario, IMHO, is to use a hard-coded dictionary.
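If you do go the hard-coded dictionary route, zlib's preset-dictionary support is one off-the-shelf way to get it. A rough sketch follows, where the dictionary contents are just a guess modelled on the example packets above:

```python
import zlib

# Strings we expect to recur in every packet, hard-coded on both ends.
PRESET = b'"vector": {"latitude": "longitude": "altitude": '

def compress_packet(packet: bytes) -> bytes:
    c = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS,
                         zlib.DEF_MEM_LEVEL, zlib.Z_DEFAULT_STRATEGY, PRESET)
    return c.compress(packet) + c.flush()

def decompress_packet(blob: bytes) -> bytes:
    d = zlib.decompressobj(zlib.MAX_WBITS, PRESET)
    return d.decompress(blob) + d.flush()

packet = b'{"vector": {"latitude": 16, "longitude": 18, "altitude": 20}}'
small = compress_packet(packet)
assert decompress_packet(small) == packet
```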
(Addendum to all of the above: as Michael Kohne points out, sending JSON means you're probably sending all text, which means that you're underusing bandwidth that has the capability of sending a much wider range of characters than you're using. However, the problem of how to pack characters that fall into the range 0-127 into containers that hold values 0-255 is fairly simple and I think can be left as "an exercise for the reader", as they say.)
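A minimal sketch of one such packing scheme, assuming plain 7-bit ASCII input (the function names are mine, not from the answer):

```python
def pack7(text: bytes) -> bytes:
    """Pack 7-bit ASCII bytes into a dense bit stream (8 chars -> 7 bytes)."""
    out, acc, nbits = bytearray(), 0, 0
    for ch in text:
        acc = (acc << 7) | (ch & 0x7F)
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
    if nbits:                               # flush the final partial byte
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)

def unpack7(data: bytes, length: int) -> bytes:
    """Inverse of pack7; `length` is the original character count."""
    out, acc, nbits = bytearray(), 0, 0
    for b in data:
        acc = (acc << 8) | b
        nbits += 8
        while nbits >= 7 and len(out) < length:
            nbits -= 7
            out.append((acc >> nbits) & 0x7F)
    return bytes(out)

msg = b'{"latitude": 16}'
assert unpack7(pack7(msg), len(msg)) == msg
```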

- For the example above, storing the data in SoA (structure-of-arrays) form instead of AoS (array-of-structures) form reduces its size a lot. I found that in a lot of cases this is a good 'compression' method, but whether SoA is suitable depends on the specific application. – Chris May 07 '09 at 06:46
- The advice in 2 is confusing - is LZW still not the right method even if each packet is large? What if the packets do not need to be compressed separately? Also, any details or a link about this packing 0-255 into 0-127 would be helpful. – Dmitri Zaitsev Aug 28 '13 at 17:09
- It's packing 0-127 into 0-255, not the other way around. ASCII text is stored in 8-bit bytes, but standard characters only use the lower 7 bits; the values with the highest bit set to 1 go unused by plain ASCII text. To pack the characters, you take every eighth character and divide up its seven data bits among the unused 'high bits' of the previous 7 characters. – afeldspar Jul 19 '14 at 13:26
There are two more JSON compression algorithms: CJson and HPack. HPack does a very good job, comparable to gzip compression.

Here is a short test on the compressibility of JSON data.
Original: crime-data_geojson.json, 72844 bytes. (You can get the file here: https://github.com/lsauer/Data-Hub . The file was picked at random but may not be representative of average JSON data.)
Except for zip, all archiver parameters were set to ultra.
* cm / NanoZip: 4076/72844 = 0.05595519
* gzip: 6611/72844 = 0.09075559
* LZMA / 7zip: 5864/72844 = 0.0805008
* Huffman / zip: 7382/72844 = 0.1013398
* ? / Arc: 4739/72844 = 0.06505683
This means that the achievable compression is very high and beneficial. JSON data generally has high entropy. According to Wikipedia:
> The entropy rate of English text is between 1.0 and 1.5 bits per letter,[1] or as low as 0.6 to 1.3 bits per letter, according to estimates by Shannon based on human experiments.
The entropy of JSON data is often well above that. (In an experiment with 10 arbitrary JSON files of roughly equal size, I calculated 2.36 bits per character.)
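For reference, an order-0 (per-character) entropy estimate like the one quoted above can be computed in a few lines; the filename below is a placeholder:

```python
import math
from collections import Counter

def entropy_bits_per_char(data: bytes) -> float:
    """Order-0 Shannon entropy, in bits per byte/character."""
    counts = Counter(data)
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in counts.values())

# "crime-data_geojson.json" stands in for whatever file you want to measure.
with open("crime-data_geojson.json", "rb") as f:
    print(round(entropy_bits_per_char(f.read()), 2))
```

Note that this single-symbol estimate ignores repeated substrings, which is why the archivers above get well below one byte per character.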

- I'm not the only one who knows about NanoZip! +1 :D (but I'd never use it on the wire lol) – Camilo Martin Jul 27 '12 at 03:39
Ummm...Correct me if I'm wrong, but if you are implementing on-the-wire compression, then you control both ends of the connection, right? In that case, if JSON is too fat a protocol, why wouldn't you just choose a different wire protocol that isn't as fat? I mean, I understand the appeal of using a standard like JSON, but if you are concerned about bandwidth, then you probably ought to pick a wire protocol that isn't all text.

- "then you probably ought to pick a wire protocol that isn't all text" for example? (+1 if you name two or more ;-) – tobsen Jan 18 '10 at 10:48
- @tobsen [TCP](http://tools.ietf.org/html/rfc793), [IP](http://tools.ietf.org/html/rfc791), [UDP](http://tools.ietf.org/html/rfc768)? But still, the whole web has been using HTTP for ages and never had a problem ([SPDY](http://www.chromium.org/spdy/spdy-whitepaper/) is in the works). – Camilo Martin Jul 27 '12 at 03:29
- Also, regarding text vs. binary... compare the Windows registry with the Linux approach of all-text and tell me which is faster... Text doesn't mean slow. – Camilo Martin Jul 27 '12 at 03:31
- @CamiloMartin I think "on the wire protocol" and "wire format" are mixed up here. I was searching for wire format alternatives like ASN1, XDR, ProtocolBuffers (others?) instead of "wire protocols". JSON is not a protocol. – tobsen Jul 27 '12 at 12:02
- @tobsen Ah, yes. But SOAP can be called a protocol, correct? So a JSON version could be too. I guess that's what the answerer meant by "protocol". – Camilo Martin Jul 27 '12 at 12:11
- @Camilo Martin JSON is a format; TCP, UDP and IP are protocols. Of course, HTTP runs over TCP and IP, meaning that you can send JSON using these protocols, or you can choose a more efficient (in terms of data size) binary format, which is presumably what you were getting at. – pjcard Jul 27 '17 at 16:00
Let the webserver compress and the browser decompress natively; gzip or deflate.
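As a toy illustration of that division of labour (a stdlib-only sketch; in practice you would simply enable gzip or deflate in your web server or framework rather than hand-rolling it):

```python
import gzip, json
from http.server import BaseHTTPRequestHandler, HTTPServer

class JsonGzipHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"vector": {"latitude": 16, "longitude": 18}}).encode("utf-8")
        if "gzip" in self.headers.get("Accept-Encoding", ""):
            body = gzip.compress(body)     # the browser will decompress this natively
            encoding = "gzip"
        else:
            encoding = "identity"
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Encoding", encoding)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), JsonGzipHandler).serve_forever()
```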

I have found that applying a compression algorithm tends to be more effective than choosing an alternative format. If this is 'real-time' compression, I would recommend investigating Brotli or Zstandard at a lower compression level (the high levels take a lot of CPU but do give very good compression).
If you want to read about all the alternatives and how I came to that conclusion, the full details can be found on the Lucidchart techblog.
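For example, with the third-party zstandard bindings for Python (the package choice and the level shown are assumptions, not something the answer prescribes):

```python
import json
import zstandard as zstd   # pip install zstandard

packet = json.dumps(
    {"vector": {"latitude": 16, "longitude": 18, "altitude": 20}}
).encode("utf-8")

# Low levels keep the CPU cost down for real-time use;
# high levels (e.g. 19+) compress better but are much slower.
compressor = zstd.ZstdCompressor(level=3)
decompressor = zstd.ZstdDecompressor()

compressed = compressor.compress(packet)
assert decompressor.decompress(compressed) == packet
print(len(packet), "->", len(compressed), "bytes")
```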

Gzip (the deflate algorithm) is pretty good at compression, although like all good compression algorithms it uses plenty of CPU (3-5x the overhead of JSON reading/writing in my testing).
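A quick, rough way to check that trade-off on your own payloads (results will vary with the data and hardware):

```python
import gzip, json, timeit

doc = {"vectors": [{"latitude": i, "longitude": i + 2, "altitude": i * 3}
                   for i in range(1000)]}
raw = json.dumps(doc).encode("utf-8")

encode_s = timeit.timeit(lambda: json.dumps(doc), number=200)  # JSON writing cost
gzip_s = timeit.timeit(lambda: gzip.compress(raw), number=200)  # compression cost

print(f"json encode: {encode_s:.3f}s  gzip: {gzip_s:.3f}s  "
      f"size ratio: {len(gzip.compress(raw)) / len(raw):.2f}")
```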
