
I'm building a REST-based web service that will serve a couple hundred clients, which will upload/request small bursts of information throughout the day and make one larger cache update (about 100-200 KB) once a day.

While testing the large update on the production machine (a Linux virtual machine in the cloud running Apache/PHP), I discovered to my utter dismay that the data reaches the client corrupted (i.e. with one or more wrong characters) literally MOST of the time.

Example of corrupted JSON; the parser says SyntaxError: JSON.parse: expected ':' after property name in object at line 1 column 81998 of the JSON data:

"nascita":"1940-12-17","attiva":true,","cognome":"MILANI"

should be

"nascita":"1940-12-17","attiva":"true","cognome":"MILANI"

This is the HTTP header of the response:

Connection: Keep-Alive
Content-Type: application/json
Date: Fri, 02 Jun 2017 16:59:39 GMT
Keep-Alive: timeout=5, max=100
Server: Apache/2.4.18 (Ubuntu)
Transfer-Encoding: chunked
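The Transfer-Encoding: chunked header means the body is not sent as one block but as a series of chunks, each prefixed by its size in hexadecimal and terminated by a zero-size chunk. A minimal decoder sketch, assuming you have the raw response body bytes (for example saved from Wireshark's "Follow TCP Stream"); decode_chunked is a hypothetical helper that ignores chunk extensions and trailer fields. It only illustrates the framing and does not imply the framing was at fault here:

```python
def decode_chunked(raw: bytes) -> bytes:
    """Reassemble a body sent with Transfer-Encoding: chunked (sketch).

    Each chunk is: hex size, optional ";extension", CRLF, data, CRLF.
    A zero-size chunk ends the body; any trailer fields are ignored.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)                 # end of the size line
        size_field = raw[pos:line_end].split(b";", 1)[0]   # drop chunk extensions
        size = int(size_field.strip(), 16)                 # size is hexadecimal
        if size == 0:
            return bytes(body)                             # last chunk reached
        start = line_end + 2
        end = start + size
        if raw[end:end + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        body += raw[start:end]
        pos = end + 2                                      # skip the CRLF after the data
```

For example, decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n") returns b"Wikipedia".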

I am certainly not an expert when it comes to networking, but I used to think that such occurrences (failures of both IP and TCP error detection) were extremely rare. (I found this post interesting: Can a TCP checksum produce a false positive? If yes, how is this dealt with?)
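For context on how that detection works: IP and TCP both use the 16-bit ones'-complement "Internet checksum" described in RFC 1071. The sketch below (internet_checksum is an illustrative name, and the TCP pseudo-header is left out) shows one class of error it cannot catch: because ones'-complement addition is commutative, swapping two 16-bit words leaves the checksum unchanged.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071 (sketch)."""
    if len(data) % 2:
        data = data + b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add big-endian 16-bit words
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return ~total & 0xFFFF                         # ones' complement of the sum
```

internet_checksum(bytes.fromhex("0001f203f4f5f6f7")) gives 0x220d, the worked example from RFC 1071, while b"\x12\x34\x56\x78" and b"\x56\x78\x12\x34" produce identical checksums.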

So... what's going on here? Am I missing something?

I started to think of possible solutions.

The quickest one I could think of was HTTP compression: if the client is unable to decompress the content (which is very likely in case of data corruption), then I can request the content again. I enabled it in Apache and, to my surprise, all responses arrived with valid data. Could it be that web browsers (I'm using good old Firefox to test the web service) have some built-in mechanism for re-requesting corrupt compressed data? Or MAYBE the smaller, less regular nature of compressed data makes TCP/IP errors less likely??
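Compression can work as an integrity check because the gzip trailer stores a CRC-32 and the length of the uncompressed data, so a corrupted stream almost always fails to decompress instead of producing wrong text. A sketch of the "decompress, and re-request on failure" idea, assuming the client sees the raw compressed bytes (many HTTP libraries decompress Content-Encoding: gzip transparently); fetch stands in for a hypothetical function that performs the request, and CorruptPayload is an illustrative exception name:

```python
import gzip
import json
import zlib
from typing import Any, Callable


class CorruptPayload(Exception):
    """Raised when a gzip-compressed JSON response fails to decode."""


def parse_gzipped_json(body: bytes) -> Any:
    """Decompress and parse; raise CorruptPayload if any check fails."""
    try:
        return json.loads(gzip.decompress(body).decode("utf-8"))
    except (OSError, EOFError, zlib.error, ValueError) as exc:
        # OSError covers gzip.BadGzipFile (e.g. CRC mismatch), EOFError a
        # truncated stream, ValueError bad UTF-8 or invalid JSON.
        raise CorruptPayload(str(exc)) from exc


def fetch_with_retry(fetch: Callable[[], bytes], attempts: int = 3) -> Any:
    """Call the hypothetical fetch() until its payload passes the checks."""
    for _ in range(attempts):
        try:
            return parse_gzipped_json(fetch())
        except CorruptPayload:
            continue  # damaged response: ask for it again
    raise RuntimeError(f"payload still invalid after {attempts} attempts")
```

Raising an exception rather than returning None keeps a legitimate JSON null distinguishable from a corrupted response.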

The other quick solution that came to mind was to calculate a checksum of the content, something I could do for smaller requests that don't really benefit from compression. I am trying to figure out if and how the Content-MD5 field in HTTP could help me... Web browsers seem to ignore it, so I guess I will have to compute and compare it explicitly in my client...
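A sketch of that explicit check, assuming the server puts the digest in a Content-MD5 header (on the PHP side, something like base64_encode(md5($body, true))). Per RFC 1864 the value is the base64 encoding of the raw 16-byte MD5 digest, not the hex string. Browsers ignoring it is not surprising: RFC 7231 removed Content-MD5 from HTTP/1.1, so the comparison has to happen in your own client code. The function names are illustrative:

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 value: base64 of the raw 16-byte MD5 digest (RFC 1864)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def body_matches(body: bytes, header_value: str) -> bool:
    """Client side: recompute the digest over the received body and compare."""
    return content_md5(body) == header_value.strip()
```

RFC 2616 (section 14.15) defined the digest over the body including any content coding such as gzip, but excluding the chunked transfer framing, so compare it against the bytes before decompressing them. For an empty body the value is "1B2M2Y8AsgTpgAmY7PhCfg==".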

Using TLS may be another good idea, possibly the best.

Or again.... am I missing something HUGE? Like, I don't know, for some reason my Apache is using UDP??

  • Can you provide examples of how it's corrupted? – Fletchius Jun 02 '17 at 16:54
  • What are you uploading and how? How are you attempting to decode the upload? What makes you say the data is corrupt? – Matt Clark Jun 02 '17 at 16:55
  • it's JSON data; Firefox tries to parse it and tells me which column contains an error – Henry Chinaski Jun 02 '17 at 16:57
  • These are corrupt JSON responses, not corrupt HTTP. – user207421 Jun 02 '17 at 17:24
  • @EJP I would drop JSON from the title, it's misleading; it could be XML, English or anything else for all it matters – Henry Chinaski Jun 02 '17 at 17:38
  • @Henry I wouldn't. Corrupt JSON can only be caused by incorrect JSON-producing code. – user207421 Jun 04 '17 at 00:32
  • @EJP Nope, that's what I'm trying to say, please read my full question again. The character misplacements are erratic: they change at each refresh while the data and the code behind them stay the same. If the JSON were incorrect, can you explain why enabling HTTP compression suddenly makes it work every time? – Henry Chinaski Jun 04 '17 at 07:39

1 Answer


All these errors didn't make any sense.

So I used Wireshark to capture all TCP segments coming from the web server to see what could be wrong with them. Again, Firefox reported an error at a random column, but... it turned out that there was no such error in the corresponding TCP segment!
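A way to take the browser's viewer out of the loop entirely is to run the raw payload through an independent parser and compare the position it reports with the one Firefox gives. A minimal sketch using Python's json module; find_json_error is an illustrative helper and assumes a UTF-8 body:

```python
import json
from typing import Optional, Tuple


def find_json_error(body: bytes) -> Optional[Tuple[int, int, str]]:
    """Parse raw response bytes; return (line, column, message) on failure, else None."""
    try:
        json.loads(body.decode("utf-8"))
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg   # 1-based line and column
    return None
```

On the corrupted snippet from the question wrapped in braces, b'{"nascita":"1940-12-17","attiva":true,","cognome":"MILANI"}', it returns (1, 42, "Expecting ':' delimiter"), the same kind of error Firefox reported, while the correct version returns None. If the saved bytes parse cleanly but the viewer still complains, the problem is in the viewer rather than on the wire.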

I then tried Chrome (which doesn't come with a built-in JSON viewer), installed the JSONView extension, and everything there was fine! I did the same in Firefox, installed JSONView, and... no errors!

It turns out there's some kind of bug in the latest Firefox's built-in JSON viewer. I'm running 53.0.3 right now.