Can a TCP checksum fail to detect an error? If yes, how is this dealt with?

Question

If a TCP payload gets corrupted in transit the recomputed checksum won't match the transmitted checksum. Great, all fine so far.

If a TCP checksum gets corrupted in transit the recomputed checksum won't match the now corrupted checksum. Great, all fine so far.

What happens when both the payload and checksum get corrupted and the recomputed checksum, whilst different to what it should be, just happens to match the now corrupted checksum?

I can see with a good checksum algorithm (and additional checksums at lower levels) this might be very, very unlikely but isn't TCP meant to be 100% reliable? How does it resolve these false positives?

Mecki · Answer 1 · 2013-03-27T00:34:23.443

Something that should be noted here, and that most people overlook completely, is that the TCP checksum is actually a very poor checksum.

The TCP checksum is a 16-bit ones-complement sum of the data. This sum will catch any burst error of 15 bits or less, and all 16-bit burst errors except for those which replace one 1’s complement zero with another (i.e., 16 adjacent 1 bits replaced by 16 zero bits, or vice-versa). Over uniformly distributed data, it is expected to detect other types of errors at a rate proportional to 1 in 2^16. The checksum also has a major limitation: the sum of a set of 16-bit values is the same, regardless of the order in which the values appear.

Source: ftp://ftp.cis.upenn.edu/pub/mbgreen/papers/ton98.pdf

So if you randomly flip any number of bits anywhere in the data part of the packet, the chance that the error goes undetected is about 1 in 65536, even if you don't touch the checksum at all, because the new data, although totally corrupt, happens to have the same checksum as the old data. If you just swap two aligned 16-bit values in the data part, regardless of which ones and regardless of how often, the error is guaranteed to go undetected, since the order in which the 16-bit values appear in the data part of the packet is irrelevant to the value of the calculated checksum.
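To make the reordering weakness concrete, here is a minimal Python sketch of a 16-bit ones-complement sum in the style of RFC 1071. It is only an illustration: a real TCP checksum also covers the TCP header and a pseudo-header, and the function name `internet_checksum` and the sample payload are made up for this example.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


original = b"PAY 100 TO ALICE"  # hypothetical 16-byte payload (eight 16-bit words)
words = [original[i:i + 2] for i in range(0, len(original), 2)]
words[0], words[3] = words[3], words[0]  # swap two aligned 16-bit words
swapped = b"".join(words)

print(original, hex(internet_checksum(original)))
print(swapped, hex(internet_checksum(swapped)))  # different bytes, same checksum
```

Because addition is commutative, any permutation of the aligned 16-bit words produces the same sum, so the receiver has no way to notice the swap from the checksum alone.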

What I'm trying to say here is that you don't have to worry too much about the rather unlikely case in which data and checksum both get corrupted and the corrupted checksum happens to match the corrupted data. The truth is that every day millions of TCP packets on the Internet have only the data corrupted, and the error is not detected because the uncorrupted checksum still matches the corrupted data.

If you need to transfer data and you want to be sure the data didn't get corrupted, the TCP checksum alone is certainly not enough for this task. I would even dare to say that a CRC checksum is not enough for this task, since a CRC32 may not detect an error where more than 32 bits in a row are affected (these errors can "cancel out" each other). The minimum checksum you'd need for ensuring flawless data transfer is the MD5 value of the data. Of course anything better than that (SHA-1, SHA-256, SHA-384, SHA-512, Whirlpool, and so on) will work even better, yet MD5 is sufficient. MD5 may not be secure enough for cryptographic security any longer (since it has been broken multiple times in the past), but as a data checksum MD5 is still absolutely sufficient.
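As a rough sketch of what such an end-to-end check could look like, the Python below compares a digest published by the sender with one recomputed by the receiver. The helper names `digest` and `is_intact` and the sample data are assumptions for illustration. The default here is SHA-256 simply because it is always available in `hashlib`; passing `algorithm="md5"` gives the MD5 variant described above where the platform allows it.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of data using the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def is_intact(received: bytes, expected_digest: str, algorithm: str = "sha256") -> bool:
    """True if the received bytes hash to the digest the sender published."""
    return digest(received, algorithm) == expected_digest


sent = b"hypothetical file contents"
published = digest(sent)                    # sender transmits this alongside the data
corrupted = b"hypothetical file c0ntents"   # simulated transmission error

print(is_intact(sent, published))       # True
print(is_intact(corrupted, published))  # False
```

Note that the digest itself has to reach the receiver intact, so in practice it is usually sent separately or repeated; this sketch does not address that.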

@fumoboy007 Because you are not dealing with cryptography. The chance that broken data has the same MD5 checksum as the correct data is one in 340,282,366,920,938,463,463,374,607,431,768,211,456 (39 digits!). You can easily generate two sets of data with the same MD5 checksum (that's why you **must not** use MD5 for cryptography anymore), but these two data sets will look completely different (not even close to similar!). Data modified by transmission error will still look very similar to the correct data. — Mecki, Jun 16 '15 at 13:32
The paper cited is a very good read. What is really interesting in the article is their analysis of the source of packet errors and how they are dealt with in the real world. However, they leave out the source of CRC false positive derivation. This paper does derive CRC false positive probabilities: https://ris.utwente.nl/ws/portalfiles/portal/5382287. See equation (2) and see also Figure 2 which is for the CCITT CRC-16 - but similar results can be calculated for CRC-32. Bottom line: False CRC positives are much more likely than 1/2^N. — natersoz, Oct 31 '17 at 18:47
But aren't there actually two 16-bit checksums and one 32-bit CRC in a TCP/IP packet on Ethernet? The ethernet header has a CRC32. The IP header has a 16-bit checksum. The TCP header also has a 16-bit checksum. That's 64 bits of CRC/Checksum data protecting the packet, right? — rcronk, Apr 04 '23 at 06:29
@rcronk The CRC checksum of the Ethernet packet only protects data between two Ethernet endpoints but every router is an Ethernet endpoint and an IP packet may cross many routers on its way to its final destination (6 to 20 are not unusual when sending data over Internet). Also IP is used over other links than Ethernet (ATM, PPP, L2, etc.) The IP checksum does not protect the data at all, only the IP header itself and only exists for IPv4, not IPv6. Only the TCP checksum protects the actual data all the way from the initial sender to the final recipient over all link types and routers. — Mecki, Apr 04 '23 at 17:45
@Mecki - thanks for the additional nuance to what's covered by the various checksums and CRC. — rcronk, Apr 05 '23 at 04:26

score 16 · Answer 2 · answered Sep 30 '10 at 11:55

Can a TCP checksum produce a false positive?

Yes. The checksum is considerably smaller than the packet, so many different packets can match a given checksum.

If yes, how is this dealt with?

In TCP, not at all. However, most data corruptions will be noticeable at a higher level, e.g. your XML is no longer well-formed; your email is no longer English, etc.

Bryan


in addition to being informative, i laughed really hard at "your email is no longer English" – rajb245 Apr 19 '15 at 20:44
Could a flipped bit in the data cause an email to flip its language? – Pierre Maoui Jan 26 '16 at 13:26
Flipping one bit would result in the checksum failing; more bit-flips are required to observe the problem that the original question is about. But I meant the words would be completely mangled rather than changed from English to French, say. The probability of that happening and still passing the checksum is very, very low :-) – Bryan Jan 26 '16 at 16:46

score 15 · Answer 3 · answered Sep 30 '10 at 11:55

No, it can't be 100% reliable: this paper mentions 1 in 16 million to 1 in 10 billion packets not caught by the error control system. I'll let you calculate the occurrences per day/week :)

samy


World traffic is about 10^15 packets per day, so the chance of it happening to some of other people's packets (though not to yours) is pretty high. – ChrisW Sep 30 '10 at 12:03
I think the only metric as to how probable it is for one person to experience it is packet-related; you're as liable to have this problem as somebody else that uses the same amount of packets. OTOH, to be noticeable you'd have to experience the failure in a critical packet; if your html or your js is mangled you frown and just reload the page, you don't whip out the post-mortem tools :) By the way, where did you find your stat? I looked around but couldn't find data about the number of packets... – samy Sep 30 '10 at 12:16
I was just pointing out that if the error rate is one in 10 billion then it probably won't happen to you (because you don't send that many packets), but it probably will happen to someone else (because they do, collectively, send more than that many packets). The first stats I found were [World 7,500-12,000 PB (PetaByte = 10^15 bytes)](http://www.dtc.umn.edu/mints/), which I divided by my guesstimate of 500 bytes per packet. – ChrisW Sep 30 '10 at 12:22
@ChrisW Thanks for the link :) You're more conservative than i would have been regarding the packet size, though, but even by taking the max size of the packet, errors happen daily – samy Sep 30 '10 at 12:22
Actually your number includes checks on lower levels in combination with TCP (e.g. Ethernet CRC). With the TCP checksum alone, about 1 in 65536 corrupted packets goes undetected. That is very high. Considering that there are trillions of packets every day, the error rate of TCP on its own would still cause millions of corrupted packets a day that are not detected, because the corrupted data still has the same checksum as the original. – Mecki Mar 27 '13 at 00:31
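For anyone who wants the "occurrences per day" arithmetic spelled out, here is a back-of-the-envelope Python sketch. It only plugs in figures quoted in this thread (the 1 in 16 million to 1 in 10 billion range and ChrisW's rough 10^15 packets per day); the function name is made up and none of the inputs are independent measurements.

```python
PACKETS_PER_DAY = 10**15  # ChrisW's rough worldwide estimate from the comments

# Undetected-error rates quoted in the answer, expressed as "one in N packets".
RATES = {
    "1 in 16 million": 16 * 10**6,
    "1 in 10 billion": 10**10,
}


def expected_undetected(packets: int, one_in: int) -> float:
    """Expected number of undetected bad packets among `packets` packets."""
    return packets / one_in


for label, one_in in RATES.items():
    per_day = expected_undetected(PACKETS_PER_DAY, one_in)
    print(f"{label}: about {per_day:.2e} undetected errors per day worldwide")
```

Under those assumptions the worldwide total comes out between roughly a hundred thousand and tens of millions of undetected errors per day, even though any one user sending a few million packets would rarely see one.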

score 7 · Answer 4 · answered Sep 30 '10 at 11:53

> and additional checksums at lower levels

Some of these are stricter than checksums, e.g. Ethernet uses a CRC instead of a checksum.
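One way to see the difference is the small Python sketch below, which compares a plain sum of 16-bit words with `zlib.crc32` on two inputs that contain the same words in a different order. `zlib.crc32` uses the same generator polynomial as the Ethernet frame check sequence, but this is not a model of an actual Ethernet frame, and `word_sum` is a made-up helper.

```python
import zlib


def word_sum(data: bytes) -> int:
    """Plain sum of big-endian 16-bit words (the order-insensitive core of a checksum)."""
    return sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))


a = bytes.fromhex("1234abcd")
b = bytes.fromhex("abcd1234")  # the same two 16-bit words, swapped

print(word_sum(a) == word_sum(b))      # True: a sum cannot see the swap
print(zlib.crc32(a) == zlib.crc32(b))  # False: the CRC depends on position
```

A CRC can still miss errors, since it is only 32 bits long, but it does not share the sum's blindness to reordering.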

> this might be very, very unlikely but isn't TCP meant to be 100% reliable? How does it resolve these false positives?

I don't think it can. Even if it sent a duplicate via hard copy and carrier pigeon, a cosmic ray or quantum effects might theoretically mangle the duplicate too in exactly the same way. It's just very, very unlikely.

You can also implement arbitrarily strong integrity checking at the application layer (above TCP), e.g. using cryptographic signing.
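As a minimal sketch of an application-layer check, the Python below attaches an HMAC-SHA256 tag to each message and verifies it on receipt. Strictly speaking an HMAC is a keyed message authentication code rather than a public-key signature, so treat it as a stand-in for the general idea; the key, the function names and the sample message are all hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-shared-secret"  # assumed to be agreed out of band


def make_tag(message: bytes) -> bytes:
    """HMAC-SHA256 tag the sender appends to the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, received_tag: bytes) -> bool:
    """Recompute the tag on the receiving side and compare in constant time."""
    return hmac.compare_digest(make_tag(message), received_tag)


message = b"transfer 100 units to account 42"
tag = make_tag(message)
corrupted = b"transfer 900 units to account 42"  # simulated corruption in transit

print(verify(message, tag))    # True
print(verify(corrupted, tag))  # False
```

With a 256-bit tag, an accidental match after random corruption is far less likely than with a 16-bit checksum, though as noted above it is still not strictly zero.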

ChrisW


Whilst Ethernet has good integrity checking what about other forms of network? – Mr Question McQuestion Sep 30 '10 at 12:01
I expect you'd want to engineer the integrity checking to match the error rate of your data link. For example [PPP](http://en.wikipedia.org/wiki/Point-to-Point_Protocol) uses a CRC as well. – ChrisW Sep 30 '10 at 12:06
That makes sense. For a long connection over many different data link types (ethernet, ppp, atm) I guess you'll be at the mercy of the worst component link (which might not have integrity checking at all). – Mr Question McQuestion Sep 30 '10 at 12:15
Good point, and the _use_ of your Data Link matters too. If it's critical, go for another layer of protection; you'll only delay the inevitable, though. If it's just transmitting throw-away data, why bother since people won't notice or refresh their query... – samy Sep 30 '10 at 12:18
"_Ethernet uses a CRC instead of a checksum._" how is a CRC not a "checksum"? – curiousguy Dec 08 '11 at 20:02
@curiousguy To me the word "[checksum](https://en.wikipedia.org/wiki/Checksum#Parity_byte_or_parity_word)" implies (because the term includes) a simple XOR, a parity bit on a byte of bits, or a parity byte on a packet of bytes -- whereas CRC is a more sophisticated algorithm. – ChrisW Mar 30 '17 at 13:52
@ChrisW Are there any packet generators that can generate only this kind of traffic, or that can be used to create it? – Arash Aug 15 '23 at 16:45
@Arash I don't know about tools, my experience with this level of the protocol was in the early 1990s. – ChrisW Aug 15 '23 at 19:05

score 3 · Answer 5 · answered Aug 10 '12 at 14:02

Assume

packet payload: 1000 bytes (8000 bits)

packet checksum: 2 bytes (16 bits)

P: probability that any single bit is corrupted (assume P very small, less than 1/10^5)

probability of a packet with a double error, one of which is in the checksum:

A = 16P * (1000*8P) = 1.28*10^5 * P^2

probability that the corrupted checksum exactly matches the corrupted payload:

B = 1/2^16 ≈ 1.5/10^5

probability of false positive:

A * B ≈ 2 * P^2

The probability is low (for P = 1/10^6, the probability of a false positive is A*B ≈ 2/10^12), but in any case, with any checksum or CRC algorithm, it can't be zero. The probability of a 1000-byte packet appearing as another, completely different 1000-byte packet is P^8000, as if every bit were in error.

If P is high, for example from 1/10^3 to 1, the calculations above do not apply. In that case A ≈ 1 (practically all packets contain double errors) and the probability of a false positive is just A*B ≈ 1.5/10^5. It's not a very relevant case, because more than 99% of received packets will contain errors.
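The small-P estimate can be evaluated with a few lines of Python. This sketch follows the same simplified model (independent bit errors with per-bit probability `p`, one error in the checksum and one in the payload, and a 1/2^16 chance of a match). It counts all 16 bits of the 2-byte checksum and uses the exact value of 1/2^16, so its constants are computed rather than hand-rounded. The function name and parameters are made up for illustration.

```python
def false_positive_probability(p: float, payload_bytes: int = 1000, checksum_bytes: int = 2) -> float:
    """Small-p estimate of an undetected double error under the model above."""
    payload_bits = 8 * payload_bytes
    checksum_bits = 8 * checksum_bytes
    # A: one bit error in the checksum and one in the payload (valid only for small p)
    a = (checksum_bits * p) * (payload_bits * p)
    # B: chance that the corrupted checksum matches the corrupted payload
    b = 1 / 2**checksum_bits
    return a * b


print(f"{false_positive_probability(1e-6):.3g}")  # about 1.95e-12
```

The result scales with P^2, so a bit error rate ten times worse makes this kind of false positive a hundred times more likely.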

score -3 · Answer 6 · answered Sep 30 '10 at 11:54

I would imagine the probability is one in a billion million zillion kajillion, because if the TCP data is corrupted, which is the transport layer, it will also mean the other layers (datalink and network) will also be corrupted. I believe at least the datalink layer has a checksum for integrity, so you'd have to have both checksums fail.

Corruption that slips past at least two separate checksums is astronomically unlikely, maybe even impossible.

NibblyPig


Not all datalink layers have integrity checking though do they? – Mr Question McQuestion Sep 30 '10 at 12:00
No, they don't. The paper i linked to above mentions the use of application-level checks in some cases – samy Sep 30 '10 at 12:02
See http://academic.research.microsoft.com/Paper/22436.aspx , lower level crc might not be as reliable as you think. – nos Sep 30 '10 at 12:13
RAM can introduce errors too. Not all problems occur in the wire (or the ether). – curiousguy Dec 08 '11 at 20:03
