- About checksum
A checksum is typically used to verify the integrity of data, especially over a noisy/unreliable communication channel. Thus, it is primarily used for error detection, that is, to know whether the data received is correct.
This is quite different from, for example, error correction, whose purpose is not only to check whether there is an error, but also to correct it.
Both error detection and error correction use overhead (or augmented) data: data that is not part of the original data, but is added to it for the purpose of checking whether the original data has changed (as in error detection) or of correcting it (as in error correction).
Unlike with error detection algorithms, the overhead data size of error correction algorithms tends to grow in proportion to the original data (since the more data you have, the more overhead you need to restore it).
We usually don't want the overhead data to be big, since more data means more resources to handle it (i.e. time to process, time to transfer, etc.). Thus, in that sense, a good checksum algorithm is typically one that uses the smallest amount of overhead data to detect errors while being very robust (extremely rarely producing a false result).
And there the problem lies, since the robustness of a checksum, in reality, depends not only on the algorithm but also on the channel characteristics. Some channels may be prone to certain errors while other channels are prone to different ones.
In general, however, some checksums are known to be more popular and more robust than others. The latter is partly because they have inherent properties which ensure they will be robust on most, if not all, channels (one of my favourite error detection algorithms is CRC - Cyclic Redundancy Check - because it has such inherent properties). But there is no perfect checksum for every scenario; it really depends on the use case.
Still, you can measure the robustness of a checksum algorithm, and there is a mathematical way to do it, which I think is beyond the scope of this discussion. Thus, in those senses, some checksums can be said to be weaker than others. The checksums you showed in your question are among the weak ones, since they may produce false results more easily than strong ones such as, say, CRC.
- About the code
XOR with 0xFF on an 8-bit value is exactly equivalent to inverting all its bits, and it is not too hard to see why:
XOR with 0xFF
1110 0010
1111 1111
--------- XOR
0001 1101 //notice that this is exactly the same as binary inverting!
Thus, whether you do XOR with 0xFF or ~checksum
, you get the same result, -71 (and since your data type is char
, which is signed on your platform, it reads as a negative number). Then you increment it by 1, so you get -70.
- About 2's Complement
Two's complement is a mathematical operation on binary numbers, as well as a binary signed number representation based on this operation. Its wide use in computing makes it the most important example of a radix complement. (Wikipedia)
In other words, 2's complement finds the negative representation of a value (in computer binary), and the method is, as you correctly did, to invert all its bits and then add one. That's why you get -70 by taking the 2's complement of 70. But this does not mean that 2's complement and XOR with 0xFF are the same, and as you can see from the example, they are really not.
What XOR with 0xFF does on 8-bit data is simply invert all its bits. It doesn't add one.
- About the way to add/check the checksum
Since the checksum is used to verify the integrity of the data (whether it has been altered or not), people have settled on best practices for doing so. What you do is get the checksum either by 2's complement or by XOR with 0xFF.
And this is how they are used:
- For the 2's complement checksum:
Say your message length is N and the sum of the N bytes is, say, 70. You then append the 2's complement checksum (that is, -70). On the receiver side, you simply sum all N+1 bytes, including the checksum, and you should get 0 if the message is unaltered. This is how to use the 2's complement checksum correctly.
- For XOR with 0xFF:
Again, with the same example as before, you should get -1 if you sum up all N+1 bytes, including the checksum. And since the representation of -1 in 8-bit signed is 0xFF, by XORing the result (-1) with 0xFF you get 0xFF ^ 0xFF = 0 if the message contains no error.
Hence, in both cases, you simply check whether the message contains an error by checking whether the final result is 0 (no error) or not. And this is typically true for checksum algorithms!
This is the beauty of the checksum algorithm: the way you produce the checksum and the way you check it are somehow symmetric!