2

In a C# application on Windows, I calculate corrections for a machine and write them to a plain-text file. After calculating the corrections, I send the file to the machine (also on Windows) with a simple File.Copy over the network.

If the files are corrupted when the machine reads them, some really bad things could happen.

Given this context, should I validate the transmitted files (using a checksum or something else)? Or does the protocol (is it TCP?) already do that?

ZwoRmi
  • 1,093
  • 11
  • 30
  • Some insights on the TCP checksum weakness: http://criticalindirection.com/2016/02/22/tcp-checksum-the-fault-in-the-stars/ – user31986 Feb 23 '16 at 22:30

3 Answers

2

If your application is sensitive to corrupt files then yes, you should validate. One way is to validate the file using a hashing algorithm: compute a hash before sending and compare it with a hash computed after the transfer.

Sample code showing how to create the hash:

// Requires: using System.Security.Cryptography; and using System.Text;
string data = File.ReadAllText(path);   // path of the file to hash
SHA1 sha1 = SHA1.Create();
byte[] hashData = sha1.ComputeHash(Encoding.Default.GetBytes(data));

Validation:

// Create the hash of the transferred file and compare it with the stored hash.
// Ordinal comparison is appropriate for hash strings.
return string.Equals(InputDataHash, storedHashData, StringComparison.Ordinal);
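Putting the two snippets together, a complete sketch might look like the following. It uses SHA-256 (a stronger choice than SHA1) and hashes the file as a stream rather than reading it into a string; the file path and stored hash are placeholders you would supply.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashCheck
{
    // Compute a hex-encoded SHA-256 hash of a file's contents.
    public static string ComputeFileHash(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            byte[] hash = sha.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }

    // Compare the hash of the transferred file against the stored hash.
    public static bool Validate(string path, string storedHash)
    {
        return string.Equals(ComputeFileHash(path), storedHash,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

The sender would call ComputeFileHash before the copy and transmit the result alongside the file; the receiver calls Validate before using the file.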
Viru
  • 2,228
  • 2
  • 17
  • 28
2

There are many steps that data passes through in the workflow that you outlined (disk, RAM, TCP). Corruption can occur in all of those places and none of them have strong checksums built in. TCP checksums are weak. ECC RAM does not provide absolute safety.

Corruption will be very rare but it will happen sooner or later. You probably need to build end-to-end checksumming if this is really important to you.
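One way to sketch such end-to-end checksumming, assuming the sidecar-file convention below (a hypothetical `.sha256` file next to the corrections file, not anything from the question): the sender hashes the original file from disk and writes the hash next to the copy, and the receiver refuses the file unless the hash recomputed on its side matches.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class EndToEnd
{
    static string HashOf(string path)
    {
        using (var sha = SHA256.Create())
        using (var s = File.OpenRead(path))
            return BitConverter.ToString(sha.ComputeHash(s)).Replace("-", "");
    }

    // Sender side: copy the corrections file and write a sidecar checksum,
    // hashed from the original so disk/RAM/network errors are all covered.
    public static void Send(string source, string destination)
    {
        File.Copy(source, destination, overwrite: true);
        File.WriteAllText(destination + ".sha256", HashOf(source));
    }

    // Receiver side: refuse to use the file unless the checksum matches.
    public static bool Verify(string path)
    {
        string expected = File.ReadAllText(path + ".sha256").Trim();
        return string.Equals(HashOf(path), expected,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

Because the hash is computed from the source file and re-verified on the machine, any corruption introduced at an intermediate step is caught, not just corruption on the wire.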

usr
  • 168,620
  • 35
  • 240
  • 369
2

Original answer

TCP is reliable and has error detection with retransmission, so what you transmit over TCP will normally be what you receive at the other end (this includes any checksum you transmit alongside your file). What might be better is to figure out why bad files make your program misbehave, and work out how to check the file format so you can reject them before use.
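As an illustration of that kind of format check: the question doesn't show the corrections format, so this assumes a hypothetical layout of one numeric value per line and rejects anything that doesn't parse.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FormatCheck
{
    // Reject the file before use if any line is not a parseable number.
    // One-number-per-line is a hypothetical stand-in for the real
    // corrections format, which the question doesn't show.
    public static bool LooksValid(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            if (!double.TryParse(line, NumberStyles.Float,
                                 CultureInfo.InvariantCulture, out _))
                return false;
        }
        return true;
    }
}
```

A check like this catches malformed files regardless of whether the corruption happened on disk, in transit, or in the program that wrote them.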

Amended answer

TCP does have error detection, but it's considered weak (a 16-bit checksum for each packet, plus another 16-bit checksum per segment). Another answer suggests that over random data, if a bit gets flipped then the TCP checksum will still match the corrupted data in 1 out of 2^16 cases. Fortunately, the actual undetected-error rate is probably lower, because in addition to TCP checksums your Ethernet and Wi-Fi links also compute a CRC error check code. Stone/Partridge in that link (Section 4.4) estimate undetected error rates in a couple of different network environments, ranging from about 1×10^-10 to about 6.13×10^-8. Choosing one of their high estimates for a local area network, about 8.8×10^-9, using Wireshark's sample capture of an SMB session to estimate about 3 TCP packets per 4000 bytes written, and assuming about 4 gigabytes are written in the request, we can model the transfer as a binomial distribution (approximated by a normal distribution) and estimate about a 1×10^-20 chance that at least one bad undetected packet in the transfer corrupts your input file.

... however, if your network is noisy or unreliable, that undetected error rate could be many orders of magnitude higher, and a value derived from a well-distributed cryptographic checksum could be beneficial.

struct
  • 604
  • 7
  • 13
  • This is what I love about SO: trying to figure out the correct answer has taught me several things I didn't know about TCP, Ethernet and SMB/CIFS! I've amended my answer. – struct Feb 22 '16 at 15:34
  • There are performance hits for cryptographic checksums. Some insight in this presentation: http://www.snia.org/sites/default/files/SDC15_presentations/etc/TejasWanjari_Integrity_In-Memory_Data.pdf – user31986 Feb 23 '16 at 22:32