What is the correct way of calculating a large CRC32

Question

Here is an article that describes how to calculate the CRC32 of at most 1024 bytes using the built-in CRC32 instruction found in modern x86-64 processors. However, I need to calculate the CRC32 of more than 1024 bytes. Would it be correct to calculate the CRC32 of each 1024-byte block and sum the results at the end? If not, what is the correct way to do it?

[A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS](http://www.ross.net/crc/download/crc_v3.txt) is a great explanation of how CRC works and is calculated. — Alexey Frunze, Apr 26 '12 at 13:44

score 5 · Answer 1 · edited Jan 10 '16 at 14:07

Quoting from the Intel white paper that your article mentions:

> Instead of computing CRC of the entire message with a traditional linear method, we use a faster method to split an arbitrary length buffer to a number of smaller fixed size segments, compute the CRC on these segments in parallel followed by a recombination step of computing the effective CRC using the partial CRCs of the segments.

Also,

> The final recombination of CRCs adds an overhead and can be implemented with lookup tables on the Nehalem microarchitecture – we show how to do this with as few tables as possible while giving excellent overall performance on the range of sizes. The PCLMULQDQ instruction in the Westmere microarchitecture allows efficient recombination of CRCs without lookup tables. The various methods are thoroughly explained in this paper with real code examples.

So you need to study this paper in detail: "Fast CRC Computation for iSCSI Polynomial Using CRC32 Instruction".
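To make the split-and-recombine idea concrete, here is a minimal Python sketch of the underlying algebra. It is not the paper's implementation: the function names (`crc32c_update`, `shift_state`, `crc32c_segmented`) are made up for illustration, the CRC is a slow bitwise software version of CRC-32C (the polynomial the x86 CRC32 instruction uses), and the "shift" step naively feeds zero bytes where the paper uses lookup tables or PCLMULQDQ. It relies on the raw CRC register update being linear over GF(2), so processing a segment from some state equals processing that many zero bytes from the state, XORed with the segment's CRC computed from a zero state.

```python
POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reflected form


def crc32c_update(state: int, data: bytes) -> int:
    """Advance the raw 32-bit CRC register over `data` (no init/final XOR)."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY if state & 1 else 0)
    return state


def crc32c(data: bytes) -> int:
    """Reference one-pass CRC-32C: init 0xFFFFFFFF, result inverted at the end."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def shift_state(state: int, num_bytes: int) -> int:
    """Advance `state` as if `num_bytes` zero bytes had been processed.

    Naive O(n) version, for illustration only.
    """
    return crc32c_update(state, bytes(num_bytes))


def crc32c_segmented(data: bytes, segment_size: int = 1024) -> int:
    """Compute CRC-32C from per-segment partial CRCs plus a recombination pass."""
    segments = [data[i:i + segment_size] for i in range(0, len(data), segment_size)]
    # Each partial starts from a zero state and depends only on its own
    # segment, so these could be computed independently (e.g. in parallel).
    partials = [crc32c_update(0, segment) for segment in segments]

    state = 0xFFFFFFFF
    for segment, partial in zip(segments, partials):
        # Linearity: update(state, segment) == update(state, zeros) ^ update(0, segment)
        state = shift_state(state, len(segment)) ^ partial
    return state ^ 0xFFFFFFFF


if __name__ == "__main__":
    message = bytes(i % 251 for i in range(5000))
    print(hex(crc32c(message)), hex(crc32c_segmented(message)))
```

The two printed values should agree. In this naive form the recombination costs as much as the CRC itself, which is exactly the overhead the paper's tables or carry-less multiplication are meant to avoid.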

Seems that you have read that article with great interest :)! — pythonic, Apr 26 '12 at 13:09

alk · Accepted Answer · 2012-04-26T15:22:03.170

4

No, just adding won't do the job.

The article you linked tells us how to do it:

The CRC output of one calculation is used as the initial CRC for the next calculation [...]

To cover the case of the final result being larger than 0xffffffff, just do `crc32 = ~crc32 & 0xffffffff` after the final calculation.
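As a rough illustration of that chaining, the sketch below processes a buffer in 1024-byte blocks and passes the running CRC from one block to the next. It is an assumption-laden stand-in, not the article's code: `crc32c_update` is a hypothetical bitwise software version of CRC-32C (the polynomial the x86 CRC32 instruction implements), and the initial value 0xFFFFFFFF with a final inversion follows the common CRC-32C convention. `sum_of_block_crcs` is included only to contrast with the approach proposed in the question.

```python
POLY = 0x82F63B78  # CRC-32C (Castagnoli) polynomial, bit-reflected form


def crc32c_update(state: int, data: bytes) -> int:
    """Advance the raw 32-bit CRC register over `data` (no init/final XOR)."""
    for byte in data:
        state ^= byte
        for _ in range(8):
            state = (state >> 1) ^ (POLY if state & 1 else 0)
    return state


def crc32c(data: bytes) -> int:
    """One-pass CRC-32C over the whole buffer."""
    return crc32c_update(0xFFFFFFFF, data) ^ 0xFFFFFFFF


def crc32c_chunked(data: bytes, block_size: int = 1024) -> int:
    """Same CRC, computed block by block with the state carried forward."""
    state = 0xFFFFFFFF
    for offset in range(0, len(data), block_size):
        # The CRC output of one block is the initial CRC for the next block.
        state = crc32c_update(state, data[offset:offset + block_size])
    # Invert once, after the last block; same as ~state & 0xFFFFFFFF.
    return state ^ 0xFFFFFFFF


def sum_of_block_crcs(data: bytes, block_size: int = 1024) -> int:
    """The approach from the question: independent block CRCs, added up."""
    total = sum(crc32c(data[offset:offset + block_size])
                for offset in range(0, len(data), block_size))
    return total & 0xFFFFFFFF


if __name__ == "__main__":
    message = bytes(i % 251 for i in range(5000))
    print("one pass:", hex(crc32c(message)))
    print("chained: ", hex(crc32c_chunked(message)))
    print("summed:  ", hex(sum_of_block_crcs(message)))
```

The chained result matches the one-pass result for any block size, because the loop performs exactly the same register updates in the same order. The summed value is a different function of the data and should not be expected to match.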

edited Apr 26 '12 at 15:22

answered Apr 26 '12 at 13:09


OK, so it's a matter of passing the previous CRC to the next call. No problem with that! – pythonic Apr 26 '12 at 13:10

This is simpler than the technique Pavan describes, but of course if you do it this way then you can't parallelize the different chunks; they have to be processed sequentially. That said, I personally have never felt a need to parallelize a checksum calculation; one core should be enough for anyone ;-) – Steve Jessop Apr 26 '12 at 13:12
