
If I have substrings S0, S1, ... Sn with calculated CRCs C0, C1, ... Cn, am I able to determine the CRC C0...n of concatenated input S0S1...Sn with any substantially greater efficiency than linearly processing the whole string?

Obviously, C0...n = CRC(S1...Sn, initialized with C0), but I'd like to know whether C0...n = f(C0, C1, ..., Cn) for some f() with O(n) complexity instead of O(|S0S1...Sn|).

Jeff
  • @Oli Charlesworth Thanks, sellibitze's proposed approach looks quite viable! The mathematics of it seem obvious in retrospect, but the trick with the trailing zero management seems inspired to me. – Jeff Apr 13 '14 at 21:30

1 Answer


Yes. You can see how in the implementation of crc32_combine() in zlib. It takes crc1, crc2, and len2, where len2 is the number of bytes in the block on which crc2 was calculated. It takes O(log(len2)) time. The combination can be repeated for following blocks.
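
For concreteness, here is a minimal usage sketch against zlib's public API (`crc32()` and `crc32_combine()` are the actual zlib calls; the buffer contents are just illustrative):

```c
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void) {
    const unsigned char part1[] = "hello ";
    const unsigned char part2[] = "world";
    uInt len1 = (uInt)strlen((const char *)part1);
    uInt len2 = (uInt)strlen((const char *)part2);

    /* CRC of each block independently, each starting from the initial CRC */
    uLong crc1 = crc32(crc32(0L, Z_NULL, 0), part1, len1);
    uLong crc2 = crc32(crc32(0L, Z_NULL, 0), part2, len2);

    /* Combine: only crc1, crc2, and the length of the *second* block are needed */
    uLong combined = crc32_combine(crc1, crc2, (z_off_t)len2);

    /* Sanity check against a single pass over the concatenated input */
    uLong whole = crc32(crc32(0L, Z_NULL, 0), part1, len1);
    whole = crc32(whole, part2, len2);

    printf("combined=%08lx whole=%08lx\n", combined, whole);
    return combined == whole ? 0 : 1;
}
```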

The approach is to continue the CRC calculation of crc1 as if len2 zero bytes followed, and then exclusive-or the result with crc2. The effect of the len2 zero bytes is applied with a zeros operator that is repeatedly squared and applied to the CRC for each 1 bit in len2, which is what gives the O(log(len2)) execution time. The routine was added to zlib in 2004.
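
A condensed sketch of that technique (not the zlib source itself; `crc32_combine_sketch` is a made-up name, but the construction mirrors the description above and should agree with zlib's `crc32_combine()` for CRCs produced by zlib's `crc32()`):

```c
#include <stddef.h>

#define GF2_DIM 32  /* CRC-32 register width: vectors/matrices are over GF(2)^32 */

/* Multiply a GF(2) matrix by a vector: XOR together the columns of `mat`
   selected by the set bits of `vec`. */
static unsigned long gf2_matrix_times(const unsigned long *mat, unsigned long vec) {
    unsigned long sum = 0;
    while (vec) {
        if (vec & 1)
            sum ^= *mat;
        vec >>= 1;
        mat++;
    }
    return sum;
}

/* Square a GF(2) matrix: square = mat * mat. */
static void gf2_matrix_square(unsigned long *square, const unsigned long *mat) {
    for (int n = 0; n < GF2_DIM; n++)
        square[n] = gf2_matrix_times(mat, mat[n]);
}

/* Apply the operator for len2 zero bytes to crc1 by repeated squaring
   (one application per set bit of len2), then exclusive-or crc2. */
unsigned long crc32_combine_sketch(unsigned long crc1, unsigned long crc2, size_t len2) {
    unsigned long even[GF2_DIM];  /* operator for an even power of two of zero bits */
    unsigned long odd[GF2_DIM];   /* operator for an odd power of two of zero bits */

    if (len2 == 0)
        return crc1;

    /* Operator for one zero bit: the reflected CRC-32 shift. Bit 0 maps to the
       polynomial, every other bit n maps to bit n-1. */
    odd[0] = 0xedb88320UL;        /* CRC-32 polynomial (reflected) */
    for (int n = 1; n < GF2_DIM; n++)
        odd[n] = 1UL << (n - 1);

    gf2_matrix_square(even, odd);  /* two zero bits */
    gf2_matrix_square(odd, even);  /* four zero bits */

    /* Each squaring doubles the zero count; the first squaring in the loop
       produces the operator for eight zero bits, i.e. one zero byte. */
    do {
        gf2_matrix_square(even, odd);
        if (len2 & 1)
            crc1 = gf2_matrix_times(even, crc1);
        len2 >>= 1;
        if (len2 == 0)
            break;

        gf2_matrix_square(odd, even);
        if (len2 & 1)
            crc1 = gf2_matrix_times(odd, crc1);
        len2 >>= 1;
    } while (len2 != 0);

    return crc1 ^ crc2;
}
```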

Mark Adler
  • Just what I was looking for, and a credible authority at that. Thanks! – Jeff Apr 14 '14 at 01:46
  • I just benchmarked crc32_combine() and wanted to know if any size/time tradeoff optimizations are even theoretically possible for the benefit of smaller inputs. (crossover point vs. just doing the regular calculation was ~= 16b random lengths/8 length bits set) I was considering the possibility of rapidly assembling gzip streams from concatenated compressed components and was hoping that it would be viable to combine components closer to 1kiB in length. – Jeff Apr 14 '14 at 03:33
  • Yes. `crc32_combine()` generates the same operators every time it is called. You can instead generate those operators once and save them. You can do even better if you are always using the same size blocks, say length _n_. You can create a single operator for _n_ zeros and apply that in a call, which would be very fast. I have an example of that in [this answer](http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software/17646775#17646775). – Mark Adler Apr 14 '14 at 03:37
  • Wow, that's great. I just cracked open crc32.c to start looking around a bit, and it looks pretty doable, especially with the generous commenting in there. Your guidance has been very appreciated! – Jeff Apr 14 '14 at 03:44
  • I got around 100x speedup after rewriting `crc32_combine()` with pre-computed poly^(2^n) matrices as you suggested, and I do see further gain the more I constrain lengths in repeated calls. For lengths in [8, 2048]B (%8==0 -> avg 4b/8b set), I get about 200-250 cycles per M*v call with gcc 4.8/x86_64 and warm caches, so I think there may be further room for hand-tuning if I really want to get crazy. – Jeff Apr 14 '14 at 17:44
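
Building on the suggestion in the comments above (precompute the zeros operator once for a fixed block length), here is a hedged sketch of the amortized version. It reuses `GF2_DIM`, `gf2_matrix_times()` and `gf2_matrix_square()` from the earlier sketch; `make_zeros_operator()` and `crc32_combine_fixed()` are made-up names, not zlib API:

```c
/* Build, once, the GF(2) operator that advances a CRC-32 across `len` zero
   bytes. Reuses gf2_matrix_times()/gf2_matrix_square() and the one-zero-bit
   operator construction from the sketch above. */
void make_zeros_operator(unsigned long op[GF2_DIM], size_t len) {
    unsigned long odd[GF2_DIM], even[GF2_DIM], tmp[GF2_DIM];
    int n;

    /* Start with the identity operator (zero additional bytes). */
    for (n = 0; n < GF2_DIM; n++)
        op[n] = 1UL << n;

    /* One-zero-bit operator, as in the combine sketch. */
    odd[0] = 0xedb88320UL;
    for (n = 1; n < GF2_DIM; n++)
        odd[n] = 1UL << (n - 1);

    gf2_matrix_square(even, odd);   /* 2 zero bits */
    gf2_matrix_square(odd, even);   /* 4 zero bits */
    gf2_matrix_square(even, odd);   /* 8 zero bits = operator for 1 zero byte */

    /* Fold the current power-of-two-byte operator into op for each set bit of
       len, squaring it as we walk up the bits. Powers of the same matrix
       commute, so the composition order does not matter. */
    while (len) {
        if (len & 1) {
            for (n = 0; n < GF2_DIM; n++)
                tmp[n] = gf2_matrix_times(even, op[n]);  /* tmp = even * op */
            for (n = 0; n < GF2_DIM; n++)
                op[n] = tmp[n];
        }
        len >>= 1;
        if (len == 0)
            break;
        gf2_matrix_square(odd, even);       /* double the zero-byte count */
        for (n = 0; n < GF2_DIM; n++)
            even[n] = odd[n];
    }
}

/* With the operator cached, each combine is a single 32x32 matrix-vector
   product (at most 32 conditional XORs) plus one XOR, independent of the
   block length. */
unsigned long crc32_combine_fixed(unsigned long crc1, unsigned long crc2,
                                  const unsigned long op[GF2_DIM]) {
    return gf2_matrix_times(op, crc1) ^ crc2;
}
```

For N equal-sized blocks, the operator is built once and each of the N-1 combines is then constant-time in the block length.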