0

// -- Edited

Currently, hardware functions (__builtin_ia32_crc32qi and __builtin_ia32_crc32di) are used for crc32 with __builtin_ia32_crc32di returning 64 bits. Then, 64-bits are trimmed to 32-bits. Existing data is based on this logic. https://gcc.gnu.org/onlinedocs/gcc-4.9.2/gcc/X86-Built-in-Functions.html

uint32_t calculateCrc32(uint32_t init, const uint8_t* buf, size_t size) {
  uint32_t crc32 = init;
  const uint8_t* pos = buf;
  const uint8_t* end = buf + size;

  // byte-wise crc
  while (((uint64_t)pos) % sizeof(uint64_t) && pos < end) {
    crc32 = __builtin_ia32_crc32qi(crc32, *pos);
    ++pos;
  }

  // 8-bytes-wise
  while (((uint64_t)pos) <
         (((uint64_t)end) / sizeof(uint64_t)) * sizeof(uint64_t)) {
    crc32 = __builtin_ia32_crc32di(crc32, *(uint64_t*)pos);
    pos += sizeof(uint64_t);
  }

  // byte-wise crc for remaining
  while (pos < end) {
    crc32 = __builtin_ia32_crc32qi(crc32, *pos);
    ++pos;
  }

  return crc32;
}

I am trying to implement a lookup-table version. What I am doing is: 1) first generate a lookup table 2) do table lookup

uint8_t kCrc32tab[256];
for (int i=0; i < 256; ++i) {
  uint8_t buf = i;
  kCrc32tab[i] = calculateCrc32(0xFF, &buf, 1);
} 

uint32_t crc32WithLookup(uint32_t crc32_init, const uint8_t* buf, size_t size) {
   uint32_t crc32 = crc32_init;
   for (std::size_t i = 0; i < size; i++) {
        uint8_t key = (crc32 ^ buf[i]) & 0xFF;
        crc32 = kCrc32tab[key] ^ (crc32 >> 8);
    }
    return crc32;
}

However, crc32 outcome is different between crc32WithLookup and calculateCrc32. Any suggestions?

lookup example in redis: https://github.com/redis/redis/blob/unstable/src/crc16.c

SuperBald
  • 109
  • 1
  • 12
  • 3
    what is "computation via hardware"? Please give more details and a [mcve] – JHBonarius Apr 20 '21 at 06:57
  • 2
    It may be a non-reflected CRC-32. There may a final exclusive-or needed. There's no way to tell unless you first write a decent question. What is the test case, what are you getting, and what were you expecting? What is in the table you generated? Please compute and provide the CRC of the nine bytes which are the characters 123456789. That might give us a start on which of the many CRC-32's your hardware implements. – Mark Adler Apr 20 '21 at 14:14
  • @MarkAdler: Thanks for your reply. I am new to the crc computation. I used the logic above. Can you help give a look again? – SuperBald Apr 20 '21 at 16:53

1 Answers1

3

That CRC-32 is commonly referred to as the CRC-32C (where outside the provided code the initial value and final exclusive-or is 0xffffffff).

There are two errors in your code. The table must be 32-bit values, and the initial value for your CRCs is zero. So you need uint32_t kCrc32tab[256]; and kCrc32tab[i] = calculateCrc32(0, &buf, 1);.

This answer provides more advanced and faster code for both the hardware and software versions of that CRC calculation.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • Thanks a lot for the great feedback! One more question: `__builtin_ia32_crc32di()` returns int64_t, and we truncated it to int32_t. Does this make a difference between table lookup and `calculateCrc32()`? – SuperBald Apr 20 '21 at 22:56
  • Made the changes as suggested. Now `crc32WithLookup()` and `calculateCrc32()` return the same crc code. Thanks! However, the cpu of `crc32WithLookup()` is almost doubled. Seems hardware method is more efficient than table lookup? – SuperBald Apr 21 '21 at 06:36
  • ?? Of course the hardware instructions are faster. Otherwise there would be no point in having them. And only a factor of two? The hardware instruction version is almost 20 times faster than the byte-wise table lookup. – Mark Adler Apr 21 '21 at 13:36
  • 1
    If you use my code in the answer that I linked, that software implementation is four times as fast as the byte-wise table able, and the use of the hardware instructions is twice as fast as the simple approach above. The result is that the hardware instructions are more than ten times as fast as the software implementation. – Mark Adler Apr 21 '21 at 13:40