_mm_crc32_u8 gives different result than reference code

Question

I've been struggling with the intrinsics. In particular, I don't get the same results from the standard CRC calculation and the supposedly equivalent Intel intrinsics. I'd like to move to using _mm_crc32_u16 and _mm_crc32_u32, but if I can't get the 8-bit operation to work there's no point.

static UINT32               g_ui32CRC32Table[256] =
{
    0x00000000L, 0x77073096L, 0xEE0E612CL, 0x990951BAL,
    0x076DC419L, 0x706AF48FL, 0xE963A535L, 0x9E6495A3L,
    0x0EDB8832L, 0x79DCB8A4L, 0xE0D5E91EL, 0x97D2D988L,
....

// Your basic 32-bit CRC calculator
// NOTE: this code cannot be changed
UINT32 CalcCRC32(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;

    for (int x = 0; x < iLen; x++)
    {
        crc = g_ui32CRC32Table[(crc ^ *pucBuff++) & 0xFFL] ^ (crc >> 8);
    }

    return crc ^ 0xFFFFFFFF;
}


UINT32 CalcCRC32_Intrinsic(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;

    for (int x = 0; x < iLen; x++)
    {
        crc = _mm_crc32_u8(crc, *pucBuff++);
    }
    return crc ^ 0xFFFFFFFF;
}

Can you give an example of how you're testing these two functions and what the output is? — Mr. Llama, Mar 20 '15 at 19:30

score 5 · Answer 1 · answered Mar 20 '15 at 19:33

That table is for a different CRC polynomial than the one used by the Intel instruction. The table is for the Ethernet/ZIP/etc. CRC, often referred to as CRC-32. The Intel instruction uses the iSCSI (Castagnoli) polynomial, for the CRC often referred to as CRC-32C.

This short example code can calculate either, by uncommenting the desired polynomial:

#include <stddef.h>
#include <stdint.h>

/* CRC-32 (Ethernet, ZIP, etc.) polynomial in reversed bit order. */
#define POLY 0xedb88320

/* CRC-32C (iSCSI) polynomial in reversed bit order. */
/* #define POLY 0x82f63b78 */

/* Compute CRC of buf[0..len-1] with initial CRC crc.  This permits the
   computation of a CRC by feeding this routine a chunk of the input data at a
   time.  The value of crc for the first chunk should be zero. */
uint32_t crc32c(uint32_t crc, const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    }
    return ~crc;
}
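
As a quick sanity check on the routine above, here is a minimal sketch that passes the polynomial as an argument instead of selecting it with a macro. The names `crc32_poly`, `POLY_CRC32`, `POLY_CRC32C` and `check_crc_variants` are assumptions made for illustration and are not part of the answer. It runs both polynomials over the nine ASCII bytes "123456789", whose commonly quoted check values are 0xCBF43926 for CRC-32 and 0xE3069283 for CRC-32C, and then repeats the CRC-32C computation in two chunks to show the chaining described in the comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY_CRC32  0xedb88320u  /* CRC-32 (Ethernet, ZIP), reversed bit order */
#define POLY_CRC32C 0x82f63b78u  /* CRC-32C (iSCSI), reversed bit order */

/* Hypothetical variant of the routine above: same bitwise loop, but the
   polynomial is an argument rather than a compile-time macro. */
uint32_t crc32_poly(uint32_t poly, uint32_t crc,
                    const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
    }
    return ~crc;
}

/* Checks both polynomials against their check values for "123456789",
   then feeds the same bytes in two chunks to confirm that passing the
   previous result back in gives the same CRC. */
void check_crc_variants(void)
{
    const unsigned char msg[] = "123456789";
    uint32_t part;

    assert(crc32_poly(POLY_CRC32, 0, msg, 9) == 0xCBF43926u);
    assert(crc32_poly(POLY_CRC32C, 0, msg, 9) == 0xE3069283u);

    part = crc32_poly(POLY_CRC32C, 0, msg, 4);        /* "1234"  */
    part = crc32_poly(POLY_CRC32C, part, msg + 4, 5); /* "56789" */
    assert(part == 0xE3069283u);
}
```

Calling `check_crc_variants()` from a `main` should pass silently. If a build of the intrinsic loop from the question is available, its result for the same nine bytes would be expected to match the CRC-32C value, not the CRC-32 one.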

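You can use this code to generate a replacement table for your code: entry n is the result of running the inner eight-step loop on the byte value n, for n = 0, 1, 2, ..., 255 (that is, the one-byte CRC-32C computed without the initial and final inversions).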
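
One way the table generation could look is sketched below. The helper names `make_crc_table`, `crc_by_table` and `check_tables` are hypothetical, and the sketch assumes each table entry is the byte value pushed through the eight shift/xor steps with no initial or final inversion. That assumption is checked against the first entries of the table shown in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POLY_CRC32  0xedb88320u  /* CRC-32, reversed bit order */
#define POLY_CRC32C 0x82f63b78u  /* CRC-32C, reversed bit order */

/* Fill table[0..255]: entry n is the byte value n run through eight
   shift/xor steps, with no initial or final inversion. */
void make_crc_table(uint32_t poly, uint32_t table[256])
{
    uint32_t n, crc;
    int k;

    for (n = 0; n < 256; n++) {
        crc = n;
        for (k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ poly : crc >> 1;
        table[n] = crc;
    }
}

/* Same loop shape as CalcCRC32 in the question, with the table passed in. */
uint32_t crc_by_table(const uint32_t table[256],
                      const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = table[(crc ^ *buf++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

void check_tables(void)
{
    uint32_t table[256];
    const unsigned char msg[] = "123456789";

    /* With the CRC-32 polynomial the generator reproduces the first
       entries of the table in the question. */
    make_crc_table(POLY_CRC32, table);
    assert(table[0] == 0x00000000u);
    assert(table[1] == 0x77073096u);
    assert(table[2] == 0xEE0E612Cu);

    /* With the CRC-32C polynomial the unchanged table-driven loop gives
       the CRC-32C check value for "123456789". */
    make_crc_table(POLY_CRC32C, table);
    assert(crc_by_table(table, msg, 9) == 0xE3069283u);
}
```

Since the question says `CalcCRC32` itself cannot be changed, swapping the contents of `g_ui32CRC32Table` for the CRC-32C table is the only change this approach needs.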

These reference implementations help to understand the question [*CRC32 hash collision on the same string for any seed*](https://stackoverflow.com/q/64081720/2932052). — Wolf, Aug 25 '21 at 08:48

score 0 · Answer 2 · edited Oct 07 '21 at 13:31


FWIW, I've obtained SW code that demonstrably matches the Intel crc32c instruction, but it uses a different polynomial: 0x82f63b78. The function definitely doesn't match any of the iSCSI test examples here: https://www.rfc-editor.org/rfc/rfc3720#appendix-B.4

What's frustrating in all this is every implementation I've tried for CRC-32C comes out with different hashes from all the others. Is there a true piece of reference code out there?

answered Oct 02 '19 at 18:20 · user2465201

It does. The page you referenced presents the CRC32 in the order in which it appears in memory, that is, BIG ENDIAN. Intel's `crc32c` instructions return it as LITTLE ENDIAN. It helps a lot to present actual and expected output as hex: you can then clearly see byte swaps and inversions. – Pawel Kraszewski Aug 02 '22 at 07:24
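
The byte-order point can be illustrated with a small software-only sketch. It assumes that the RFC appendix lists the four CRC bytes in the order they are transmitted, and that its entry for 32 bytes of zeroes reads `aa 36 91 8a`. The helper names `crc32c_sw`, `store_le32` and `check_rfc_zero_vector` are made up for this example. The CRC-32C of 32 zero bytes as a 32-bit integer is 0x8A9136AA; writing that integer out least significant byte first gives the listed byte sequence, which read as a single hex number looks byte-swapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (reversed polynomial 0x82f63b78), one-shot version. */
uint32_t crc32c_sw(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    int k;

    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82f63b78u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Store a 32-bit value least significant byte first, independent of the
   byte order of the machine running the code. */
void store_le32(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

void check_rfc_zero_vector(void)
{
    unsigned char zeros[32], wire[4];
    uint32_t crc;

    memset(zeros, 0, sizeof zeros);
    crc = crc32c_sw(zeros, sizeof zeros);
    assert(crc == 0x8A9136AAu);          /* value as an integer */

    store_le32(crc, wire);               /* bytes as assumed listed in the RFC */
    assert(wire[0] == 0xaa && wire[1] == 0x36 &&
           wire[2] == 0x91 && wire[3] == 0x8a);
}
```

Printing both the integer with `%08x` and the individual bytes side by side makes this kind of mismatch easy to spot, which is the comment's suggestion.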
