2

before I start, a disclaimer: while I can go around C/C++ code, I am no wizard, nor did I ever did enough programming to call myself a capable programmer with it.

I'm trying to use CRC32C to validate data that is coming into our servers from browser. Currently both implementations use the same code (nodeJS on server), but we would like to switch to hardware implementation(blog post, github repo) (when available) and for that I need a correctly functioning version in the browser.

I tried to go with this implementation (and another, internally develop, but also not-working) but using correct polynom (0x82F63B78 instead of 0xEDB88320, and also 0x1EDC6F41 & 0x8F6E37A0) but no polynom that I used produces the correct output.

Continuing in my research I find a post from Mark Adler which includes a software implementation and decide to try to convert it to Javascript (to the best of my understanding of C).

The result:

function crc32c_table_intel() {
var POLY = 0x82f63b78;
var n, crc, k;
var crc32c_table = gen2darr(8, 256, 0);

for (n = 0; n < 256; n++) {
    crc = n;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    crc32c_table[0][n] = crc;
}
for (n = 0; n < 256; n++) {
    crc = crc32c_table[0][n];
    for (k = 1; k < 8; k++) {
        crc = crc32c_table[0][crc & 0xff] ^ (crc >> 8);
        crc32c_table[k][n] = crc;
      _crc_tmptable.push(crc32c_table[k][n]);
    }
}
return crc32c_table;
}


function crc32c_sw(crci, str) {
var len = str.length;
var crc;
var crc32c_table = crc32c_table_intel();

crc = crci ^ 0xffffffff;
for(var next = 0; next < 7; next++) { // was: while (len && ((uintptr_t)next & 7) != 0) {
    crc = crc32c_table[0][(crc ^ str.charCodeAt(next++)) & 0xff] ^ (crc >> 8);
    len--;
}
while (len >= 8) {
    // was: crc ^= *(uint64_t *)next;
    crc ^= str.charCodeAt(next);
    crc = crc32c_table[7][crc & 0xff] ^
          crc32c_table[6][(crc >> 8) & 0xff] ^
          crc32c_table[5][(crc >> 16) & 0xff] ^
          crc32c_table[4][(crc >> 24) & 0xff] ^
          crc32c_table[3][(crc >> 32) & 0xff] ^
          crc32c_table[2][(crc >> 40) & 0xff] ^
          crc32c_table[1][(crc >> 48) & 0xff] ^
          crc32c_table[0][crc >> 56];
    next += 1;
    len -= 1;
}
while (len) {
    // was: crc = crc32c_table[0][(crc ^ *next++) & 0xff] ^ (crc >> 8);
    crc = crc32c_table[0][(crc ^ str.charCodeAt(next++)) & 0xff] ^ (crc >> 8);
    len--;
}
return crc ^ 0xffffffff;
}


// a helper function
function gen2darr( rows, cols, defaultValue){
    var arr = [];
    for(var i=0; i < rows; i++){
            arr.push([]);
            arr[i].push( new Array(cols));
            for(var j=0; j < cols; j++){
                    arr[i][j] = defaultValue;
                    }
            }
    return arr;
}

Still, no luck. No matter what function, what table or what polynom do I use, the results do not match:

SSE4.2: 606105071
JS (example): 1249991249 

Then I go around thinking that it must be something with conversion from Javascript strings to C/C++ data and I see that nodeJS implementation uses UTF8 (https://github.com/Voxer/sse4_crc32/blob/master/src/sse4_crc32.cpp#L56) while Javascript uses UCS-2 encoding.

Now, the questions that I have are these:

  1. Is any of these functions valid? The first seems so, for the one I posted I am not sure if I translated all the bitwise operations correctly
  2. How to go around encoding issues? Is that even an encoding issue as I suspect? Does anyone has any other ideas how to ensure that nodeJS HW implementation and client side implemntation return the same output?

Thanks for any ideas!

Community
  • 1
  • 1
uskudnik
  • 33
  • 7
  • node.js comes with native hashes that would serve the same utility as well as if not better than crc: ('sha1', 'md5', 'sha256', 'sha512') - http://nodejs.org/api/crypto.html also, shop around on any JS-written function like that, there's a lot of them out there but quality and performance varies GREATLY. – dandavis Jan 28 '14 at 18:35
  • Unfortunately there is no point in using SHA for us, it will only be introduced as extension in 2015/16 with [Skylake architecture](http://en.wikipedia.org/wiki/Skylake_(microarchitecture)). We already have a satisfactory software implementation, but we are trying to speed things up by using hardware implementation, therefore, CRC32C. – uskudnik Jan 29 '14 at 20:36

1 Answers1

3

See this answer for a compatible software implementation and a fast implementation using the hardware instruction.

You have a few problems there. One is that in Javascript you need to use a logical right shift >>> instead of an arithmetic right shift >>. Second is that you are using charCodeAt, which returns the Unicode value of a character, which may be more than one byte. The CRC algorithm operates on a sequence of bytes, not a sequence of Unicode characters. Third is that you're computing the same table every time -- the table should only be computed once. Last is you're jumping straight to a complicated implementation.

As a simple example, this will compute a CRC-32C in Javascript on an array of values expected to be integers in the range 0..255, i.e. bytes:

function crc32c(crc, bytes) {
  var POLY = 0x82f63b78;
  var n;

  crc ^= 0xffffffff;
  for (n = 0; n < bytes.length; n++) {
    crc ^= bytes[n];
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
    crc = crc & 1 ? (crc >>> 1) ^ POLY : crc >>> 1;
  }
  return crc ^ 0xffffffff;
}
Community
  • 1
  • 1
Mark Adler
  • 101,978
  • 13
  • 118
  • 158
  • Hey @Mark - yes, I already saw your function and tried to go along by implementing it in Javascript (code block). That is also the reason why I started my question with disclaimer about my C++ knowledge - I had to convert your pointers into indexes and I'm not 100% sure I got everything right. Notably, but not exclusively: `for(var next = 0; next < 7; next++) { // was: while (len && ((uintptr_t)next & 7) != 0) {`; `crc ^= str.charCodeAt(next); // was: crc ^= *(uint64_t *)next;`; `crc = crc32c_table[0][(crc ^ str.charCodeAt(next++)) & 0xff] ^ (crc >> 8);` ATM I suspect encoding issues. – uskudnik Jan 29 '14 at 21:06
  • 1
    I am terribly sorry that I didn't accept this answer at that time (it required a bit more work for my particular use case and afterwards slipped my mind), but indeed it worked. @Mark: Again, please accept my apology. – uskudnik Aug 06 '16 at 16:31