1

I just want to know for a CRC32 hash function, specifically the PHP crc function, will I get 2^32 (4 billion) different values for an input value (integer) that is guaranteed to be incremented sequentially from 1 to 4 billion?

jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
  • 1
    No. CRC32 has no intention of being particular collision-proof. But if your possible inputs are all 32 bit numbers, why go through the trouble of hashing them to 32 bits? Just use them directly. – Peter Jun 25 '19 at 09:03
  • 1
    This sounds like an XY problem. What is the real problem you're trying to solve? – Jim Mischel Jun 25 '19 at 17:34
  • Why don't you write the program to do it? 2^32 (or even 2^30) is not a large space. A well written program should finish rather quickly. – jww Jun 25 '19 at 22:18
  • guys, to answer the first 2 comments, I'm trying to create a URL shortening service like `goo.gl`. So I'm basically trying to assign the CRC hashed number to be the shortened URL – Kagiso Marvin Molekwa Jun 26 '19 at 13:22
  • 1
    To create a URL shortening service, you don't hash the URL. Instead, you create a function that returns the next sequential value, and then you obfuscate the value. My blog post https://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/ shows how to do that. See also https://stackoverflow.com/a/34420445/56778 – Jim Mischel Jul 19 '19 at 18:16
  • 1
    See also https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener and https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener, and any number of other references. – Jim Mischel Jul 19 '19 at 18:18
  • @JimMischel Your blog's post https://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/ doesn't work – Mohammad Kholghi Aug 28 '21 at 09:47
  • CRC makes bad hash functions. Here is a great concise article on the topic: https://eklitzke.org/crcs-vs-hash-functions – Dhmclean Jul 03 '19 at 19:57

1 Answers1

2

I don't think CRC32 was specifically designed to have no collisions for all possible four-byte inputs. However, it does seem to work that way. You can verify this yourself by simply checking every possible output. To speed things up, I used the following C program:

/* Compile: cc crc_check.c -O3 -lz -o crc_check */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <zlib.h>

int main() {
    uint32_t x, y, d;
    uint64_t i, *seen, mask;

    seen = calloc(0x4000000, 8);
    if (!seen) return -1;

    /* Make sure we're calculating the same values as PHP's crc32 function */
    printf("crc32(\"ABCD\") = %lu\n", crc32(0, (unsigned char*)"ABCD", 4));

    for (i=x=0; i<0x100000000ULL; i++) {
        y = crc32(0, (unsigned char*)(&x), 4);
        mask = 1ULL << (y & 0x003fULL);
        d = y >> 6;
        if (seen[d] & mask) {
            printf("Collision detected (x=%u, y=%u)\n", x, y);
            return 0;
        }
        seen[d] |= mask;
        x++;
    }
    puts("No collisions detected");
    return 0;
}

/*
   Output:
   crc32("ABCD") = 3675725989
   No collisions detected
*/

Just to make sure zlib is using the same function, I included a line to output the CRC32 checksum of the string "ABCD". PHP produces the same value:

$ php -r 'echo crc32("ABCD");'
3675725989

I have to ask, though: what do you need this information for? If you want to convert sequential 32-bit integers into unique pseudorandom values, there are much more efficient ways of doing this. For example, consider using a linear congruential generator.

r3mainer
  • 23,981
  • 3
  • 51
  • 88
  • Thanks for your answer. I'm trying to create a URL shortening service like `goo.gl`. So I'm basically trying to assign the CRC hashed number to be the shortened URL. So basically, I want to know whether I'll be able to generate 4 billion different URLs without worrying about duplication/collision – Kagiso Marvin Molekwa Jun 26 '19 at 13:24
  • OK, but how does the CRC32 function help with this? All you're doing is converting one 32-bit number into another 32-bit number in a rather complicated way. (Perhaps I should have pointed out that the CRC checksums are only unique if you work on the raw byte values; i.e., `$crc_value = crc32(pack("L",$counter_value));`. Wouldn't you be better off writing a function that converts an integer value into a base-62 string (using the characters 0-9, a-z and A-Z as digits)? – r3mainer Jun 26 '19 at 14:59
  • The CRC32 helps convert a number into a 4 character string, eg. `f38d`. That's why I'm using a hash. – Kagiso Marvin Molekwa Jun 29 '19 at 06:37
  • 2
    No, that isn't going to work. If you're just passing numerical variables to this function, there will be many collisions, e.g. crc32(86821) == crc32(14740600). The results are only unique if you pass a 4-character string corresponding to the raw bytes of the binary representation of each number (`$crc_value = crc32(pack("L",$counter_value));`, as I mentioned above). – r3mainer Jul 02 '19 at 14:57