I just want to know for a CRC32 hash function, specifically the PHP crc
function, will I get 2^32 (4 billion) different values for an input value (integer) that is guaranteed to be incremented sequentially from 1 to 4 billion?
Asked
Active
Viewed 2,239 times
1

jonrsharpe
- 115,751
- 26
- 228
- 437

Kagiso Marvin Molekwa
- 915
- 1
- 9
- 25
-
1No. CRC32 has no intention of being particular collision-proof. But if your possible inputs are all 32 bit numbers, why go through the trouble of hashing them to 32 bits? Just use them directly. – Peter Jun 25 '19 at 09:03
-
1This sounds like an XY problem. What is the real problem you're trying to solve? – Jim Mischel Jun 25 '19 at 17:34
-
Why don't you write the program to do it? 2^32 (or even 2^30) is not a large space. A well written program should finish rather quickly. – jww Jun 25 '19 at 22:18
-
guys, to answer the first 2 comments, I'm trying to create a URL shortening service like `goo.gl`. So I'm basically trying to assign the CRC hashed number to be the shortened URL – Kagiso Marvin Molekwa Jun 26 '19 at 13:22
-
1To create a URL shortening service, you don't hash the URL. Instead, you create a function that returns the next sequential value, and then you obfuscate the value. My blog post https://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/ shows how to do that. See also https://stackoverflow.com/a/34420445/56778 – Jim Mischel Jul 19 '19 at 18:16
-
1See also https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener and https://stackoverflow.com/questions/742013/how-do-i-create-a-url-shortener, and any number of other references. – Jim Mischel Jul 19 '19 at 18:18
-
@JimMischel Your blog's post https://blog.mischel.com/2017/06/20/how-to-generate-random-looking-keys/ doesn't work – Mohammad Kholghi Aug 28 '21 at 09:47
-
CRC makes bad hash functions. Here is a great concise article on the topic: https://eklitzke.org/crcs-vs-hash-functions – Dhmclean Jul 03 '19 at 19:57
1 Answers
2
I don't think CRC32 was specifically designed to have no collisions for all possible four-byte inputs. However, it does seem to work that way. You can verify this yourself by simply checking every possible output. To speed things up, I used the following C program:
/* Compile: cc crc_check.c -O3 -lz -o crc_check */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <zlib.h>
int main() {
uint32_t x, y, d;
uint64_t i, *seen, mask;
seen = calloc(0x4000000, 8);
if (!seen) return -1;
/* Make sure we're calculating the same values as PHP's crc32 function */
printf("crc32(\"ABCD\") = %lu\n", crc32(0, (unsigned char*)"ABCD", 4));
for (i=x=0; i<0x100000000ULL; i++) {
y = crc32(0, (unsigned char*)(&x), 4);
mask = 1ULL << (y & 0x003fULL);
d = y >> 6;
if (seen[d] & mask) {
printf("Collision detected (x=%u, y=%u)\n", x, y);
return 0;
}
seen[d] |= mask;
x++;
}
puts("No collisions detected");
return 0;
}
/*
Output:
crc32("ABCD") = 3675725989
No collisions detected
*/
Just to make sure zlib is using the same function, I included a line to output the CRC32 checksum of the string "ABCD". PHP produces the same value:
$ php -r 'echo crc32("ABCD");'
3675725989
I have to ask, though: what do you need this information for? If you want to convert sequential 32-bit integers into unique pseudorandom values, there are much more efficient ways of doing this. For example, consider using a linear congruential generator.

r3mainer
- 23,981
- 3
- 51
- 88
-
Thanks for your answer. I'm trying to create a URL shortening service like `goo.gl`. So I'm basically trying to assign the CRC hashed number to be the shortened URL. So basically, I want to know whether I'll be able to generate 4 billion different URLs without worrying about duplication/collision – Kagiso Marvin Molekwa Jun 26 '19 at 13:24
-
OK, but how does the CRC32 function help with this? All you're doing is converting one 32-bit number into another 32-bit number in a rather complicated way. (Perhaps I should have pointed out that the CRC checksums are only unique if you work on the raw byte values; i.e., `$crc_value = crc32(pack("L",$counter_value));`. Wouldn't you be better off writing a function that converts an integer value into a base-62 string (using the characters 0-9, a-z and A-Z as digits)? – r3mainer Jun 26 '19 at 14:59
-
The CRC32 helps convert a number into a 4 character string, eg. `f38d`. That's why I'm using a hash. – Kagiso Marvin Molekwa Jun 29 '19 at 06:37
-
2No, that isn't going to work. If you're just passing numerical variables to this function, there will be many collisions, e.g. crc32(86821) == crc32(14740600). The results are only unique if you pass a 4-character string corresponding to the raw bytes of the binary representation of each number (`$crc_value = crc32(pack("L",$counter_value));`, as I mentioned above). – r3mainer Jul 02 '19 at 14:57