20

In my user database table, I take the MD5 hash of the email address of a user as the id.

Example: email(example@example.org) = id(d41d8cd98f00b204e9800998ecf8427e)

Unfortunately, I have to represent the ids as integer values now - in order to be able to use an API where the id can only be an integer.

Now I'm looking for a way to encode the id into an integer for sending an decode it again when receiving. How could I do this?

My ideas so far:

  1. convert_uuencode() and convert_uudecode() for the MD5 hash
  2. replace every character of the MD5 hash by its ord() value

Which approach is better? Do you know even better ways to do this?

I hope you can help me. Thank you very much in advance!

Kerem
  • 11,377
  • 5
  • 59
  • 58
caw
  • 30,999
  • 61
  • 181
  • 291
  • 1
    Do you have no other, possibly intern, value to identify your users? Well, you should have at least a PK for your user table. – Malax Sep 14 '09 at 17:10
  • @Malax: Yes, the Primary Key is the id field containing the MD5 hash. Is this a bad solution? – caw Sep 14 '09 at 17:24
  • 3
    Using an MD5 hash as the primary key of your user table is generally not a great idea. An auto-increment/serial integer id is 4-8 bytes. An md5 hash is 32 bytes. Comparisons on integer values (e.g. everytime you JOIN on that table, or SELECT a row from it) will be many times faster than comparing 32 byte strings, and integer values will require less storage. And really... if you want to use a string value, why not use the email address itself? It's going to be 32 bytes or less, most of the time. – Frank Farmer Sep 14 '09 at 17:50
  • @Frank Farmer: Yes, in most cases it would really be better to take the integer as the primary key. But my users shouldn't be able to enumerate the ids. They could easily change the GET parameters and walk through all ids. This shouldn't be possible so I take the hashes. – caw Sep 14 '09 at 19:45
  • 1
    That certainly is a concern, but there are better ways to address it -- there's no need to design your database schema around it. Just don't expose ids as GET parameters. And do permissions checking at the page level. If you're really married to the hash idea, at least use a hash algorithm that maps to a 32/64 bit int space. A raw, unsalted MD5 hash of email address isn't terribly secure for those purposes anyway. – Frank Farmer Sep 14 '09 at 19:56
  • You should never get the user id directly from the client and trust it. You should instead check that the client has valid auth credentials and get its id from the database. What you are doing is like putting a password in the query string. – Matteo Riva Jan 03 '10 at 15:28
  • All answers here so far are wrong, since they didn't take [endianness](https://en.wikipedia.org/wiki/Endianness) into account. An MD5 digest is the hexadecimal representation of a sequence of 4 32-bit integers stored in **little-endian** format. So there must be some reordering to calculate the 128-bit number. Like in [this](https://stackoverflow.com/a/76165026/3410351) answer. – xamid May 10 '23 at 13:57

9 Answers9

21

Be careful. Converting the MD5s to an integer will require support for big (128-bit) integers. Chances are the API you're using will only support 32-bit integers - or worse, might be dealing with the number in floating-point. Either way, your ID will get munged. If this is the case, just assigning a second ID arbitrarily is a much better way to deal with things than trying to convert the MD5 into an integer.

However, if you are sure that the API can deal with arbitrarily large integers without trouble, you can just convert the MD5 from hexadecimal to an integer. PHP most likely does not support this built-in however, as it will try to represent it as either a 32-bit integer or a floating point; you'll probably need to use the PHP GMP library for it.

bdonlan
  • 224,562
  • 31
  • 268
  • 324
  • 3
    +1 for hinting that the resulting value might be too big for the API even when using the bare bytes as integer. You should find another solution for your "email address to integer"-problem. – Malax Sep 14 '09 at 17:09
  • 1
    Thank you very much! Would this be the solution better than my two ideas? $id_integer = base_convert($id_string, 16, 10); – caw Sep 14 '09 at 17:26
  • Read the warning on base_convert's docs (http://www.php.net/manual/en/function.base-convert.php) - it is NOT suitable for large numbers. And MD5s are very large numbers. You _must_ use a bigint library, and the API you're accessing must do so too - but I doubt it does. Just add another column and assign arbitrary IDs to each user, it'll be much easier. – bdonlan Sep 14 '09 at 22:02
  • An MD5 digest (hash) is 128 bits (16 binary bytes, 32 hexadecimal characters). This can be represented by two 8 byte integers. Use two large integer columns as the primary key. If your software framework doesn't support multi-column foreign keys, this might be an issue, otherwise a straight-forward solution. – karmakaze Jun 23 '12 at 23:34
10

There are good reasons, stated by others, for doing it a different way.

But if what you want to do is convert an md5 hash into a string of decimal digits (which is what I think you really mean by "represent by an integer", since an md5 is already an integer in string form), and transform it back into the same md5 string:

function md5_hex_to_dec($hex_str)
{
    $arr = str_split($hex_str, 4);
    foreach ($arr as $grp) {
        $dec[] = str_pad(hexdec($grp), 5, '0', STR_PAD_LEFT);
    }
    return implode('', $dec);
}

function md5_dec_to_hex($dec_str)
{
    $arr = str_split($dec_str, 5);
    foreach ($arr as $grp) {
        $hex[] = str_pad(dechex($grp), 4, '0', STR_PAD_LEFT);
    }
    return implode('', $hex);
}

Demo:

$md5 = md5('example@example.com');
echo $md5 . '<br />';  // 23463b99b62a72f26ed677cc556c44e8
$dec = md5_hex_to_dec($md5);
echo $dec . '<br />';  // 0903015257466342942628374306682186817640
$hex = md5_dec_to_hex($dec);
echo $hex;             // 23463b99b62a72f26ed677cc556c44e8

Of course, you'd have to be careful using either string, like making sure to use them only as string type to avoid losing leading zeros, ensuring the strings are the correct lengths, etc.

GZipp
  • 5,386
  • 1
  • 22
  • 18
  • Thank you very much. This is how it would work. But now I can see what all the others wanted to say: The new integer is very long. And the leading zero is a problem, too. – caw Sep 15 '09 at 10:44
  • Glad to help. Keep in mind that the decimal digit string and the hex digit string (the md5 string) are not equal mathematically; they are merely "translations" of each other, produced by these companion functions, into their respective digit symbol sets. – GZipp Sep 15 '09 at 16:54
10

A simple solution could use hexdec() for conversions for parts of the hash.

Systems that can accommodate 64-bit Ints can split the 128-bit/16-byte md5() hash into four 4-byte sections and then convert each into representations of unsigned 32-bit Ints. Each hex pair represents 1 byte, so use 8 character chunks:

$hash = md5($value);

foreach (str_split($hash, 8) as $chunk) {
    $int_hashes[] = hexdec($chunk);
}

On the other end, use dechex() to convert the values back:

foreach ($int_hashes as $ihash) {
    $original_hash .= dechex($ihash);
}

Caveat: Due to underlying deficiencies with how PHP handles integers and how it implements hexdec() and intval(), this strategy will not work with 32-bit systems.

Edit Takeaways:

  • Ints in PHP are always signed, there are no unsigned Ints.

  • Although intval() may be useful for certain cases, hexdec() is more performant and more simple to use for base-16.

  • hexdec() converts values above 7fffffffffffffff into Floats, making its use moot for splitting the hash into two 64-bit/8-byte chunks.

  • Similarly for intval($chunk, 16), it returns the same Int value for 7fffffffffffffff and above.

codebard
  • 146
  • 1
  • 5
  • Beware to use UNSIGNED BIGINT . The sign will loose you one bit each. In other words you CANNOT do this in PHP as it has no unsigned values. You should stay in the SQL realm – theking2 Jan 30 '23 at 07:11
2

Why ord()? md5 produce normal 16-byte value, presented to you in hex for better readability. So you can't convert 16-byte value to 4 or 8 byte integer without loss. You must change some part of your algoritms to use this as id.

Alexey Sviridov
  • 3,360
  • 28
  • 33
  • MD5 produces a 20-byte value. – bdonlan Sep 14 '09 at 17:03
  • 2
    Hmmm... may be i'm completely stupid but... fred@fred-desktop:~$ md5sum citycode.sql 734e4d6f039a81c8a196db588e1cb002 citycode.sql 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 73 4e 4d 6f 03 9a 81 c8 a1 96 db 58 8e 1c b0 02 here a marco92w (question owner) value d4 1d 8c d9 8f 00 b2 04 e9 80 09 98 ec f8 42 7e what's wrong with me? Where is additionally four bytes? – Alexey Sviridov Sep 14 '09 at 17:28
  • @bdonlan: No, 128 bits is equal to 16 bytes, isn't it? – caw Sep 14 '09 at 17:31
  • @Alexey Sviridov: You're absolutely right, MD5 is 16 bytes. Thank you! :) I'll have to choose another way than converting MD5 to integer ... – caw Sep 14 '09 at 17:33
  • @marco92w why you want use only md5 to produce integer by string? Why don't use your own hash function? There is some good tutorials, just google it. – Alexey Sviridov Sep 14 '09 at 17:37
  • @Alexey Sviridov: Good idea. But I think I should rather change the DB structure than start coding an own hash algorithm ... :) – caw Sep 14 '09 at 17:46
  • 1
    The PHP function *md5()* returns a 32-character string by default. If the second parameter (which is *false* by default) is set to *true* a 16-character string is returned. – GZipp Sep 14 '09 at 19:27
1

You could use hexdec to parse the hexadecimal string and store the number in the database.

Malax
  • 9,436
  • 9
  • 48
  • 64
  • 1
    Does that handle 160-bit integers without munging them? – bdonlan Sep 14 '09 at 16:59
  • 3
    answer: no, it converts to float, according to the documentation. So you'll lose about 120 bits of data, and will be unable to recover the original MD5 later. – bdonlan Sep 14 '09 at 16:59
  • 3
    You're right, the MD5 sum is too big to store it as an 32 bit integer. Ignore my answer. ;-) – Malax Sep 14 '09 at 17:03
  • Thank you Malax, it was helpful, though. :) Is hexdec($string) the same as base_convert($string, 16, 10)? – caw Sep 14 '09 at 17:28
1

Couldn't you just add another field that was an auto-increment int field?

SeanJA
  • 10,234
  • 5
  • 32
  • 42
1

what about:

$float = hexdec(md5('string'));

or

$int = (integer) (substr(hexdec(md5('string')),0,9)*100000000);

Definitely bigger chances for collision but still good enaugh to use instead of hash in DB though?

MaartenDev
  • 5,631
  • 5
  • 21
  • 33
Marcin
  • 5,469
  • 15
  • 55
  • 69
  • this one is even better: sprintf("%u",crc32(md5('string'))); – Marcin Jan 03 '10 at 17:21
  • well lemme calculate 32*16 bit ... you'll neeed 64bytes. dunno any float or double that long ;) your number will loose precision by truncation or rounding – The Surrican Dec 12 '10 at 13:40
0

Add these two columns to your table.

`email_md5_l` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(left(md5(`email`),16),16,10)) STORED,
`email_md5_r` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(right(md5(`email`),16),16,10)) STORED,

It might or might not help to create a PK on these two columns though, as it probably concatenates two string representations and hashes the result. It would kind of defeat your purpose and a full scan might be quicker but that depends on number of columns and records. Don't try to read these bigints in php as it doesn't have unsigned integers, just stay in SQL and do something like:

select email 
into result 
from `address`
where url_md5_l = conv(left(md5(the_email), 16), 16, 10)
  and url_md5_r = conv(right(md5(the_email), 16), 16, 10) 
limit 1;

MD5 does collide btw.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
theking2
  • 2,174
  • 1
  • 27
  • 36
-1

Use the email address as the file name of a blank, temporary file in a shared folder, like /var/myprocess/example@example.org

Then, call ftok on the file name. ftok will return a unique, integer ID.

It won't be guaranteed to be unique though, but it will probably suffice for your API.

humbads
  • 3,252
  • 1
  • 27
  • 22
  • Ftok is only available on linux and uses (https://www.php.net/manual/en/function.shmop-open.php) not likely this will always generate the same inode – theking2 Jan 30 '23 at 07:27