
I've been reading articles all day on the subject but am not sure how to proceed.

I have a MySQL table of users with a primary key of UUID_SHORT() values and an email field, which is also unique. I need to be able to generate an always unique, non-guessable ID to use in an activation URL. Something like this:

http://example.com/activate.php?id=tKd32f

After reading the question "How to code a URL shortener?", I implemented a base58 encoding function that I'm trying to apply to the UUID_SHORT() value that the MySQL server generated for the user.

<?php
function base58_encode($num, $alphabet) 
{                           
    $base_count = strlen($alphabet);
    $alphabet =  str_split($alphabet);
    echo "base_count: " . $base_count . "<br>";
    $encoded = '';
    while ($num >= $base_count) 
    {
        $div = $num/$base_count;
        echo "div: " . $div . "<br>";
        $mod = ($num-($base_count*intval($div)));
        echo "mod: " . $mod . "<br>";
        echo "alphabet[$mod]: " . $alphabet[$mod] . "<br>";                             
        $encoded = $alphabet[$mod] . $encoded;
        echo "encoded: " . $encoded . "<br>";
        $num = intval($div);
        echo "num: " . $num . "<br>";
        echo "------------------------------<br>";
    }

    if ($num) $encoded = $alphabet[$num] . $encoded;

    return $encoded;
}

$alphabet = "123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ";
echo "WORKS => " . base58_encode(3707787591, $alphabet) . "<br><br>"; // some random number
echo "DOES NOT WORK => " . base58_encode(23298059492392988, $alphabet); // one of the unique IDs in my database
?>

If I use a 10-digit number, it works. But if I use the longer UUID_SHORT value, I get an "undefined offset" error because $num = intval($div) returns a negative number when its argument is too large. I can't figure out why, but it seems to be related to: http://ca1.php.net/intval
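For reference, the undefined offset comes from float arithmetic. On a build with 32-bit integers (PHP_INT_MAX = 2147483647), 23298059492392988 is already a float, and intval() on a float outside the integer range does not give a meaningful result (it can come back negative). A minimal sketch of the same encoder using only integer operations, assuming PHP 7+ (for intdiv()) on a 64-bit build; on a 32-bit build the arithmetic would have to go through the GMP or BCMath extensions instead:

```php
<?php
// Base58 alphabet from the question (no 0, O, I or l).
const BASE58_ALPHABET = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

function base58_encode(int $num, string $alphabet = BASE58_ALPHABET): string
{
    if ($num < 0) {
        throw new InvalidArgumentException('Only non-negative integers are supported');
    }
    $base = strlen($alphabet);
    if ($num === 0) {
        return $alphabet[0];
    }
    $encoded = '';
    while ($num > 0) {
        // Take the least-significant base-58 digit with the integer % operator.
        $encoded = $alphabet[$num % $base] . $encoded;
        // intdiv() stays in integer arithmetic, so no float rounding or overflow.
        $num = intdiv($num, $base);
    }
    return $encoded;
}

echo base58_encode(3707787591), "\n";
echo base58_encode(23298059492392988), "\n";
```

Note that this only fixes the encoding: as question 1 suspects, encoding a sequential UUID_SHORT() value makes it shorter but not any harder to guess, since the mapping is reversible.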

My questions:

  1. Is this even the correct approach? Should I be using the UUID_SHORT value to generate the activation code, or is that too guessable? If I run SELECT UUID_SHORT() multiple times on the MySQL server, the value increments by 1 each time (which worries me).

  2. If this base58_encode is the correct approach, how would I resolve this error? How can I get a short activation code, similar to the codes produced by goo.gl's URL shortener service? How do they do it? I couldn't find a clear answer on this.

  3. Should I be using something more along the lines of the answers to "How to generate a secure activation string in php?" That approach, however, generates a really long ID, which isn't really necessary in my case and makes for uglier activation URLs.

I plan on expiring my URLs after a certain period of time, but the IDs need to always be unique (and ideally, short enough). I also don't need to later decode the value because I will map the uniquely generated ID to the user ID in the database. What is the best approach here?
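One way to get a code that is both short and hard to guess, in the spirit of question 3 but without the long string, is to stop deriving the code from the database ID and draw each character at random. A minimal sketch, assuming PHP 7+ (for random_int(), which uses a cryptographically secure generator); the constant and function names are illustrative, and the 10-character default is an arbitrary choice:

```php
<?php
// Same base58 alphabet as in the question.
const BASE58_ALPHABET = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ';

// Hypothetical helper: builds a random code of $length characters.
function random_activation_code(int $length = 10, string $alphabet = BASE58_ALPHABET): string
{
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is a CSPRNG and is uniform over [0, $max], so no modulo bias.
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo 'http://example.com/activate.php?id=' . random_activation_code(), "\n";
```

Ten base58 characters carry about 58 bits (10 × log2 58 ≈ 58.6). Because the code is random rather than derived from the user ID, uniqueness is not guaranteed by construction; the assumption here is that the code column has a UNIQUE index and a new code is generated whenever an insert fails on a duplicate key.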

Thank you for your time.

http203

1 Answer


Here's a suggestion:

Make your URL of the form

www.example.com/?code=XXXXX-YYYYY-ZZZZZ

Where

  • XXXXX is a unique identifier for the customer in your database

  • YYYYY is a fragment of a seed; the rest of the seed is stored in the database against XXXXX (YYYYY itself is not stored)

  • ZZZZZ is a randomly positioned segment of a sha512 hash of XXXXX and the full seed; the start and end points of the segment are stored in the database.

  • As the final authentication field in the database (the 5th, counting the identifier), store a timestamp that is used to expire the link.

Example:

URL is requested.

uniqid() is called and assigned to XXXXX (that's 13 characters)

Let the seed be the sha512 hash of another uniqid("", true) and choose the first 5 characters as YYYYY, assigning the remainder to the database.

Let ZZZZZ be the 8th through 15th characters of hash("sha512", SEED . XXXXX), storing 8 and 15 in the database (8 and 15 being randomly chosen, of course).

Store the time in the database.

Email the resultant URL.
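The steps above can be sketched as follows, assuming PHP 7+. The function name, array keys, and the choice of 0-based inclusive indices with an 8-character slice are illustrative assumptions rather than part of the answer; persisting the row and emailing the URL are left out:

```php
<?php
// Hypothetical sketch of the generation steps (PHP 7+).
function create_activation_link(string $baseUrl): array
{
    // XXXXX: 13-character identifier from uniqid(); stored in the DB.
    $x = uniqid();

    // Seed: SHA-512 (hex) of another uniqid() with more entropy.
    $seed = hash('sha512', uniqid('', true));
    $y = substr($seed, 0, 5);      // YYYYY: sent in the URL only
    $seedRest = substr($seed, 5);  // remainder of the seed: stored in the DB

    // ZZZZZ: a randomly placed 8-character slice of hash(seed . XXXXX).
    // A hex SHA-512 digest is 128 characters, so the start can be 0..120.
    $start = random_int(0, 120);
    $end = $start + 7;             // 0-based, inclusive
    $z = substr(hash('sha512', $seed . $x), $start, $end - $start + 1);

    // The five stored fields (hypothetical column names).
    $row = [
        'id' => $x,
        'seed_rest' => $seedRest,
        'start' => $start,
        'end' => $end,
        'created' => time(),
    ];

    return ['url' => $baseUrl . '?code=' . "$x-$y-$z", 'row' => $row];
}

$link = create_activation_link('http://www.example.com/');
echo $link['url'], "\n";
```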

Discussion:

Since XXXXX is created with uniqid() at a point in time that is rather hard for a hacker to guess, it is lightly secure, but almost 100% unique.

Since the seed is not fully stored, but is necessary to validate ZZZZZ, greater security is achieved. However, because only a substring of the hash is checked, collisions could be generated more easily than with a longer authentication scheme.

The 5 stored elements form a key of sorts to be able to validate the user through a multi-step process which is not easily reversible, so knowing the method does not give much help (if any) in breaking the scheme.

It is assumed that only a low, finite number of guesses would be allowed before the link was invalidated.
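Following the answerer's later comment that the full YYYYY can be rebuilt from the URL plus the database, and ZZZZZ re-extracted using the stored indices, validation needs only the single row found via XXXXX. A hedged sketch, assuming PHP 7.1+ and a row with hypothetical keys seed_rest, start and end (0-based, inclusive) and created; fetching the row and counting failed attempts are left to the caller:

```php
<?php
// Hypothetical validator for a code of the form XXXXX-YYYYY-ZZZZZ.
// $row is assumed to have been fetched from the DB using XXXXX.
function validate_activation_code(string $code, array $row, int $ttlSeconds, ?int $now = null): bool
{
    $now = $now ?? time();
    $parts = explode('-', $code);
    if (count($parts) !== 3) {
        return false;
    }
    [$x, $yPart, $zPart] = $parts;

    // Reject expired links using the stored timestamp.
    if ($now - $row['created'] > $ttlSeconds) {
        return false;
    }
    // Rebuild the full seed: URL fragment plus the stored remainder.
    $seed = $yPart . $row['seed_rest'];
    // Recompute the composite hash and cut out the stored segment.
    $hash = hash('sha512', $seed . $x);
    $expected = substr($hash, $row['start'], $row['end'] - $row['start'] + 1);
    // Constant-time comparison so timing does not reveal matching prefixes.
    return hash_equals($expected, $zPart);
}
```

To honour the "low, finite number of guesses", the caller would also increment a failed-attempt counter on the row (another hypothetical column) and invalidate the link once it passes a threshold.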

This is not 100% secure: nothing that relies on a simple link, where the link contains everything necessary for validation, will ever be fully secure. But it will make it reasonably challenging for a hacker to succeed in the time allowed (depending on how long you allow the link to be valid).
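A rough way to quantify "reasonably challenging". Assume the attacker already knows XXXXX (plausible, since uniqid() is time-based) and that the SHA-512 output looks random to anyone without the stored seed remainder. Then each forged YYYYY/ZZZZZ pair is accepted only if its ZZZZZ happens to match the slice the server recomputes, whatever YYYYY was sent. With the 8 hex characters from the example and k allowed attempts, a union bound gives the first line below; the second is the same bound for a purely random base58 code of n characters, like the short codes discussed in the comments:

```latex
P_{\text{scheme}} \le \frac{k}{16^{8}} = \frac{k}{2^{32}} \approx k \times 2.3 \times 10^{-10}

P_{\text{base58},\,n} \le \frac{k}{58^{n}}, \qquad 58^{6} \approx 3.8 \times 10^{10}, \quad 58^{10} \approx 4.3 \times 10^{17}
```

Under these assumptions the 8-character hex slice sits between a 5- and a 6-character random base58 code in strength, so the guess limit and the expiry window do much of the work.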

I hope this helps!

  • Isn't it a bad idea to expose the customer ID? Doesn't that make YYYY and ZZZZ irrelevant if the entire purpose is to find the associated user when the link is clicked? I also don't quite understand what you mean by storing parts of the calculated YYYY and ZZZZ in the DB. (And why did you choose sha512, out of curiosity?) Say we omitted the customer ID in the code, how would I find the correct mapping if I only stored portions of it in the DB? Wouldn't I have to try the algorithm for each row to find the right one? Seems like that wouldn't be efficient. Am I missing something? – http203 Jan 03 '14 at 14:50
  • "XXXXX is created with a uniqid()" -- so, it doesn't expose the customer id. Parts are used so that the whole algo is never stored in any one location. sha512 is currently one of the best hashing methods available afaik. And, no, because the whole YYYY is able to be pieced together from what's in the URL and what's in the DB and ZZZZ can be extracted from the composite hash using the indices in the DB, you can verify without traversing the entire DB. –  Jan 03 '14 at 14:57
  • Ok thank you! I read that uniqid is not optimal, because it's based on the system time and that md5() is better to generate an extremely difficult to predict ID. Any thoughts on this? The only other drawback with your method is that it doesn't meet my requirement to have a cleaner (short) URL, like the codes used in url shorteners. I guess those aren't very secure and may be guessed? – http203 Jan 03 '14 at 15:06
  • I have another question relating to your last answer. I don't see how the whole YYYY is able to be pieced together from what's in the URL. I am testing this now, and in my url, YYYY is the first 5 characters of the generated value, and the remainder is in the DB. So how do I know which row to retrieve? I have not stored XXXX in the DB. My mapping has 1 column for the user ID, another code column for the remainder of YYYY and another code column for the 2 chars randomly chosen in ZZZZ. Wouldn't I have to traverse to see which seed generates the portion chosen in ZZZZ? – http203 Jan 03 '14 at 15:54
  • Well, yes, if you don't store XXXX, but to be fair, I did say that the XXXX would be in your DB. :) –  Jan 03 '14 at 16:52
  • Didn't see your comment about md5 and such... uniqid is not optimal, hence why I said, "it is lightly secure, but almost-100% unique." md5 is not as secure as sha512. As for short URLs, the shorter, the less secure b/c it takes fewer guesses to duplicate them. –  Jan 03 '14 at 16:53
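The time-based nature of uniqid() raised in these comments is easy to see: without more_entropy, its 13 characters are the current Unix time in seconds (8 hex digits) followed by the microseconds (5 hex digits). A short demonstration, assuming any PHP version that provides uniqid():

```php
<?php
// Decode a plain uniqid() back into the timestamp it was built from.
$id = uniqid();
$seconds = hexdec(substr($id, 0, 8)); // first 8 hex digits: seconds
$micros = hexdec(substr($id, 8, 5));  // last 5 hex digits: microseconds
printf("%s -> %d s, %d us (time() = %d)\n", $id, $seconds, $micros, time());
```

Someone who knows roughly when a link was issued therefore has only about a million candidate values per second to try for XXXXX, which is why it is described as only "lightly secure".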