1

Edit: I have since found and published an efficient and elegant solution that transforms IDs like 3141592 into strings such as vJST and back. It's available for PHP here:

https://github.com/delight-im/PHP-IDs

As background: it uses Knuth's multiplicative hashing followed by a base conversion to generate unique, reversible, non-sequential IDs.
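To illustrate the general idea (not the actual code or API of the library linked above), here is a minimal sketch. It assumes 64-bit PHP and IDs below 2^31; the function names, the constant names and the multiplier are all made up for this example. Any odd multiplier is invertible modulo 2^31, which is what makes the scrambling step reversible.

```php
<?php
// Sketch only: all names and the multiplier are assumptions, not the PHP-IDs API.
// Requires 64-bit PHP so that products of two 31-bit values do not overflow.
const ID_MASK = 0x7FFFFFFF;        // keep 31 bits, i.e. arithmetic modulo 2^31
const ID_MULTIPLIER = 1580030173;  // arbitrary odd constant; odd means invertible modulo 2^31

// Multiplicative inverse of an odd $a modulo 2^31, via Newton iteration.
function inverseMod2Pow31($a) {
    $x = $a; // a * a = 1 (mod 8) for odd a, so this is correct to 3 bits
    for ($i = 0; $i < 5; $i++) { // each round doubles the number of correct bits
        $x = ($x * ((2 - (($a * $x) & ID_MASK)) & ID_MASK)) & ID_MASK;
    }
    return $x;
}

function encodeId($id) {
    $scrambled = ($id * ID_MULTIPLIER) & ID_MASK;     // multiplicative hashing step
    return base_convert((string) $scrambled, 10, 36); // base conversion step
}

function decodeId($code) {
    $scrambled = (int) base_convert($code, 36, 10);   // undo the base conversion
    return ($scrambled * inverseMod2Pow31(ID_MULTIPLIER)) & ID_MASK; // undo the multiplication
}
```

With these helpers, decodeId(encodeId($id)) gives back $id for any integer from 0 to 2147483647. This hides the ordering from casual visitors, but it is obfuscation rather than access control.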

Problem:

I have dynamic pages in PHP where the content is shown according to the given id. The id is always submitted via a GET parameter: page.php?id=X. This causes a problem: site visitors can enumerate the ids and simply walk through all the different content pages. This shouldn't be possible, of course.

How could this be solved?

My approach is to encode all ids in links and forms which are used as a GET parameter later. At the beginning of every page, the given id is decoded into the "real" id which is used in the database. Is this a good approach? Would you choose another way?

Possible solution of my approach:

I would convert the integer id to a base 38 integer and replace the digits by characters of a given list. I would use these characters for the encoded string id:

a-z 0-9 - _

Would you use other characters as well? For these characters my script would be this:

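function id2secure($old_number) {
    // Shuffled base-38 alphabet: array index = digit value, array value = output character.
    $alphabet_en = array(0=>'1', 1=>'3', 2=>'5', 3=>'7', 4=>'9', 5=>'0', 6=>'2', 7=>'4', 8=>'6', 9=>'8', 10=>'a', 11=>'c', 12=>'e', 13=>'g', 14=>'i', 15=>'k', 16=>'m', 17=>'o', 18=>'q', 19=>'s', 20=>'u', 21=>'w', 22=>'y', 23=>'b', 24=>'d', 25=>'f', 26=>'h', 27=>'j', 28=>'l', 29=>'n', 30=>'p', 31=>'r', 32=>'t', 33=>'v', 34=>'x', 35=>'z', 36=>'-', 37=>'_');
    // 0 would skip the loop below and yield an empty string, so encode it explicitly.
    if ($old_number === 0 || $old_number === '0') { return $alphabet_en[0]; }
    $new_number = '';
    while ($old_number > 0) {
        $rest = $old_number%38;
        if (!isset($alphabet_en[$rest])) { return FALSE; }
        $new_number .= $alphabet_en[$rest];
        $old_number = (int) floor($old_number/38); // keep an integer, floor() returns a float
    }
    // The digits were collected least significant first, so reverse them.
    $new_number = strrev($new_number);
    return $new_number;
}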

Additional question:

What would be the reverse function for my function?
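A minimal sketch of what such a reverse function could look like, assuming the same shuffled alphabet as in id2secure. The name secure2id is made up for this example, and the alphabet is written as a string so that strpos can look up each character's digit value.

```php
<?php
// Hypothetical inverse of id2secure(); the name secure2id is an assumption.
function secure2id($encoded) {
    // Same shuffled base-38 alphabet as id2secure(): position = digit value.
    $alphabet = '1357902468acegikmoqsuwybdfhjlnprtvxz-_';
    if (!is_string($encoded) || $encoded === '') { return FALSE; }
    $number = 0;
    $length = strlen($encoded);
    for ($i = 0; $i < $length; $i++) {
        $digit = strpos($alphabet, $encoded[$i]);
        if ($digit === FALSE) { return FALSE; } // character outside the alphabet
        $number = $number * 38 + $digit;        // most significant digit first
    }
    return $number;
}
```

For example, id2secure(1337) produces 'z4' (1337 = 35 * 38 + 7), and secure2id('z4') returns 1337 again. Very long inputs would overflow PHP's integer range, so a length limit is advisable.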

I hope you can help me. Thank you!

caw
  • You can shorten your array construct `array(0=>'1', …, 37=>'_')` by using a string `'1…_'`. Then you just need to replace `!isset($alphabet_en[$rest])` with `$rest >= 38` and it works like yours. – Gumbo Sep 15 '09 at 11:35
  • "This shouldn't be possible, of course." Unless users aren't allowed to visit these pages I would argue the exact opposite. Capable users should be able to navigate a site in any way they see fit. – Steerpike Sep 15 '09 at 12:05
  • @Gumbo: Thank you very much, good idea. This should be faster, right? – caw Sep 15 '09 at 13:49
  • how exactly does one "enumerage"? – nickf Sep 15 '09 at 14:41
  • Why don't you just use the php native base64 encode/decode functions? – ryeguy Sep 15 '09 at 14:42
  • @nickf: Sorry, I wanted to say "enumerate". If this is also the wrong word: I wanted to say that you can walk through all the pages by increasing the id every time. – caw Sep 15 '09 at 14:58
  • @ryeguy: Base 64 encoded strings contain the = which isn't allowed in GET parameters, is it? – caw Sep 15 '09 at 14:59

6 Answers

8

Can the users get to the pages via the website? If the answer is yes, then you should ask yourself whether this is really a problem or not.

If not then the problem is that you're not securing your pages or to put it another way: you're relying on obscurity for security, which is never a good move.

My advice? Either secure your pages so only the right users can access them or don't worry about it.

If you really must worry about it, just pass an extra field that must be correct for the given page. I wouldn't construct this from the ID. Perhaps generate another number or a GUID when you create the page entry in the database. If both fields aren't correct then don't display the page.
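As a rough sketch of that extra field, assuming PHP 7 or later: generate a random token when the page row is created, store it next to the id, and require both in the URL. The helper names and the `token` parameter are made up for illustration.

```php
<?php
// Hypothetical helpers; the names are assumptions for illustration.

// Call once when the page entry is created and store the result with the row.
function newPageToken() {
    return bin2hex(random_bytes(16)); // 32 hex characters from a CSPRNG
}

// Compare the stored token with the one from the request in constant time.
function pageTokenMatches($storedToken, $givenToken) {
    return is_string($givenToken) && hash_equals($storedToken, $givenToken);
}
```

A link would then look like page.php?id=42&token=..., and the page is only shown when pageTokenMatches() returns true for the token stored with id 42.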

Forget the simple character substitution and other naive obfuscation techniques. They're a waste of your time.

Edit: if you're after non-sequential IDs that are the same length, consider using UUIDs instead of auto-increment primary keys. Basically this is done at application level:

  • Change your primary key to char(36);
  • In your insert statement you have to set the key and populate it with the MySQL UUID() function.

Take a look at "To UUID or not to UUID?" and "UUID as a primary key". There is some performance degradation from this (specifically because you're using characters rather than integers for lookups), but unless you have a large amount of data (1 million+ rows) it probably won't be an issue in practice.
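The steps above rely on MySQL's UUID() function. If you would rather create the key in PHP before the insert, a random (version 4) UUID can be built as sketched below. This swaps in application-side generation, assumes PHP 7 or later, and the function name is made up.

```php
<?php
// Sketch of a random (version 4) UUID generator; uuidV4 is a hypothetical name.
function uuidV4() {
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set the version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set the variant to RFC 4122
    // 32 hex characters, split into 8 groups of 4 and joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

The result is a 36-character string that fits the char(36) primary key described above.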

cletus
  • Now, every page is secured on its own. But I thought it would look nicer if every id had the same length. :D And the users wouldn't know how many users my site has, which one was the first, etc. – caw Sep 15 '09 at 14:24
  • You can achieve both of those things by using a GUID as an ID instead of an auto increment field. – cletus Sep 15 '09 at 14:55
  • Do you think of something like "936DA01F-9ABD-4d9d-80C7-02AF85C822A8"? How should I create it? – caw Sep 15 '09 at 15:02
1

Use a checksum algorithm like Luhn:

$id = 1337;

$_GET['id'] = Luhn($id, 3); // 1337518, adds 3 checkdigits
$_GET['id'] = Luhn_Verify($_GET['id'], 3); // 1337, returns the original number or false if validation fails

echo $_GET['id']; // 1337

EDIT: I forgot to mention that with this method you can check whether an ID is valid without even having to query the database, for example:

$id = Luhn_Verify($_GET['id'], 3);

if ($id === false)
{
    // someone is trying to guess the ID
}

else
{
    // $id is valid, do the DB stuff here
}
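The snippets above call Luhn() and Luhn_Verify() without defining them. Here is one possible implementation, assuming the check digits are produced by appending a standard Luhn check digit repeatedly. That assumption reproduces the 1337518 example, but it is a guess at what the helpers do, and this version returns strings rather than integers.

```php
<?php
// Luhn check digit for a string of decimal digits (the digit to be appended).
function luhnCheckDigit($digits) {
    $sum = 0;
    $double = true; // the rightmost payload digit is doubled first
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) { $d -= 9; } // same as adding the two digits of the product
        }
        $sum += $d;
        $double = !$double;
    }
    return (10 - $sum % 10) % 10;
}

// Append $iterations check digits, each computed over everything before it.
function Luhn($number, $iterations = 1) {
    $result = (string) $number;
    for ($i = 0; $i < $iterations; $i++) {
        $result .= luhnCheckDigit($result);
    }
    return $result;
}

// Return the original number (as a string) or false if the check digits are wrong.
function Luhn_Verify($number, $iterations = 1) {
    $number = (string) $number;
    if ($iterations < 1 || !ctype_digit($number) || strlen($number) <= $iterations) {
        return false;
    }
    $payload = substr($number, 0, -$iterations);
    return Luhn($payload, $iterations) === $number ? $payload : false;
}
```

Note that check digits are not a secret: anyone who knows the scheme can compute valid IDs, so this filters out random guesses rather than protecting the pages.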
Alix Axel
0

It will still be possible to walk through your pages sequentially, although it would be harder to guess the pattern. As long as the root pattern is sequential you'll have a problem eventually (assuming it's actually a problem in the first place, and not just something you don't like the idea of).

You could use random numbers for the IDs. That would prevent easy guessing of page IDs and page order (again, if that matters).
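A minimal sketch of that idea, assuming PHP 7 or later: draw a random number and retry while it is already in use. The function name and the $isTaken callback (which would normally run a database lookup) are made up for illustration.

```php
<?php
// Hypothetical helper: $isTaken is a callback that reports whether an ID is in use.
function randomUnusedId(callable $isTaken, $max = 2147483647) {
    do {
        $id = random_int(1, $max); // uniformly random in [1, $max]
    } while ($isTaken($id));       // retry on a collision
    return $id;
}
```

As long as only a small fraction of the range is used, retries are rare. A unique index on the ID column is still needed, because two requests could pick the same free number at the same time.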

acrosman
  • But with random numbers I would have lots of collisions, wouldn't I? – caw Sep 15 '09 at 13:36
  • It depends on how many pages you plan to have, and how big a data type you use for storage. Again, it depends somewhat on your system and your goal. – acrosman Sep 16 '09 at 13:02
0

You can also use Hashids to encode/decode your IDs.

This code was written with the intent of placing created ids in visible places - like the URL.

Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
It converts numbers like 347 into strings like “yr8”, or array of numbers like [27, 986] into “3kTMd”.
You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.

Taylan
-1

I wouldn't worry about this "problem", but in one of my projects I used the following method:

After saving a new page to the DB, I generated the md5 of (record_id + page_title) and stored it in a special field, pagecode. Then I accessed the pages by that page code instead of by id. It's also a good idea to index the pagecode field in the database.
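In PHP this could look roughly like the sketch below. The function name and the ':' separator are assumptions; the "+" above is read as string concatenation, since PHP's + operator would attempt numeric addition.

```php
<?php
// Hypothetical helper: derive the page code from the record id and the title.
function makePageCode($recordId, $pageTitle) {
    // The separator keeps (1, "23") and (12, "3") from hashing the same string.
    return md5($recordId . ':' . $pageTitle);
}
```

The result is a 32-character hex string. Anyone who knows both the id and the title can recompute it, so it hides the numbering rather than restricting access.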

Sergei
  • In another question here, they told me that md5 hashes aren't adequate primary keys. I should rather use integers since it's faster and less risk of collisions. – caw Sep 15 '09 at 13:37
  • Hashes are not primary keys, just indexes. – Sergei Sep 15 '09 at 14:56
  • But an ID should be the primary key since it should also be unique, shouldn't it? – caw Sep 15 '09 at 15:03
  • The ID is a primary key with auto-increment. The hash is just a unique key. I don't see a problem here; that site has been working for two years already. Registered users access those pages by id, and anonymous users by page code. – Sergei Sep 16 '09 at 00:11
-2

Site visitors can enumerage the ids and simply walk through all the different content pages. This shouldn't be possible, of course.

I'm not sure why this should be a problem - people can view a list of all the (public, Googlebot-indexed) pages on a website just by typing site:domain.com into Google, and loop through them should they wish. Changing the unique index you use won't change that.

But if you really don't want visitors to access your pages directly, a simple quick-fix is to use POST instead of GET.

Waggers