
Background:

I am creating a service booking website. Each order needs to have a unique order number. I have chosen 16 digits because that's what the previous software used.

Questions

I am not sure if there is any benefit to putting data into the order number or if it should just be a purely random string.

If it is just a random string then its only purpose is to act as an ID. If that is the case, then why not just use an incremental ID? Other than hiding from the end user how many orders we have generated, I can't think of a good reason.

If it is a good idea to put data into the string, what kind of data should I include? Probably the date of the order, but other than that I don't know.

I am currently generating a 16-character string like this:

public function generateOrderNumber()
{
    $time = time(); // Current Unix timestamp in seconds (timezone-independent)
    $token = md5((string) $time); // 32-character hex MD5 hash of the timestamp
    return str_shuffle(substr($token, 0, 16)); // First 16 hex chars of the hash, shuffled
}

However I am not sure if this is good enough for production.

Jethro Hazelhurst
  • 3
    why not just use an auto increment? if you want to obfuscate you make it start on a random number – NDM Sep 20 '16 at 15:57
  • 4
    we have no idea either. your production needs/requirements are completely unknown to us. plus, why str_shuffle? the md5 hash value is always essentially "random". shortening it reduces the keyspace and INCREASES the chance of a collision, and shuffling it doesn't help prevent collisions. – Marc B Sep 20 '16 at 15:57
  • Makes sense @Marc B, thanks for clarifying that. I assume that if we generated enough orders there is a chance of a primary key match when using MD5, albeit a very low chance. – Jethro Hazelhurst Sep 20 '16 at 15:59
  • Regarding putting data into the string, remember, the term, `meaningful code`, is an oxymoron. – Dan Bracuk Sep 20 '16 at 15:59
  • Could it be true that sometimes order numbers are given to a user (particularly when the product is not physical, but rather a service) purely for psychological reasons, to feel more 'official'? I ask because I cannot think of a reason why an order number would be more useful than an incrementing ID... – Jethro Hazelhurst Sep 20 '16 at 16:04
  • 2
    reasons to obfuscate the number: hide how many orders the company gets, make it harder to guess order numbers of other customers (of course this should not be the only safety measure) – cypherabe Sep 20 '16 at 16:13
  • Also, ensure your system doesn't allow users to enumerate through order numbers to find info about another person's order. – ʰᵈˑ Sep 20 '16 at 16:16
  • That point about making it harder to guess order numbers of other customers is really the best one I have heard so far. – Jethro Hazelhurst Sep 20 '16 at 16:17
  • @hd so it would be best to have a check to see if the order number AND the user's session id match the order number and user id stored in the order table? So it is in effect like an unencrypted password? – Jethro Hazelhurst Sep 20 '16 at 16:19
  • Non-sequential order numbers are often used, as you imply, to obfuscate information about any other order. If you give someone order #1234 they can't reliably try a brute force attack on your system with order #1233, it may not exist. It also protects against other information accidentally leaking out by way of inferring something from a pattern in the order numbers. It's not often to blind the customer to what you're doing, but to blind the world to what any of your customers are doing. – MatBailie Sep 20 '16 at 16:19

2 Answers

3

If you need globally unique IDs, say across multiple databases that are synchronized at intervals, then I'd go with a standard 128-bit GUID, which can be squeezed into 16 8-bit bytes to maintain backwards compatibility. PHP has com_create_guid to generate GUIDs (it belongs to the COM extension, so it is only available on Windows).
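
For platforms where com_create_guid is not available, a random 128-bit identifier can be built by hand. The following is only a minimal sketch, assuming PHP 7 or later for random_bytes; the function name generateUuidV4 is made up for illustration and is not part of PHP.

```php
<?php
// Sketch: build a random (version 4) UUID from 16 bytes of CSPRNG output.
// generateUuidV4 is a hypothetical helper name, not a PHP built-in.
function generateUuidV4(): string
{
    $bytes = random_bytes(16); // 16 bytes = 128 bits

    // Mark the value as a version 4, RFC 4122 variant UUID.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex characters split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

The raw 16 bytes can be stored in a BINARY(16) column, while the 36-character text form is what you would show to a person.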

Yimin Rong
  • Can bytes be anything other than 8 bits? *(I thought 4 bits made a nibble, and 16 bits made a word, thus 8 bits was always a byte. So you could just say `16 bytes`?)* – MatBailie Sep 20 '16 at 16:16
  • These days yes, but historically not always. I'm a C old timer, like punch cards and timesharing. – Yimin Rong Sep 20 '16 at 16:38
0

MD5 only produces values in the a-f0-9 range which is severely limiting here. You really need to expand this and use the entire alphabet, maybe even Base62, a variant of Base64 minus the two "annoying" characters.

A cryptographically random number, not the junk rand() produces, encoded as a 5-character Base62 value could work.
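
As a rough sketch of that idea, assuming PHP 7+ for random_int and intdiv: pick a random integer below 62^5 and write it out in Base62, padded to 5 characters. The helper names base62Encode and generateShortCode and the ordering of the alphabet are my own choices for illustration.

```php
<?php
// Hypothetical helper: encode a non-negative integer in Base62,
// left-padded with the zero digit to a fixed length.
function base62Encode(int $value, int $length): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $out = '';
    do {
        $out = $alphabet[$value % 62] . $out; // least significant digit first
        $value = intdiv($value, 62);
    } while ($value > 0);

    return str_pad($out, $length, '0', STR_PAD_LEFT);
}

// Hypothetical helper: a 5-character code drawn from 62^5 = 916,132,832 values.
function generateShortCode(): string
{
    return base62Encode(random_int(0, 62 ** 5 - 1), 5);
}
```

With under a billion possible values, duplicates become likely long before the space is used up, so this only makes sense together with a uniqueness check on insert.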

If you need people to be able to read and write these values by hand you'll want to omit 0, O and 1 and l and I for clarity.
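
One possible way to do that, sketched here under the assumption that PHP 7+ is available, is to pick each character independently from an alphabet with the look-alike characters removed. The function name generateReadableCode and the default length of 16 (taken from the question) are illustrative only.

```php
<?php
// Hypothetical helper: random code from an alphabet that leaves out
// 0, O, 1, l and I so the result is easier to read and type.
function generateReadableCode(int $length = 16): string
{
    $alphabet = '23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';
    $max = strlen($alphabet) - 1;

    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int draws from the operating system's CSPRNG.
        $code .= $alphabet[random_int(0, $max)];
    }

    return $code;
}
```

That leaves 57 symbols per position instead of 62, which costs very little keyspace at 16 characters.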

Remember, on really short values you will probably get collisions so you'll need to test any INSERT you do against a UNIQUE constraint and retry if they fail.
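
The retry loop could look roughly like this. It is a sketch, not a drop-in implementation: insertWithRetry is a made-up name, $generate is whatever produces a candidate number, and $tryInsert is assumed to run the INSERT and return false when the UNIQUE constraint rejects the value (for example by catching the driver's integrity constraint violation).

```php
<?php
// Hypothetical helper: keep generating candidates until one is accepted.
// $generate  : callable(): string  produces a candidate order number.
// $tryInsert : callable(string): bool  attempts the INSERT and returns
//              false if the UNIQUE constraint rejected the candidate.
function insertWithRetry(callable $generate, callable $tryInsert, int $maxAttempts = 5): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $candidate = $generate();
        if ($tryInsert($candidate)) {
            return $candidate; // stored successfully
        }
    }

    // Give up instead of looping forever when the keyspace is nearly full.
    throw new RuntimeException("No unique order number after $maxAttempts attempts");
}
```

Letting the database enforce uniqueness avoids the race you would get from a separate "SELECT first, then INSERT" check.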

tadman
  • Are those "annoying" characters in base64, sometimes the padding characters (`=`) http://stackoverflow.com/a/4492448/3000179? – ʰᵈˑ Sep 20 '16 at 16:15
  • Yeah, the ones that are annoying are the `=` padding character but also `+` and `/` which mess up values put in URLs. That's why Base62 is a popular alternative. – tadman Sep 20 '16 at 16:20