Implement obscured unique identifiers for existing MySQL schema

Question

I have an existing MySQL database schema in production for a PHP5 application. The application was built in a popular MVC framework (which one isn't important). We use Doctrine ORM 1.2.x as our ORM.

The default routing uses the primary ID, in our case simply an unsigned auto-incremented integer. However, some of the data is sensitive, and although we run under SSL, changing an ID value in the url could potentially give access to confidential data a user is not authorized to see.

The solution as I see it is to use some obscured value in place of the more obvious record ID.

Ideally we would just add a new column to the affected table and generate some unique random or hashed value for that record, right?

However, I can conceivably see a couple other tables/routes being in need of the same treatment sooner or later, and would like a reusable solution that can avoid a series of database updates. So I've been thinking about alternative methods and would like opinions on whether there are any major issues to be concerned about.

- simply obfuscating the value, i.e. shifting bits and/or base64 encoding
- quick and nasty encryption
- using an HMAC to ensure the ID given matches the given HMAC

Update: As mentioned by Charles, ACLs would be the preferred solution. However, some portions of the site are open to the public, so ACLs for these areas are not possible. We do, however, make extensive use of ACLs in the application's backend.

Charles · Accepted Answer · 2011-03-28T23:52:23.370

3

> changing an ID value in the url could potentially give access to confidential data a user is not authorized to see.

Shouldn't your code have access controls that would prevent this from happening? That's going to be a saner solution than the "security through obscurity" that obfuscated identifiers are going to bring you.

> Ideally we would just add a new column to affected table and generate some unique random or hashed value for that record right?

Yup! Just remember to mark the column as UNIQUE.
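As a rough illustration of the "random value in a UNIQUE column" approach, here is a minimal sketch in Python (used only for illustration; the application in question is PHP). The function name `new_public_key` and the `existing_keys` set are hypothetical: the set merely stands in for the UNIQUE column, and in a real application the database constraint would reject a duplicate and the insert would be retried.

```python
import secrets


def new_public_key(existing_keys, nbytes=20):
    """Return a random hex token that is not already in existing_keys.

    existing_keys is a hypothetical stand-in for the UNIQUE column;
    nbytes=20 yields a 40-character hex string.
    """
    while True:
        token = secrets.token_hex(nbytes)  # two hex characters per random byte
        if token not in existing_keys:     # collisions are astronomically unlikely
            return token


# Usage: generate a key for a new row and remember it.
existing = set()
key = new_public_key(existing)
existing.add(key)
```

With 20 random bytes the retry loop should essentially never run twice, but keeping it (or the equivalent retry on a constraint violation) makes the uniqueness guarantee explicit rather than probabilistic.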

> So I've been thinking about alternative methods and would like opinions on whether there are any major issues to be concerned about.

The three options you mentioned are silly in their own ways. All you need is a unique identifier for that row in that table. Pull some entropy out of thin air and encode enough of it, and bingo, you have your unique identifier. Don't get too clever, or you'll find yourself with a self-inflicted bullet through your foot. Don't base the random data on a hash of the contents or of the row's existing sequential identifier. The HMAC could be good enough here.
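For the HMAC option, a minimal sketch of signing and verifying an ID might look like the following (Python for illustration only). Everything named here is an assumption: `SECRET_KEY` is a placeholder for a long random server-side secret, and the `scope` argument is a hypothetical way to reuse the same helper for several tables or routes without a signature for one being valid for another.

```python
import hashlib
import hmac

# Hypothetical placeholder: in practice a long random value kept out of source control.
SECRET_KEY = b"replace-with-a-long-random-server-side-secret"


def sign_id(scope: str, record_id: int) -> str:
    """Return a hex HMAC-SHA256 over "<scope>:<record_id>"."""
    msg = f"{scope}:{record_id}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def verify_id(scope: str, record_id: int, signature: str) -> bool:
    """Check a signature from the URL using a constant-time comparison."""
    expected = sign_id(scope, record_id).encode("utf-8")
    return hmac.compare_digest(expected, signature.encode("utf-8"))


# Usage: build the link with the signature, then verify it on the way in.
sig = sign_id("invoice", 42)
link = f"/invoice/42/{sig}"
```

The ID stays visible in the link under this scheme; what changes is that altering it without knowing the secret produces a signature mismatch, so no new column or database update is needed.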

On the other hand, I wouldn't have the faintest idea on how to hook your ORM up to the new identifier, or have it generate one for you when also creating new rows.

edited Mar 28 '11 at 23:52

answered Mar 28 '11 at 23:33

Charles

I'm well aware of the evils of "security through obscurity"; however, certain portions of the site are open to the public and users are not required to be authenticated, hence ACLs are not possible. We use ACLs extensively in the application's backend. – xzyfer Mar 28 '11 at 23:41
Color me a bit confused -- if you have *confidential data* that can only be made visible to particular users, where do they get their own set of I-can-look-at-this IDs to begin with? I mean, you *had* that data at some point, right? The combination of the phrase *confidential data* and *open to the public* doesn't compute. – Charles Mar 28 '11 at 23:48
Members of the public are able to register interest in, and even purchase tickets to, privately run seminars and such. At that point we generate invoices (the "confidential" data), which are downloadable via a publicly available link, and also on the backend by authorized users. The link to download a PDF is something like (example.com/seminar/:seminar_id/invoice/:invoice_id/). This leaves open the possibility that `invoice_id` could be changed and someone could stumble upon someone else's invoice, which lists private data, i.e. company name and billing address. – xzyfer Mar 29 '11 at 00:16
As for plugging in the ORM, we've got that covered. – xzyfer Mar 29 '11 at 00:16
We're actually currently using the `openssl_random_pseudo_bytes` to generate the random unique_key atm :) – xzyfer Mar 29 '11 at 00:20
On one hand: Invoices are produced from an Order, Orders are attached to Customers, should Invoices not be restricted to viewing by qualified Customers? On the other hand: Unless Customers belong to Accounts and you've already created an appropriate responsibility delegation mechanism such that person B in an Account can view person A's Orders and Invoices, things are going to get very, very sticky. Personal experience on this one. (continued) – Charles Mar 29 '11 at 01:20
Perhaps you should restrict viewing of the Invoice itself to just authenticated users, but provide them a *specific public link* that they can *optionally* pass around that includes an additional random identifier. This way the correct users can *at their option* pass out links to the Invoice, without letting others view any random Invoice unless they're the owner. – Charles Mar 29 '11 at 01:21
In our case Customers are the general public (this is a requirement), so there are no accounts. The only authenticated users are seminar organisers/staff. The _specific public link_ is exactly what this question is about. Currently these links clearly show the seminar_id and invoice_id parameters. A lucky combination found by a savvy user could give them access to someone else's invoice, hence the issue. These identifiers have to be replaced with something users wouldn't think to, or know how to, change. This is where I thought HMACs would be good, or a random seed from `openssl_random_pseudo_bytes`? – xzyfer Mar 29 '11 at 01:31
Explaining the background and nature of the data in the original question would have helped a bit. Random identifiers are probably OK in this specific case, given the additional information. An HMAC built from the data could work, as could our good friend `o_r_p_b`, just don't use a hash of relevant data from the data row itself, as that could end up guessable. – Charles Mar 29 '11 at 01:51
Yeah, agreed, hashes are bad. The original hash was something like `sha1(time().rand())` but I didn't much like it, so we changed to `bin2hex(openssl_random_pseudo_bytes(20))`. I like the idea of an HMAC because it means I don't need to make database updates, and it's a reusable solution. But I think management will still jump up and down that an ID is visible, regardless of how futile guessing other IDs will be without the correct HMAC. – xzyfer Mar 29 '11 at 02:00
Yeah, the HMAC may well be indistinguishable from randomness. If you want to melt your boss' brain, consider [doing something creative with this arbitrary base conversion code](http://stackoverflow.com/questions/5301034/how-to-generate-random-64-bit-value-as-decimal-string/5302533#5302533) (see "Step 2"). The alphabet can be extended to capital letters to get base-62. It'll look and feel like base64 to people that know it, but base64 starts with capital letters, resulting in decoding to gibberish. It's the *definition of obscurity*, but it might convince other people better than just numbers. – Charles Mar 29 '11 at 02:08
Haha I like it, might get a bit expensive though don't you think? I think I'll give HMAC a go and pray :) Thanks for the input – xzyfer Mar 29 '11 at 02:13
It shouldn't be bad if you're only building it for the link(s) and decoding it when it comes in. Using it when generating the unique IDs to begin with would indeed get kind of CPU-expensive. – Charles Mar 29 '11 at 02:16
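The base-62 idea from the comments above can be sketched as follows (Python for illustration only). This is not the linked code, just a generic base conversion using an assumed alphabet of digits, then lowercase, then uppercase letters; the function names are hypothetical. It is an encoding, not a security measure, so on its own it does nothing to stop someone who works out the alphabet.

```python
# Assumed alphabet: digits, lowercase, then uppercase (62 symbols in total).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using ALPHABET."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))  # most significant digit first


def base62_decode(s: str) -> int:
    """Decode a string produced by base62_encode; raises ValueError on bad input."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Encoding only when building the link and decoding only when a request comes in keeps the cost to a handful of integer divisions per request.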
