My web apps are database-agnostic: they run on either MongoDB or any SQL database.
I want a single strategy for generating all the unique IDs in the whole system: user IDs, messages, forum posts, chat messages, everything. I also want the IDs to reveal zero information (e.g., no timestamps).
My current plan:
- generate random bits with a crypto-secure function
- use 256 bits for enough entropy to avoid collisions (see the probability chart on Wikipedia)
- represent these IDs as 64-character hexadecimal strings in app code
  - use hex instead of base64 to avoid most naughty words
  - hex also contains no word-break characters, so an ID is easier to select by double-clicking
Example ID: 402208a6d3295aad235c68cb20a35c30e835344bbc40fb398744c593b6aea076
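To make the plan above concrete, here is a minimal sketch of the generation step, assuming Python and its standard `secrets` module. The names `new_id` and `id_to_bytes` are hypothetical helpers for illustration only, not part of any existing codebase.

```python
import secrets

ID_BYTES = 32  # 32 bytes = 256 bits of entropy


def new_id() -> str:
    """Return a random 256-bit ID as a 64-character lowercase hex string."""
    return secrets.token_hex(ID_BYTES)


def id_to_bytes(hex_id: str) -> bytes:
    """Convert the hex form back to its 32 raw bytes (e.g. for binary storage)."""
    return bytes.fromhex(hex_id)


example = new_id()
# The hex form is 64 characters; the raw form is half that size in bytes.
print(example, len(example), len(id_to_bytes(example)))
```

The same 256 bits take 64 characters as hex but only 32 bytes in raw form, which is the size difference behind the strings-versus-binary storage question.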
My questions:
- are these IDs too long, perhaps causing unnecessary performance problems?
- are these IDs too short, perhaps encountering collisions that might cause bugs?
  - under some circumstances, I could imagine needing to create many billions of objects!
- should I switch to base64 or base58 format, and just let users cope when naughty or obscene words appear in their user IDs?
  - in terms of user experience, are compact IDs worth the inevitable unfortunate words?
  - should I invent my own compact encoding to avoid naughty words, perhaps using only digits plus uppercase and lowercase consonants (no vowels)?
- in MongoDB terms, what's the performance difference between storing and indexing these IDs as strings versus BinData?
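For the too-long versus too-short questions, a rough way to compare ID sizes is the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / (2·2^bits)). The sketch below assumes Python; the function name `collision_probability` is hypothetical, and the figure of 10^11 objects is just an assumed stand-in for "many billions".

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance of at least one collision among n_ids random IDs."""
    # Birthday approximation: p ≈ 1 - exp(-n(n-1) / (2 * 2**bits))
    exponent = n_ids * (n_ids - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is extremely small
    return -math.expm1(-exponent)


ASSUMED_OBJECTS = 10**11  # assumption: one hundred billion IDs

for bits in (64, 128, 256):
    print(bits, collision_probability(ASSUMED_OBJECTS, bits))
```

Under that assumption, 64 bits is nearly certain to collide, 128 bits gives a probability on the order of 1e-17, and 256 bits gives roughly 1e-56, so the comparison is really between 128 and 256 bits.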
I'm hoping to get some different perspectives on this general problem, because once I deploy my solution, it will surely be very painful to go back and revise these decisions!
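To make the custom-encoding idea from my questions concrete, here is a rough sketch of a vowel-free alphabet, assuming Python. The alphabet itself (digits plus upper- and lowercase consonants, with "y" treated as a consonant) and the names `encode` and `decode` are my own assumptions for illustration, not an established format.

```python
import secrets

# Hypothetical alphabet: digits plus consonants in both cases (no vowels).
CONSONANTS = "bcdfghjklmnpqrstvwxyz"
ALPHABET = "0123456789" + CONSONANTS.upper() + CONSONANTS
BASE = len(ALPHABET)  # 52 symbols
ID_LENGTH = 45  # smallest length for which 52**45 >= 2**256


def encode(raw: bytes) -> str:
    """Encode 32 raw bytes as a fixed-length, vowel-free string."""
    n = int.from_bytes(raw, "big")
    chars = []
    for _ in range(ID_LENGTH):
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode(text: str) -> bytes:
    """Decode a string produced by encode() back to 32 raw bytes."""
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)
    return n.to_bytes(32, "big")


raw = secrets.token_bytes(32)
print(encode(raw), len(encode(raw)))
```

This brings a 256-bit ID down from 64 hex characters to 45, but digits such as 0, 1, and 3 can still be read as letters, so it reduces unfortunate words rather than eliminating them.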