How many bytes are unique enough for Twitter?

Question

I don't want my database IDs to be sequential, so I'm trying to generate UIDs with this code:

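function generate_uid() {
    $hex = bin2hex(openssl_random_pseudo_bytes(12)); // 12 random bytes = 96 bits
    // base_convert() loses precision on 96-bit values; GMP (if installed) is exact
    return gmp_strval(gmp_init($hex, 16), 36);
}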

My question is: how many bytes would I need to make the IDs unique enough to handle very large numbers of records (on the scale of Twitter)?
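
The pipeline in the question (random bytes, then hex, then base 36) can be sketched in Python, whose integers have arbitrary precision, so the base-36 string keeps every random bit. This is a sketch, not the asker's code: `to_base36` and `random_uid` are hypothetical names, and `secrets.token_bytes` stands in for openssl_random_pseudo_bytes.

```python
import secrets
import string

# Base-36 digit order: 0-9 then a-z (the same order int(s, 36) accepts).
BASE36 = string.digits + string.ascii_lowercase

def to_base36(value):
    """Write a non-negative integer in base 36 with no loss of precision."""
    if value == 0:
        return "0"
    digits = []
    while value:
        value, rem = divmod(value, 36)
        digits.append(BASE36[rem])
    return "".join(reversed(digits))

def random_uid(num_bytes=12):
    """num_bytes of CSPRNG output rendered as a base-36 string."""
    return to_base36(int.from_bytes(secrets.token_bytes(num_bytes), "big"))

print(random_uid())  # at most 19 characters for 12 bytes
```

Because 36^19 exceeds 2^96, a 12-byte value never needs more than 19 base-36 characters, and int(uid, 36) recovers the original number exactly.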

Why wouldn't you want your database IDs to be sequential? If this is just for display, I suggest you find a way to display them but leave sequential IDs in your database. — Brad, Sep 18 '12 at 15:03
@Brad It's safer and allows me to hide the growth of my application. — Hugo Mota, Sep 18 '12 at 15:04
@hugo_leonardo, Safer? In what way! There is absolutely nothing unsafe about it. I would suggest that the extra time needed to check for collisions upon data insertion isn't worth it. If you want to "hide the growth" of your application, then simply display your IDs differently. Write yourself a quick algorithm. — Brad, Sep 18 '12 at 15:07
@JanPrieser Like I said, the largest scale I could think of is Twitter, so it would be something like millions of users with thousands of records each. — Hugo Mota, Sep 18 '12 at 15:08
The main point of those IDs is that many servers can handle requests, each knowing it can generate a unique ID that won't collide with the IDs other servers are generating at the same time. Your database should be able to generate them rather than PHP doing it. What database are you using? — exussum, Sep 18 '12 at 15:08
I'm gonna use a relational database, but they only provide me with sequential IDs. I'm trying to do something like MongoDB does... — Hugo Mota, Sep 18 '12 at 15:09
@Brad Are you suggesting I do the same thing, but with duplicate data? Why would I do that? Note: of course I won't rely on unpredictable IDs to secure my application, but they are certainly a thin additional layer. — Hugo Mota, Sep 18 '12 at 15:12
@hugo_leonardo, Duplicate data? No, I'm not suggesting that at all! Simply display your IDs differently. What do your IDs have to do with security? Nothing. I suggest you double-check your assumptions and re-evaluate what you are trying to do. — Brad, Sep 18 '12 at 15:18
If you have really big data, you might want to save space and cycles by using a BIGINT primary key instead of strings. And, as @Brad said, hash the IDs somehow before displaying them. If you use UUIDs as the PK, you lose the efficiency of range queries, ORDER BY queries and range partitions (sharding, ...). — Jan Prieser, Sep 18 '12 at 15:27
@Brad This is for a framework. So let's say some noob using it forgets to validate user credentials. Wouldn't it be at least a little safer if people could not predict ids? — Hugo Mota, Sep 18 '12 at 15:36
@hugo_leonardo, No, it wouldn't. What do IDs have to do with validating user credentials? — Brad, Sep 18 '12 at 15:38
@Brad With no validation, anyone could access, let's say, /user/1/edit. But most script kiddies wouldn't make it if they had to guess something like /user/sa134dsf15351/edit. I'm not saying it's safe, but it surely helps. But we're missing the point here; I'm just concerned about hiding growth. I don't want my users to know that they wrote the 3rd or 4th comment of the app :/ — Hugo Mota, Sep 18 '12 at 15:40
@JanPrieser So would making the IDs digits-only solve these issues? I really don't want them to be sequential... — Hugo Mota, Sep 18 '12 at 15:43
@hugo_leonardo, You under-estimate script kiddies, bots, and ease-of-use of Google. If you were truly just concerned with hiding growth, you'd take the suggestion I gave you in the first place. I certainly hope I never have to use this framework that you are developing. You value this false notion of security over the usability, speed, and reliability of your database. Please, share the name of this creation so we can avoid it in the future. — Brad, Sep 18 '12 at 15:45
@Brad I'm really willing to learn here, not offend anyone. So back to your suggestion: can you give me a hint on how to display it differently without data duplication? — Hugo Mota, Sep 18 '12 at 15:52
@hugo_leonardo, Take a look at Dan's answer. His answer is exactly what I am suggesting. You don't store these values... they are for display only. What you store in your database is a sequential ID. — Brad, Sep 18 '12 at 15:54

score 3 · Answer 1 · answered Sep 18 '12 at 15:05

Use PHP's uniqid() with the added-entropy option enabled (uniqid('', true)). That'll give you plenty of room.

Madara's Ghost

@hugo_leonardo: Is that so? Can you predict what it would be now? How could you possibly know the exact microtime at which the ID was generated? Please, it's random enough for 99.9% of cases, and yours doesn't look like the other 0.1%. – Madara's Ghost Sep 18 '12 at 15:20
Knowing one ID, it would be very easy to guess the next (or previous) one with a little brute force. But anyway, according to @Jan the IDs should be digits-only, so uniqid won't do it. – Hugo Mota Sep 18 '12 at 15:46
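
To make the predictability argument concrete, here is a Python model of the time-based part of a uniqid()-style ID only. The layout (8 hex digits of Unix seconds followed by 5 hex digits of microseconds) is an assumption about uniqid()'s internals, the function names are hypothetical, and the extra suffix that the more_entropy flag appends is not modelled. It shows how small the search space around one known ID is without that flag.

```python
import time

def uniqid_like(ns=None):
    """Hypothetical model of a 13-character uniqid(): 8 hex digits of Unix
    seconds followed by 5 hex digits of microseconds (assumed layout)."""
    if ns is None:
        ns = time.time_ns()
    sec, rem_ns = divmod(ns, 10**9)
    usec = rem_ns // 1000
    return f"{sec:08x}{usec:05x}"

def neighbours(observed, window_us):
    """Every ID of the same shape within +/- window_us microseconds of observed."""
    total_us = int(observed[:8], 16) * 1_000_000 + int(observed[8:], 16)
    for t in range(total_us - window_us, total_us + window_us + 1):
        sec, usec = divmod(t, 1_000_000)
        yield f"{sec:08x}{usec:05x}"

seen = uniqid_like()
candidates = list(neighbours(seen, 1000))  # one millisecond either side
print(seen, len(candidates))
```

Covering a full millisecond on either side of a known ID takes only 2001 guesses under this model, which is the brute force the comment above refers to; the added-entropy suffix is what widens that space in practice.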

score 2 · Accepted Answer · edited May 23 '17 at 11:43

You might consider something like the way TinyURL and other URL-shortening services work. I've used similar techniques, which guarantee uniqueness until all combinations are exhausted. Basically, you choose an alphabet and a code length. Let's say we use alphanumeric characters, upper- and lowercase, so that's 62 characters in the alphabet, and let's use 5 characters per code. That's 62^5 = 916,132,832 combinations.

You start with your sequential database ID and multiply it by some fairly large prime number, such as 2097593, wrapping around (that is, taking the result modulo 62^5) if you exceed 62^5, and then convert that number to base 62 using your chosen alphabet.

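This makes each code look fairly random, yet we're guaranteed not to hit the same code twice until all codes have been used: because the prime does not divide 62^5 = 2^5 · 31^5 (strictly, any multiplier sharing no factor with 62^5 works, so any prime other than 2 or 31), multiplying modulo 62^5 maps distinct IDs below 62^5 to distinct codes. And the codes are very short.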
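
A minimal Python sketch of the scheme described above, assuming the alphabet order digits, lowercase, uppercase (any fixed order works). `encode` and `decode` are hypothetical names; `decode` undoes the scrambling with the modular inverse of the multiplier, which exists precisely because the multiplier shares no factor with 62^5.

```python
import string

# Assumed alphabet order: digits, lowercase, uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
LENGTH = 5
MULTIPLIER = 2097593  # the prime from the answer; coprime with 62**5

def encode(db_id, alphabet=ALPHABET, length=LENGTH, multiplier=MULTIPLIER):
    """Scramble a sequential ID into a fixed-length code.
    IDs at or above len(alphabet)**length wrap around and will repeat codes."""
    base = len(alphabet)
    n = (db_id * multiplier) % (base ** length)  # wrap around past base**length
    chars = []
    for _ in range(length):  # fixed number of digits, so short values are padded
        n, rem = divmod(n, base)
        chars.append(alphabet[rem])
    return "".join(reversed(chars))

def decode(code, alphabet=ALPHABET, length=LENGTH, multiplier=MULTIPLIER):
    """Invert encode(): parse the digits, then multiply by the modular inverse."""
    base = len(alphabet)
    space = base ** length
    n = 0
    for ch in code:
        n = n * base + alphabet.index(ch)
    return (n * pow(multiplier, -1, space)) % space  # pow(x, -1, m) needs Python 3.8+

print(encode(1))  # 08NG9
print(decode(encode(12345)))  # 12345
```

Only the sequential ID is stored; the code is computed on the way out and decoded on the way in, which matches the "display only" approach from the comments.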

You can use longer keys with a smaller alphabet, too, if length isn't a concern.

Here's a question I asked along the same lines: Tinyurl-style unique code: potential algorithm to prevent collisions

score 0 · Answer 3 · edited Sep 18 '12 at 15:29

Use MySQL's UUID() function:

insert into `database`(`unique`,`data`) values(UUID(),'Test');

If you're not using MySQL, search Google for "UUID" plus your database's name and you should find an equivalent.

From Wikipedia:

"In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%."
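
The quoted figure is a birthday-bound estimate for random (version 4) UUIDs, which carry 122 random bits. MySQL's UUID() returns version 1 UUIDs (timestamp plus node), whose uniqueness rests on the clock and node rather than on chance. A Python sketch using the standard uuid module shows both kinds and re-derives the quoted estimate; `collision_probability` is a hypothetical helper using the usual approximation 1 − exp(−n²/(2d)).

```python
import math
import uuid

def collision_probability(n, bits):
    """Birthday approximation: chance of at least one duplicate among n
    values drawn uniformly from 2**bits possibilities."""
    return -math.expm1(-(n * n) / (2 * 2.0 ** bits))

# Version 1 (timestamp + node), the kind MySQL's UUID() returns,
# versus version 4 (122 random bits), the kind the Wikipedia figure describes.
v1 = uuid.uuid1()
v4 = uuid.uuid4()
print(v1.version, v4.version)  # 1 4

# Wikipedia's scenario: one billion v4 UUIDs per second for 100 years.
n = 10**9 * 60 * 60 * 24 * 365 * 100
print(f"{collision_probability(n, 122):.2f}")
```

The approximation lands a little above 0.6, the same ballpark as the quoted "about 50%".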

edited Sep 18 '12 at 15:29

answered Sep 18 '12 at 15:16

exussum

score 0 · Answer 4 · answered Sep 18 '12 at 15:24

Assuming that openssl_random_pseudo_bytes can generate every possible value, N bytes give you 2^(8N) distinct values. For 12 bytes this is 2^96 ≈ 7.923 × 10^28.
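
The count above is the size of the ID space. For "unique enough", what matters is the birthday bound: with n IDs drawn uniformly at random from d = 2^(8N) values, the chance of at least one duplicate is roughly 1 − exp(−n²/(2d)). A Python sketch, assuming a uniform random source; the function names and the record count are hypothetical.

```python
import math

def distinct_values(num_bytes):
    """Number of distinct values num_bytes can hold: 2**(8 * num_bytes)."""
    return 2 ** (8 * num_bytes)

def collision_probability(n, num_bytes):
    """Birthday approximation for n uniformly random IDs of num_bytes each."""
    return -math.expm1(-(n * n) / (2 * distinct_values(num_bytes)))

def bytes_needed(n, max_probability):
    """Smallest byte count keeping the approximate collision chance <= max_probability."""
    bits = math.log2(n * n / (2 * max_probability))
    return math.ceil(bits / 8)

records = 10**10  # hypothetical: ten million users x one thousand records each
for size in (8, 12, 16):
    print(size, f"{collision_probability(records, size):.1e}")
print(bytes_needed(records, 1e-6))  # 11
```

With that hypothetical 10^10 records, 12 bytes give a collision chance of roughly 6 × 10^-10, whereas with 8 bytes a duplicate becomes more likely than not.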

galymzhan
