
I use the session ID and microtime() to generate unique strings (I am aware of UUIDs, but I need to do it this way). Is it possible to get duplicates this way? Mainly, is there a possibility that two or more session IDs match at the same moment in time, measured in microseconds? The website gets a few million visits a day, if that matters.

Thanks

dav
  • so you just concatenate the two together? – Prisoner Aug 15 '13 at 11:01
  • I think you're safe, see this other question: http://stackoverflow.com/questions/138670/how-unique-is-the-php-session-id – Chris Aug 15 '13 at 11:02
  • A collision is extremely rare, but not impossible if you have enough computing power, i.e. many cores serving requests in parallel. – arkascha Aug 15 '13 at 11:03
  • If you need randomness, ask the sources of random, not the clock. – Sven Aug 15 '13 at 11:09
  • @Prisoner, not exactly just concatenation, but something like that, thanks – dav Aug 15 '13 at 13:31
  • @Sven, but isn't it better to use time as well? When you generate something random there is a possibility of duplicates, even if a very low one, but time never repeats, so at least the part that is based on time is 100% unique. Is there any source of randomness that will never repeat (without UUID)? Thanks – dav Aug 15 '13 at 13:34
  • There are basically three different tasks: 1) identifying something uniquely with a list of existing IDs (think of autoincrement values in a database), 2) identifying something without a list (think UUID), and 3) randomness. With randomness, you can only guarantee that duplicate identifiers will be VERY unlikely without checking previous IDs. And it depends on what you want to do. Microtime is a very bad source of randomness: it has only one million possible values per second, and you can usually narrow down the time range even more. – Sven Aug 16 '13 at 07:25

2 Answers


The session ID is extremely unlikely to have a collision. Adding microtime should make it safe. If you want to be really safe and also keep the strings from being guessable, you could include the user's IP address as well as a hard-coded private key in your code, and then do a quick md5 hash on the whole thing. Use the hash as your string. Using uniqid() might be better, but it sounds like you can't do that for some reason.
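A minimal sketch of that idea, under a few assumptions: the function name `make_token` and the sample values are hypothetical, and `hash_hmac()` with SHA-256 is swapped in for the plain md5 mentioned above, since a keyed hash is the usual way to mix in a private key. In a real request the inputs would come from `session_id()` and `$_SERVER['REMOTE_ADDR']`.

```php
<?php
// Hypothetical helper: the name, the separator and the sample values are
// illustrative only, not part of the original answer.
function make_token(string $sessionId, string $ip, string $secret): string
{
    // microtime() without arguments returns a string like "0.12345600 1376564460".
    $raw = $sessionId . '|' . microtime() . '|' . $ip;

    // Keyed hash: the private key never appears in the hashed message itself.
    // SHA-256 yields 64 hex characters; md5($raw . $secret) would yield 32.
    return hash_hmac('sha256', $raw, $secret);
}

// Example call with made-up inputs.
$token = make_token('abc123sessionid', '203.0.113.7', 'replace-with-a-private-key');
echo $token, PHP_EOL;
```

The hash does not make the inputs any more unique than they already are; it only fixes the length and hides the raw values.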

verv
  • You shouldn't rely on the user's IP address either, as multiple sessions could come from the same IP address, and it's actually more common than you might think. Any users on the same LAN will most likely have the same public facing IP address, and users using a proxy will as well. – brianmearns Aug 15 '13 at 12:22
  • Thanks for the reply. I guess I agree with @sh1ftst0rm about the IP idea. As for hashing, I don't think it is a good idea, because it increases the possibility of collision; please see my other question. To avoid that, it is necessary to use sha512 or a "larger" hash, but unfortunately those give me very long strings: http://stackoverflow.com/questions/18164868/possible-collision-hashing-uuid-cakephp – dav Aug 15 '13 at 13:41
  • @sh1ftst0rm, I didn't say to abandon the other data, just add more. I am aware that users often share IP addresses, and that they can spoof and mess with them; I've dealt with that myself. But you've now reduced collision risks from the global population down to a handful of users sharing an IP, meaning collisions in the OTHER criteria are now even more insignificant. Same microtime from two people in the entire world, when you have millions of pageviews, ok MAYBE. Same microtime from two people sharing an IP address, SIGNIFICANTLY less likely. It's still not perfect but a vast improvement. – verv Aug 15 '13 at 20:41
  • @Davo, how does hashing increase the possibility of collision? In a string of this length, collision is practically impossible. Look up examples of people actually searching for collisions. It is HARD to find collisions, they don't just show up willy nilly :) Stretching a hash (by rehashing dozens or hundreds of times) does not reduce collisions. It makes it harder to reverse a hash via rainbow tables. Using a longer hash string will obviously reduce collisions, but you're not going to have any collisions in the first place. – verv Aug 15 '13 at 20:44
  • Also I should probably mention that md5 itself is not secure. If you need it to also be unguessable, build your key and then use something like phpass to properly stretch and salt your hash, and with a better algorithm than md5. md5 is fast, but its speed actually makes it less secure. You don't really say whether it needs to be secure or not. – verv Aug 15 '13 at 20:47
  • @verv, see, I use the current microtime and the session ID to generate a string, and if I add the IP as well (which looks like a good idea after all) the final string becomes 50-60 characters long, while the hashed string is only 40 characters long. So, according to http://en.wikipedia.org/wiki/Pigeonhole_principle, there is a possibility of collision, even if a small one, and I guess I cannot take the risk because my db can contain millions of rows. About md5: I use CakePHP, which takes care of salting, so it should be ok. – dav Aug 15 '13 at 21:19
  • @verv: Understood. I was mostly just pointing out that you shouldn't _rely_ on IP addresses, but you're right, including them can reduce the chances of collision. – brianmearns Aug 16 '13 at 00:39
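
To put rough numbers on the hashing discussion in these comments, the standard birthday approximation estimates the chance of at least one collision among n uniformly distributed b-bit hash values. The figure n = 10^9 rows is an assumption chosen to be generous compared with the "millions of rows" mentioned above; 128 bits corresponds to md5 and 160 bits to a 40-character hex digest such as sha1.

$$
p \approx \frac{n(n-1)}{2 \cdot 2^{b}} \approx \frac{n^{2}}{2^{b+1}}
$$

$$
b = 128:\quad p \approx \frac{10^{18}}{2^{129}} \approx 1.5 \times 10^{-21}
\qquad
b = 160:\quad p \approx \frac{10^{18}}{2^{161}} \approx 3.4 \times 10^{-31}
$$

The pigeonhole principle only says collisions must exist somewhere among all possible inputs; under these assumptions the chance of actually hitting one is negligible at this scale.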

If session IDs were duplicated at the same time, it would mean that two (or more) of your visitors are sharing a session. Remember that the session ID is the only way PHP has to decide which session to load, so having duplicate IDs defeats the purpose of sessions and is a huge security problem. I'm not saying it's impossible; you would need to know exactly how PHP generates session IDs in order to know that. But a collision is most likely very improbable.

The manual only seems to briefly touch on this. While it's not exactly a hard guarantee, it does state here:

A visitor accessing your web site is assigned a unique id, the so-called session id

Note that you can specify your own session IDs by passing an argument to the session_id() function, so you could come up with your own session ID and use some method to ensure that it is unique. For instance, you could store a pool of recently used session IDs, or use a one-to-one function of a counter which you can manually increment every time you create a new session.
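The counter approach might look roughly like the sketch below. The function name `make_session_id` is hypothetical, and where the counter lives (a database sequence, a shared cache, and so on) is left open; the random prefix is an addition of this sketch so that the ID is not predictable from the counter alone.

```php
<?php
// Hypothetical sketch: in a real deployment the counter would come from
// shared storage with an atomic increment; here it is simply a parameter.
function make_session_id(int $counter): string
{
    // Random prefix for unguessability: 16 random bytes -> 32 hex characters.
    $random = bin2hex(random_bytes(16));

    // Counter suffix for uniqueness: the prefix has a fixed length, so two
    // different counter values can never yield the same string.
    return $random . '-' . dechex($counter);
}

// Usage, assuming $nextCounterValue was fetched atomically elsewhere.
// session_id() has to be called before session_start():
// session_id(make_session_id($nextCounterValue));
// session_start();
```

Only hex digits and a minus sign are used, which stay within the characters PHP's default file-based session handler accepts in a session ID.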

As a final note, I don't think microtime() is guaranteed to be accurate to one microsecond, meaning it may not actually change every microsecond. To be safe, I would assume that it only changes once per second, and then make sure that no session ID is reused within one second. Using either of the methods described above, this is fairly easy. If you're keeping a pool of recent session IDs, just make sure they are not removed from the pool sooner than one second after they were last used. If you're using a counter, just make sure it does not reset or roll over within one second. Of course, in either case, you also want to make sure a session ID isn't reused if another session is still using it, but that's not strictly related to your question.
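One way to check how microtime() behaves on a particular server is to call it in a tight loop and count how often two consecutive calls return the same value. This is only a rough probe, and the function name `count_repeated_microtimes` is made up for illustration; the result depends on the platform and on how fast the loop runs.

```php
<?php
// Counts how many consecutive microtime() calls returned an identical value.
function count_repeated_microtimes(int $calls): int
{
    $repeats = 0;
    $previous = microtime(); // string such as "0.12345600 1376564460"

    for ($i = 1; $i < $calls; $i++) {
        $current = microtime();
        if ($current === $previous) {
            $repeats++;
        }
        $previous = $current;
    }

    return $repeats;
}

echo count_repeated_microtimes(100000), " repeated values", PHP_EOL;
```

Any nonzero count means the timestamp alone cannot tell two calls apart, even within a single process.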

brianmearns
  • Just be careful that there's no way anyone can see these session IDs, or reverse any hashing you do on them. Using session IDs taken from your live users leaves you open to session hijacking. If they're from sessions that have already been closed, then it should be fine. – verv Aug 15 '13 at 21:01
  • That's a really good point. If you're generating IDs as a function of a counter, you need to be really careful to make sure the counter value itself is not discoverable, or someone could easily figure out the next couple of session IDs that will be used. – brianmearns Aug 16 '13 at 00:35