49

How do I generate a unique session id in Python?

tshepang
Alex

5 Answers

79

UPDATE: 2016-12-21

A lot has happened in the last ~5 years. /dev/urandom has been updated and is now considered a high-entropy source of randomness on modern Linux kernels and distributions. In the last 6 months we've seen entropy starvation on a Linux 3.19 kernel using Ubuntu, so I don't think this issue is "resolved", but it's now sufficiently difficult to end up with low-entropy randomness when asking the OS for any amount of randomness.


I hate to say this, but none of the other solutions posted here are correct with regard to being a "secure session ID."

# pip install M2Crypto
import base64
import M2Crypto

def generate_session_id(num_bytes=16):
    # Base64-encode num_bytes of randomness from OpenSSL's generator
    return base64.b64encode(M2Crypto.m2.rand_bytes(num_bytes))

Neither uuid() nor os.urandom() is a good choice for generating session IDs. Both may generate random results, but random output is not necessarily secure if the underlying entropy is poor. See "How to Crack a Linear Congruential Generator" by Haldir or NIST's resources on Random Number Generation. If you still want to use a UUID, then use a UUID that was generated with a good initial random number:

import uuid, M2Crypto
uuid.UUID(bytes=M2Crypto.m2.rand_bytes(16))
# UUID('5e85edc4-7078-d214-e773-f8caae16fe6c')

or:

# pip install pyOpenSSL
import uuid, OpenSSL
uuid.UUID(bytes = OpenSSL.rand.bytes(16))
# UUID('c9bf635f-b0cc-d278-a2c5-01eaae654461')

M2Crypto is the best OpenSSL API in Python at the moment, as pyOpenSSL appears to be maintained only to support legacy applications.

Sean
  • Those citations about UUID problems are helpful. Thanks for posting that. Question: what do you think is the best way to generate a session id? Particularly with the faults you cite in UUID implementations, how would you do it differently? I'm writing something like this right now and trying to come up with the best approach. It's also got to be fault-tolerant - e.g., can't be dependent upon connection to a database server. – ratsbane Jul 07 '11 at 20:48
  • Any of the examples above would work. The key is making use of a good random number generator that is populated with "cryptographically sufficient entropy." Beyond aesthetics and the size of the representation, there is no difference between encoding a sufficiently random value as a `Base64` string, a `UUID`, or even a hex-encoded string. To each their own. I personally prefer base64 for size reasons. – Sean Jul 07 '11 at 23:48
  • Thanks. That seems sound. I wrote it this afternoon with – ratsbane Jul 09 '11 at 17:19
  • 21
    If we strip away all the fluff, what you're basically saying is that OpenSSL.rand.bytes(16) is secure but os.urandom(16) is not. According to the docs, os.urandom's purpose is to "return a string of n random bytes suitable for cryptographic use." If generating a good session ID is not a "cryptographic use" for which os.urandom is suitable, then what is it meant for? Perhaps the correct solution is too simple for your taste, but that's Python for you. Meaningless fluff doesn't make things more secure. – Seun Osewa Nov 21 '11 at 20:27
  • 6
    @SeunOsewa, you are correct about the docs and about `os.urandom` being intended for cryptographic use. Unfortunately, this isn't always the case in practice. FreeBSD and OS X have a good pool for urandom; Linux is hit or miss (though getting better). Being explicit is better than implicit. BTW, the reason I posted this was because I ran into a session ID conflict in a real-world situation where session ID conflicts weren't checked and users saw each other's information. Cause? urandom wasn't being seeded properly. :-/ Reality bites sometimes. – Sean Nov 29 '11 at 19:54
  • 3
    @SeunOsewa You're correct, both OpenSSL and os.urandom use the same source of entropy (/dev/urandom) and have the same level of security. – ramirami Jun 14 '13 at 15:13
  • 3
    A tentative -1. You claim without evidence that `os.urandom` is insufficiently random to be secure while OpenSSL (e.g. via M2Crypto) is better. Meanwhile @ramirami claims (also without evidence) that in fact both use the same underlying entropy source. I don't know who is right, but I'm downvoting anyway; I dislike FUD and the bold claim here (that `os.urandom` uses, or may use on some platforms, a worse source of entropy than OpenSSL, to the point that the former is cryptographically broken in contexts where the latter is secure) needs substantiating to be useful. – Mark Amery Oct 26 '14 at 12:32
  • If you generate uuids with `id = uuid.UUID(bytes = OpenSSL.rand.bytes(16))`, keep in mind that you will have `id.variant == uuid.RESERVED_FUTURE`, so not `uuid.RFC_4122` and then `id.version` is `None`. This also means that 2 bits are changed to mark the variant, all others are just randoms. You might have collisions, whereas RFC 4122 based uuids are designed to prevent that. Anyway, RFC 4122 states in section 6 (security): `Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example.` – Bertrand Mathieu Jul 22 '16 at 07:51
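
To illustrate the variant/version point from the last comment, here is a minimal sketch using only the standard library. `os.urandom` stands in for the OpenSSL call (an assumption, made so the snippet runs without extra packages). A UUID built from raw random bytes keeps whatever variant and version bits those bytes happen to contain, while passing `version=4` asks the constructor to overwrite those fields so the result is marked as an RFC 4122 version 4 UUID.

```python
import os
import uuid

raw = os.urandom(16)  # stand-in for OpenSSL.rand.bytes(16) (assumption)

# No version given: all 128 bits are kept as-is, so the variant and
# version fields are simply whatever the random bytes contain.
unmarked = uuid.UUID(bytes=raw)

# version=4: the constructor overwrites the variant and version bits,
# leaving 122 random bits.
marked = uuid.UUID(bytes=raw, version=4)

print(unmarked, unmarked.variant)
print(marked, marked.variant, marked.version)
```

Either form carries roughly the same amount of randomness; the difference is only whether the value is labelled as a standard UUID. The RFC warning quoted above applies to both.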
34

Python 3.6 makes most other answers here a bit out of date. Python 3.6 and later include the secrets module, which is designed for precisely this purpose.

If you need to generate a cryptographically secure string for any purpose on the web, refer to that module.

https://docs.python.org/3/library/secrets.html

Example:

import secrets

def make_token():
    """
    Creates a cryptographically-secure, URL-safe string
    """
    return secrets.token_urlsafe(16)

In use:

>>> make_token()
'B31YOaQpb8Hxnxv1DXG6nA'
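
As a slightly fuller sketch of how this might look for session IDs specifically: the function names below are hypothetical, the 32-byte default is an assumption rather than a requirement, and `secrets.compare_digest` is used so that checking a presented ID against a stored one takes constant time.

```python
import secrets

def make_session_id(num_bytes=32):
    """Return a URL-safe session ID backed by num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)

def session_ids_match(presented, stored):
    """Compare two session IDs in constant time to avoid timing leaks."""
    return secrets.compare_digest(presented, stored)

session_id = make_session_id()
print(session_id)
print(session_ids_match(session_id, session_id))
```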
Adam Easterling
29

You can use the uuid library like so:

import uuid
my_id = uuid.uuid1() # or uuid.uuid4()
Sverre Rabbelier
  • @Gumbo: uuid will use things like the MAC address and uptime of your computer to come up with a random uuid, so why is that not random? – Sverre Rabbelier May 10 '09 at 19:57
  • 8
    uuid1(), uuid4() and even uuid5() are not good session IDs. See http://stackoverflow.com/questions/817882/unique-session-id-in-python/6092448#6092448 for a secure session ID example. – Sean May 23 '11 at 03:40
  • 2
    Wiki says that `Version 4 UUIDs use a scheme relying only on random numbers.`, how is it not good for a session token? uuid5 and uuid1 are not based on random numbers, but why is uuid4 bad then? – Buddy Jun 09 '12 at 18:46
  • 1
    UUID doesn't use a crypto secure random number generator, and is therefore unsuitable for generating secure session ids. The correct answer for Python 3.6+ is: https://stackoverflow.com/a/55661405 – Agost Biro Dec 04 '19 at 09:34
23
import os, base64
def generate_session():
    return base64.b64encode(os.urandom(16))
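
A possible Python 3 variant of the same idea, offered as a sketch rather than a drop-in replacement: `b64encode` returns `bytes` on Python 3, so this version decodes to `str`, uses the URL-safe alphabet (an assumption that the ID will travel in cookies or URLs), and strips the trailing `=` padding discussed in the comments below. `decode_session` is a hypothetical helper showing how the padding can be restored.

```python
import base64
import os

def generate_session(num_bytes=16):
    """Return num_bytes of OS randomness as an unpadded URL-safe string."""
    raw = os.urandom(num_bytes)
    encoded = base64.urlsafe_b64encode(raw)
    return encoded.rstrip(b"=").decode("ascii")

def decode_session(session_id):
    """Restore the stripped '=' padding and return the original bytes."""
    padding = "=" * (-len(session_id) % 4)
    return base64.urlsafe_b64decode(session_id + padding)

sid = generate_session()
print(sid, len(decode_session(sid)))
```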
Seun Osewa
  • 2
    I dunno, but this appears to be a valid solution. However, I'd advise you to strip the trailing "==" and also include a timestamp for less chance of a collision. – Unknown May 04 '09 at 23:34
  • 1
    The chance of a collision after 4 billion iterations is 1 in 8 billion. If I want to reduce the chance of a collision further I can just increase the number of bits i.e. os.urandom(32). And I don't understand what stripping the trailing "==" is supposed to achieve. – Seun Osewa Jun 07 '09 at 13:14
  • 1
    The trailing == can be removed to save space. All you have to do to decode it is to pad it back to the next multiple of 4. Using urandom, it is possible to get very low entropy and end up with a duplicate. Using a timestamp is better. – Unknown Feb 18 '10 at 01:15
  • Really? (I don't see how a 4-byte timestamp is better than an additional 4 bytes of randomness, but) if what you say about low entropy is true, then I would go for an auto-incrementing session_id since it's possible for many session requests to be issued roughly at the same time. Even in a low entropy situation, I don't expect urandom to ever return a duplicate 32-byte string. Pseudorandom algorithms may be attackable, but they don't return duplicate 32-byte sequences. – Seun Osewa Feb 18 '10 at 22:15
  • In a totally random system it is possible to get even a duplicate 32-byte string. In a low-entropy system, this is even more likely. If you know that your computer can only produce a session so fast (e.g. every 5 milliseconds, the minimum time it takes to execute that command), then a timestamp becomes a strict guarantee that you will never get a collision, which is better than random. – Unknown Feb 19 '10 at 00:36
  • I think it's ridiculous to complicate your session generator because of something that's "possible" but probably hasn't EVER happened before. Like, say, our planet suddenly exploding. Yes, it would be "safer" to never leave your house without a mobile lightning rod because it's "possible" that you'll be struck by lightning, but it's also very ridiculous. Even if your computer can only spit out a session every 5ms, it's "possible" that two attempts to generate a session will be initiated concurrently on different processors in a multicore system. Do we need a processor id too? – Seun Osewa Feb 19 '10 at 13:33
  • 4
    I think it's best solution for my needs. I tried M2Crypto and PyCrypto but both present significant problems installing as well as running on windows. – Shwetanka Jul 23 '11 at 22:52
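
For readers who want to check the collision odds debated in these comments, here is a rough sketch of the standard birthday approximation, assuming the IDs are uniformly random and independent (which is exactly what the low-entropy objection disputes). The function name is hypothetical.

```python
import math

def collision_probability(num_ids, num_bits):
    """Approximate chance that at least two of num_ids random IDs collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2 ** num_bits
    expected_pairs = num_ids * (num_ids - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-expected_pairs)

four_billion = 2 ** 32
print(collision_probability(four_billion, 128))  # os.urandom(16)
print(collision_probability(four_billion, 256))  # os.urandom(32)
```

Under those assumptions, about 4 billion 16-byte IDs give a collision probability near 2^-65 (roughly 2.7e-20), which is far smaller than the "1 in 8 billion" figure quoted above.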
1

It can be as simple as creating a random number. Of course, you'd have to store your session IDs in a database or something and check each one you generate to make sure it's not a duplicate, but odds are it never will be if the numbers are large enough.
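
A minimal sketch of that generate-and-check loop, with loud assumptions: `SessionStore` is a hypothetical in-memory stand-in for the database, and `secrets.token_hex` (Python 3.6+) is used as the source of random numbers.

```python
import secrets

class SessionStore:
    """Hypothetical in-memory stand-in for a database table of sessions."""

    def __init__(self):
        self._sessions = {}

    def create(self, data, num_bytes=16, max_attempts=5):
        """Generate a random ID, retrying in the unlikely event of a duplicate."""
        for _ in range(max_attempts):
            session_id = secrets.token_hex(num_bytes)
            if session_id not in self._sessions:
                self._sessions[session_id] = data
                return session_id
        raise RuntimeError("could not allocate a unique session ID")

    def get(self, session_id):
        """Return the stored session data, or None if the ID is unknown."""
        return self._sessions.get(session_id)

store = SessionStore()
sid = store.create({"user": "alice"})
print(sid, store.get(sid))
```

With a real database, the membership test and the write would need to happen atomically (for example via a unique constraint), which this sketch does not attempt to show.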

David Z
  • Exactly, Hence my solution: http://stackoverflow.com/questions/817882/unique-session-id-in-python/818040#818040 – Seun Osewa Jun 07 '09 at 13:15