1

I am trying to see how I can create a set of unique IDs of a fixed length (say length 12) in python which uses a specific subset of all alphanumeric characters. The usecase here is that these IDs need to be read by people and referred to in printed documents, and so I am trying to avoid using characters L, I, O and numbers 0, 1. I of course need to be able to generate a new ID as needed.

I looked into the UUID function in other answers but wasn't able to find a way to use that function to meet my requirements. I've done a lot of searching, but apologies if this is duplicate.

Edit: So far I tried using UUID as described here. And also the hashids function. But could not figure out a way to do it using them. The next best solution I could come up with is create a list of random strings, and check against all existing ids. But that seems woefully inefficient.

martineau
  • 119,623
  • 25
  • 170
  • 301
ste_kwr
  • 820
  • 1
  • 5
  • 21
  • 2
    Can you show us what you have tried so far ? – sushanth Mar 09 '21 at 17:51
  • Not solving your problem, but I believe there are special font types to avoid confusion between characters. That could make your problem a little easier. – pelelter Mar 09 '21 at 17:53
  • 1
    There are only two ways to ensure uniqueness. First is to make the ID huge and incorporate elements that are known to be unique; that's how UUIDs work. Second is to have a database that only issues unique numbers, but everybody will be tied to that database. – Mark Ransom Mar 09 '21 at 18:25

3 Answers3

4

For a set of characters to sample you could use string.ascii_uppercase (A-Z) plus string.digits (0-9), but then remove unwanted characters 'LIO01'. From there you can use random.choices to generate a sequence of length k while allowing repeated characters.

import string
import random
def unique_id(size):
    chars = list(set(string.ascii_uppercase + string.digits).difference('LIO01'))
    return ''.join(random.choices(chars, k=size))

>>> unique_id(12)
'HBFXXHWZ8349'
>>> unique_id(12)
'A7W5WK636BYN'
>>> unique_id(12)
'WJ2JBX924NVK'
Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
  • Where is the uniqueness? While the chances are small for it being 12 characters, this can generate the same id twice... – Tomerikoo Mar 09 '21 at 18:09
  • Might be best to add some sort of validation to make sure that the id you are creating doesn't return one that already exists (for example, if you are running the script once every time) to avoid issues with clashing ids. – Jack Moody Mar 09 '21 at 18:11
  • 1
    I don't know where @ste_kwr will persist these IDs, I assume some kind of database, so would need those kind of details to implement a uniqueness checker. From a high level, it would just be looping over generated id's until a unique one was generated. – Cory Kramer Mar 09 '21 at 18:35
2

You could use an iterator like itertools.combinations

import itertools
import string

valid_chars = set(string.ascii_lowercase + string.digits) - set('lio01')

# Probably would want to persist the used values by using some sort of database/file
# instead of this
used = set()

unique_id_generator = itertools.combinations(valid_chars, 12)

generated = "".join(next(unique_id_generator))
while generated in used:
    generated = "".join(next(unique_id_generator))

# Once an unused value has been found, add it to used list (or some sort of database where you can keep track)
used.add(generated)

This generator will continue to produce all possible combinations (without replacement) of all ascii lower case characters and digits excluding the ones you mentioned. If you need this upper case, you can use .upper() and if you want to allow replacement, you can use itertools.combinations_with_replacement.

If 'xyz' is not considered to be the same as 'xzy', take a look at itertools.permutations.

Jack Moody
  • 1,590
  • 3
  • 21
  • 38
  • Mainly, I was wondering if there was a way to generate unique values without checking against the previous list (which should be theoretically possible). But it appears from the answers that python does not afford such functionality. Though this helped me find a solution to what I was looking for. What I needed was the function `itertools.product()`. – ste_kwr Mar 13 '21 at 19:26
1

I bumped to a similar problem and the simplest solution I could think of is this one:

Answer

from secrets import token_urlsafe
id = ''.join([c for c in token_urlsafe(10) if c not in '-_OI0l'])[:5]

print(id) # 'a3HkR'

Explanation

token_urlsafe(10) String with 10 random chars from [a-z, A-Z, 0-9, -, _]
if c not in '-_OI0l' remove characters you don't want
[:5] Take just 5 from the beginning, if you want 5 for example.

Strengths

  • Readable
  • One-liner
  • Customizable
  • Can be highly secure if needed

Limitations

  • You can check the uniqueness in other ways, or just pick as long an id as needed so that randomness takes care of that for you.

    • The above example can create 459 165 024 different ids.
  • If you remove many characters or you want more characters you have to make the number in token_urlsafe(number) also bigger to not run into an IndexError.

Mikko Haavisto
  • 991
  • 7
  • 10