-1

I am currently stuck on the following business case and have no idea how to solve it.

I have to create more than 5,000,000 unique alphanumeric codes.

The rules for the codes are:

length: 12
format: a "-" after every 4 characters
some letters should be excluded, e.g. O or l

The codes should be "secure" (i.e. totally random), and it should be possible to run the script multiple times in case the codes aren't enough and we have to create more.

e.g. ab4D-406a-BCh7-TEs3

I have to solve this in Python 3.

My first idea was to save the codes in a database and create them with the random function (ASCII code -> letter). But the script might create the same code twice, so I would have to check every time whether that code already exists in the database, which would cause a lot of database traffic.

My second idea is to use a hash function, but I think the codes wouldn't be secure, and there are no hash functions whose output matches my rules.

My third idea is to use something like Python's random module to create the codes, write them to a file, and check the file every time to see whether the code is already in it. That's also not good for performance, but I think it is better than using a database.

Does anybody have an idea how to solve this problem with high performance?

Greetings.

Edit:

I tried this, but it takes hours to create the codes. Any tips on how to increase the performance?

import random

sequence = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
seq = list(sequence)


codelist = []
counter = 0
while len(codelist) < 5000000:
    code = ""
    counter = counter +1
    print(counter)
    while len(code) < 12:
        code = code + str(random.choice(seq))

    try:
        codelist.index(code)
    except ValueError:
        codelist.append(code)


file = open('codefile.txt','w')
for item in codelist:
    file.write("%s\n" % item)
Limon
  • You say some characters need to be excluded? Let's say 50 characters remain. That makes for 244x10^18 possibilities. What's holding you back from creating them randomly all at once, dumping them, and checking for non-unique entries afterwards? Chances of doubles seem quite small... if you find doubles, delete them and generate new ones for those deleted. Ugly, perhaps, but once it is in a DB everything should be fast enough... – Kraay89 Jun 15 '17 at 10:35
  • Possible duplicate of [Random string generation with upper case letters and digits in Python](https://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits-in-python) – Peter O. Jun 15 '17 at 13:32
  • I don't understand. `ab4D-406a-BCh7-TEs3` has 16 digits excluding `-`, but you said you wanted a length of 12. – Artjom B. Jun 15 '17 at 14:01
  • You should bear in mind that if you generate unlimited codes consisting of four-letter words, you'll be entering a monkeys-with-typewriters scenario. Some of your codes may resemble Shakespeare, but others could end up being very vulgar and/or offensive. – r3mainer Jun 15 '17 at 23:36
  • Didn't think about that. Great comment thanks! – Limon Jun 18 '17 at 11:36
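
The estimate in the first comment can be checked with a few lines of arithmetic. The sketch below assumes, as that comment does, that 50 characters remain after exclusions, and uses the standard birthday approximation to estimate the chance of at least one duplicate among 5,000,000 purely random 12-character codes. The constant names are only illustrative.

```python
import math

ALPHABET_SIZE = 50      # assumption taken from the comment above
CODE_LENGTH = 12
N_CODES = 5_000_000

# Number of distinct codes that can be formed.
space = ALPHABET_SIZE ** CODE_LENGTH

# Birthday approximation: P(at least one collision) ~ 1 - exp(-n(n-1) / (2N)).
# expm1 keeps precision for very small probabilities.
p_collision = -math.expm1(-N_CODES * (N_CODES - 1) / (2 * space))

print(f"possible codes: {space:.3e}")
print(f"P(any duplicate among {N_CODES:,} codes): {p_collision:.2e}")
```

Under these assumptions the probability of any duplicate is on the order of 5 in 100 million, so a single de-duplication pass after generation would almost never have anything to remove.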

3 Answers

1

Encryption guarantees uniqueness. If you encrypt the numbers 0, 1, 2, ... 5,000,000 you will get 5,000,001 guaranteed unique results providing you do not change the key.

Your next problem is how to change the resulting binary number into your desired format. Full alphanumeric uses 26 + 26 + 10 = 62 characters. You are using a subset of that, so you will be using fewer characters, say 58 characters as an example. That means you can treat your output as a 12-digit base 58 (or whatever) number.

12 digits in base 58 (or whatever) will allow you to size the binary block you encrypt. Look at Format Preserving Encryption to ensure that the output of your encryption is sized correctly for your requirements.
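
A minimal sketch of this idea follows. It is not a vetted format-preserving encryption scheme (a standardized one such as FF1 would be the safer choice); it only illustrates the principle with a small Feistel network plus "cycle walking" to stay inside the 12-character code space. The HMAC-based round function, the number of rounds, the helper names `permute`, `to_code` and `make_codes`, and the choice to exclude only "O" and "l" are all assumptions made for illustration.

```python
import hashlib
import hmac
import string

# Assumption: only "O" and "l" are excluded; adjust to the real rule set.
ALPHABET = "".join(c for c in string.ascii_letters + string.digits if c not in "Ol")
CODE_LENGTH = 12
DOMAIN = len(ALPHABET) ** CODE_LENGTH          # number of possible codes
HALF_BITS = (DOMAIN.bit_length() + 1) // 2     # a Feistel network needs two equal halves
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 8                                     # illustrative, not a vetted parameter


def _round_value(key: bytes, round_no: int, half: int) -> int:
    """Keyed pseudo-random round function built from HMAC-SHA256."""
    msg = bytes([round_no]) + half.to_bytes((HALF_BITS + 7) // 8, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & HALF_MASK


def _feistel(key: bytes, value: int) -> int:
    """Bijection on the integers 0 .. 2**(2 * HALF_BITS) - 1."""
    left, right = value >> HALF_BITS, value & HALF_MASK
    for round_no in range(ROUNDS):
        left, right = right, left ^ _round_value(key, round_no, right)
    return (left << HALF_BITS) | right


def permute(key: bytes, index: int) -> int:
    """Map index to a unique number below DOMAIN (cycle walking)."""
    if not 0 <= index < DOMAIN:
        raise ValueError("index out of range")
    value = _feistel(key, index)
    while value >= DOMAIN:          # re-encrypt until the result is inside the domain
        value = _feistel(key, value)
    return value


def to_code(value: int) -> str:
    """Write value as a 12-digit number in base len(ALPHABET), grouped by four."""
    chars = []
    for _ in range(CODE_LENGTH):
        value, digit = divmod(value, len(ALPHABET))
        chars.append(ALPHABET[digit])
    raw = "".join(reversed(chars))
    return "-".join(raw[i:i + 4] for i in range(0, CODE_LENGTH, 4))


def make_codes(key: bytes, start: int, count: int):
    """Yield codes for the counter values start .. start + count - 1."""
    for index in range(start, start + count):
        yield to_code(permute(key, index))
```

With this layout, a first run could call `make_codes(key, 0, 5_000_000)` and a later run `make_codes(key, 5_000_000, n)` with the same key. Because the mapping is a permutation, no lookup against earlier codes is needed; only the key and the last counter value have to be stored.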

rossum
0

The easiest way to generate unique alphanumeric codes is to generate a UUID, but UUIDs don't match your 'rules' because they are longer:

>>> import uuid
>>> _id = uuid.uuid4()
>>> print (_id)
5d9efd48-661f-47f8-8886-13e93fd8b899
>>> print (len(str(_id)))
36
>>> 
Maurice Meyer
0
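import random
import string
from threading import Thread

# Letters plus the digits 2-9 (0 and 1 are dropped; O and l are still included).
ALPHABET = string.ascii_letters + string.digits[2:]

def generate_alphanum(g_list):
    # Keep adding formatted codes until the shared list holds 50,000 entries.
    while len(g_list) < 50000:
        uid = ''.join(random.choice(ALPHABET) for _ in range(12))
        code = uid[:4] + '-' + uid[4:8] + '-' + uid[8:]
        if code not in g_list:  # compare the formatted code, as stored
            g_list.append(code)

UUIDs = []
# The function must be defined before the threads that call it are started.
threads = [Thread(target=generate_alphanum, args=(UUIDs,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every worker has finished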

Caution: this may not guarantee complete randomness, but it gets the job done. Sample output:

'FD58-KGIo-yBGL',
 'q9jv-tDa4-K3ae',
 'BrGr-AO9o-GkfN',
 'VyKb-NHh2-HRHM',
 'g3Eu-aPsv-2YgF',
 'iPxB-p4GV-f5tM',
 'jewn-NWnM-kUDw',
 'gDWY-MZB4-OysT',
 'Acbu-kpTG-TCMm',
 'rHBz-yJca-s9aA',
 '2nnH-WFgT-gQef',
 '2qSz-kX8z-qDpi',
 'FnjV-sgzj-gzWt',
 '5uwW-jwM5-FxB6',
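
For the scale in the question (5,000,000 codes), the duplicate check is the part that dominates the run time when it scans a list. A minimal single-threaded sketch that keeps the codes in a `set` instead, and draws characters from the standard-library `secrets` module rather than `random`, could look like this. The function name `generate_codes`, the `existing` parameter, and the choice to exclude only "O" and "l" are assumptions for illustration.

```python
import secrets
import string

# Assumption: only "O" and "l" are excluded; adjust to the real rule set.
ALPHABET = "".join(c for c in string.ascii_letters + string.digits if c not in "Ol")


def generate_codes(count, existing=()):
    """Return `count` new formatted codes, none of which appear in `existing`."""
    seen = set(existing)            # set membership tests are O(1) on average
    new_codes = []
    while len(new_codes) < count:
        raw = "".join(secrets.choice(ALPHABET) for _ in range(12))
        code = "-".join(raw[i:i + 4] for i in range(0, 12, 4))
        if code not in seen:
            seen.add(code)
            new_codes.append(code)
    return new_codes
```

To create more codes later, the previously issued codes can be read back from the file and passed as `existing`, so the new batch cannot overlap with the old one.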
Yonas Kassa