Making an optimal pseudo random unique code generator for voucher-like behaviour

Question

I'm developing a simple management system for IT tech services. This system has a website where the clients can check the current state of their order by entering their unique code. This code, of course, has to be unique, and hopefully not easy to (manually) bruteforce.

I was thinking of an output somewhere on the lines of "XKF-042", easy to read and write down. The problem arises on the generation of these values: I could use plain random data and generate both pieces of the code, but that forces me to check wether the code already exists or not, which feels like an exponential waste of resources.

A simple answer would be to just begin counting from an arbitrary number, let's say "ABC-001", and add 1, so there is no real need to check if the value already exists. The problem with that is the ease of bruteforcing; anyone could just check ABC-XXX and check the last thousand consecutive orders.

Maths are not my forte, but I know there has to be a more elegant solution to this problem.

I'm thinking about generating every single possible permutation for each side of the code and scramble them, so I have a list of pairs to read from that's seemingly random, and maybe shift the "right side of the code" list every 1000 codes.

EDIT: It's not critical that the codes are impossible to guess; there won't be any personal info on the output besides the order info and the costs. I could use a 4x4 code (like "SJDM-4823") to make it "stronger".

Do you have a restriction on how long this resulting code should be? Obviously if someone has to type it in it probably shouldn't be too long. — Dan Breidegam, Sep 23 '19 at 19:43
Break down the problem into a) generate unique identifiers, b) generate easy to write down/spell identifiers, c) generate hard to guess identifiers. Each can be addressed with a bunch of methods, depending on what is the priority — Eriks Klotins, Sep 23 '19 at 19:47
Im using PHP for this, and there is no real restriction to the size other than usability, but starting with the 1 million codes sounds good enough — mauveine, Sep 23 '19 at 20:30

Peter O. · Answer 1 · 2020-01-19T05:49:52.550

1

An article I've written contains advice for unique random identifiers. From your question, it seems you have the following dilemma: Generate random unique identifiers that—

Are long enough to be hard to guess, but
are short enough to be easy to type by end users.

The advice in that article explains how unique random IDs should be generated (128 bits or longer, using a cryptographic RNG such as random_bytes together with bin2hex in PHP). But for your purposes, the resulting IDs may be too long to be suitable. There are ways to deal with such long IDs, including—

Dividing the ID into memorable chunks (for example: "374528294473" becomes "374-538-294-473"),
Converting the ID to a sequence of memorable words (as in Bitcoin's BIP39), or
Adding a so-called "checksum digit" at the end of the ID to guard against typing mistakes.

You should try (1) or (2) before deciding to use shorter IDs than what the advice in that article recommends.

Also, your application will generally have to check IDs for uniqueness against a database of IDs already generated; this uniqueness check, however, may or may not be a performance bottleneck for your application, and the only way to find out is to try it and see. Alternatively, you can include the ID table's record number (which should be unique for each record) into the ID itself.

There may also be a serious security issue if the order ID is the only thing that grants access to information about that order. Ideally, there should be other forms of authorization, such as allowing only logged-in users or certain logged-in users to access information about the order associated with an order ID. See also this question.

edited Jan 19 '20 at 05:49

answered Sep 23 '19 at 20:01

Peter O.

32,158
14
82
96

There is no serious bottleneck or "need" for the system to be utterly perfect on performance. The article, while an interesting read, seems to go into a different usage of rng. Chopping hashes still will probably make collissions, so it doesnt really fit. The BIP39 sounds interesting as in being easier to remember, but Im not sure about implementation. The system doesnt need to be complex, so if I had to check for something, I'd just check the ID. – mauveine Sep 23 '19 at 20:44
What do you mean about "chopping hashes"? When I wrote "dividing the ID into chunks" above, I meant dividing the ID into memorable chunks; for example: "374528294473" becomes "374-538-294-473" to ease the task of remembering that ID. Dividing the ID this way doesn't necessarily reduce ID collisions. – Peter O. Sep 23 '19 at 20:55
Thought so, but that brings us back to the usabilty part of the problem, since its a code thats meant to provide a bit of info IF the user decides to use it (and also, larger storage space for larger codes). Its a tricky question, thats why I got kinda interested in finding a clean solution. – mauveine Sep 23 '19 at 21:02
Well, there is one last thing: Authorization. More specifically, the order ID should not be the only thing that grants access to information about that order. Thus, you should allow only authorized users to enter any particular order ID. This will let you use shorter IDs while greatly reducing the problems of hard-to-guess IDs. – Peter O. Sep 23 '19 at 21:15
Consider using alphanumeric IDs. There are only 10 digits but there are 26 numbers. If each character of an ID has 36 possibilities (10 digits plus 26 numbers) then you can get a much bigger number space for a given ID length. – S. Imp Sep 23 '19 at 21:32

score 1 · Accepted Answer · answered Sep 25 '19 at 00:47

Well, I would propose to use LCG, Linear Congruential Generator. It has very nice properties - given right set of constants (Hull–Dobell Theorem) output uniquely covers all 2³² space (or 64bit space, but as far as I remember PHP has only 32bit integers). Basically it is 1-to-1 mapper from [0...2³²) interval to another [0...2³²) interval. It could be used in two ways

One, ID_next = LCG(ID_prev), like typical random numbers generator. Or just feed it from linearly incrementing counter, ID_next = LCG(1, 2, ...). You could convert integer to 8 symbols base-16 string, should be good enough.

Don't have PHP code, have a python one

import numpy as np

class LCG:

    UZERO: np.uint32 = np.uint32(0)
    UONE : np.uint32 = np.uint32(1)

    def __init__(self, seed: np.uint32, a: np.uint32, c: np.uint32) -> None:
        self._seed: np.uint32 = np.uint32(seed)
        self._a   : np.uint32 = np.uint32(a)
        self._c   : np.uint32 = np.uint32(c)

    def next(self) -> np.uint32:
        self._seed = self._a * self._seed + self._c
        return self._seed

    def seed(self) -> np.uint32:
        return self._seed

    def set_seed(self, seed: np.uint32) -> np.uint32:
        self._seed = seed

    def skip(self, ns: np.int32) -> None:
        """
        Signed argument - skip forward as well as backward

        The algorithm here to determine the parameters used to skip ahead is
        described in the paper F. Brown, "Random Number Generation with Arbitrary Stride,"
        Trans. Am. Nucl. Soc. (Nov. 1994). This algorithm is able to skip ahead in
        O(log2(N)) operations instead of O(N). It computes parameters
        A and C which can then be used to find x_N = A*x_0 + C mod 2^M.
        """

        nskip: np.uint32 = np.uint32(ns)

        a: np.uint32 = self._a
        c: np.uint32 = self._c

        a_next: np.uint32 = LCG.UONE
        c_next: np.uint32 = LCG.UZERO

        while nskip > LCG.UZERO:
            if (nskip & LCG.UONE) != LCG.UZERO:
                a_next = a_next * a
                c_next = c_next * a + c

            c = (a + LCG.UONE) * c
            a = a * a

            nskip = nskip >> LCG.UONE

        self._seed = a_next * self._seed + c_next


#%%
np.seterr(over='ignore')

a = np.uint32(1664525)
c = np.uint32(1013904223)
seed = np.uint32(1)

rng = LCG(seed, a, c)

print(rng.next())
print(rng.next())
print(rng.next())

Making an optimal pseudo random unique code generator for voucher-like behaviour

2 Answers2