4

I work for a banking project and their requirement is to generate unique transaction reference for each transaction. The format for UTR is:

<BankCode><YYDDD><5 digit SequenceId>.

This 5 digit sequence ID can be alphanumeric as well. The transaction count each day can go up to 100-200K.

If I use an Oracle sequence then I can have only 10K values.

I tried to use SecureRandom generator and generated 200K 5 length string but it generated around 30 duplicate strings.

Below is the code snippet I used

int leftLimit = 48;
int rightLimit = 122;
int i1=0;
Random random = new SecureRandom();
while (i1<200000) {
    String generatedString = random.ints(leftLimit, rightLimit+1)
                                   .filter(i -> (i<=57||i>=65) && ( i<=90|| i>=97))
                                   .limit(5)
                                   .collect(StringBuilder::new,
                                            StringBuilder::appendCodePoint,
                                            StringBuilder::append)
                                   .toString();
    System.out.println(generatedString);
    i1++;
}
Abra
  • 19,142
  • 7
  • 29
  • 41
  • 5
    *"This 5 digit seqid can be alphanumeric"* --- What does that mean? Each position can be `0-9` or `A-Z`? Is `a-z` allowed too and considered different from `A-Z`? Any other characters allowed? --- If only `0-9` and `A-Z` is allowed, that would be 36 distinct characters, which means a Base-36 number. With 5 "digits" in Base-36, you can have 36^5 = 60466176 different values. --- *FYI:* The Java `Integer` class can format Base-36 numbers. – Andreas Sep 26 '20 at 06:48
  • Count in base 36 instead of using random numbers. – tgdavies Sep 26 '20 at 06:52
  • Hi Andreas, Thanks for the response. position can be 0-9 or A-Z. No special characters allowed in generated reference – user5740099 Sep 26 '20 at 06:56
  • 1
    Well... the result is expected. This is a variation of the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem). In total, you have `36^5` possible passwords. The probability of pulling 100k passwords without duplicates is [`((36!) / (36! - 100000)!) / ((36^5)^100000) ~ 1.70 * 10^-36`](https://www.wolframalpha.com/input/?i=%28%2836%5E5%29%21+%2F+%2836%5E5+-+100000%29%21%29+%2F+%28%2836%5E5%29%5E100000%29), aka. very unlikely. – Turing85 Sep 26 '20 at 07:07
  • 3
    Why use random if you want a sequence? – Joakim Danielson Sep 26 '20 at 07:27
  • my [formula](https://stackoverflow.com/questions/64074677/how-to-generate-a-5-character-unique-alphanumeric-value-in-java#comment113306016_64074677) has a typo, it is not `((36!) / (36! - 10000)!) ...` but `((36^5)! / (36^5 - 100000)!)`. The formula on wolframalpha, however, is correct. – Turing85 Sep 26 '20 at 07:31
  • Hi Joakim, <5 digit SeqId> means i can change only these 5 digit values in my transaction reference. Its not necessarily to be in sequential order but generated UTR must be unique. e.g. NBOB20270X1ZFC where NBOB is bank code, 20270 is Julian date and X1ZFC is 5 char alphanumeric string. So for a day i want to generate last 5 char as unique string always. – user5740099 Sep 26 '20 at 07:41
  • 1
    Yes but isn't a sequence the easiest way to make sure each value is unique? Unless you have other requirements of course. – Joakim Danielson Sep 26 '20 at 07:51
  • If you have a specific sequence, there's always a possibility that somebody could work out the sequence, and generate valid values. I suspect that the only way to avoid any risk of repetition is to store the values already issued (or hashes of them) and then don't issue them again. Of increase the number of character positions until the risk of duplicates is acceptably small. – Kevin Boone Sep 26 '20 at 08:12

3 Answers3

0

If you want a pseudo-random sequence, I suggest you use a custom Feistel implementation. Feistel is designed to be a reciprocal mechanism, so you can decode Feistel by reapplying it, meaning that i == feistel(feistel(i)) and if you go from 1 to X you will get all numbers between 1 and X exactly once. So no collision.

Basically, you have 36 characters at your disposal. So you have 60,466,176 possible values, but you want only 200,000 of them. But actually, we don't care how many you want because Feistel ensures that there are NO collisions.

You'll notice that 60,466,176 in binary is 0b11100110101010010000000000, that's a 26 bits number. 26 isn't very friendly for the code so we'll wrap our custom feistel mapper to 24 bits instead. Feistel having to split a number in two even parts, each part will be 12 bits. This is only to explain the values you'll see in the code below, which is the 12 instead of 16 if you look at other implementations. Also, the 0xFFF is the mask for 12 bits.

Now the algorithm itself:

  static int feistel24bits(int value) {
    int l1 = (value >> 12) & 0xFFF;
    int r1 = value & 0xFFF;
    for (int i = 0; i < 3; i++) {
      int key = (int)((((1366 * r1 + 150889) % 714025) / 714025d) * 0xFFF);
      int l2 = r1;
      int r2 = l1 ^ key;
      l1 = l2;
      r1 = r2;
    }
    return (r1 << 12) | l1;
  } 

So basically, this means that if you give this algorithm any number between 0 and 16777215 ( = 224-1), you'll get a unique, pseudo-random number that could fit in a 5 character string when written in base-36.

So how do you get it all working? Well, it's very simple:

String nextId() {
  int sequence = (retrieveSequence() + 1) & 0xFFFFFF; // bound to 24 bits
  int nextId = feistel24bits(sequence);
  storeSequence(sequence);
  return intIdToString(nextId);
}
static String intIdToString(int id) {
  String str = Integer.toString(id, 36);
  while(str.length() < 5) { str = "0" + str; }
  return str;
}

Here's the full code that I used.

Olivier Grégoire
  • 33,839
  • 23
  • 96
  • 137
-1

There seem to be 2 approaches:

  1. Store unique values into a Set of required size and remove its elements upon use.
class UniqueIdGenerator {
    private static final int CODE_LENGTH = 5;
    private static final int RANGE = (int) Math.pow(36, CODE_LENGTH); 
    private final Random random = new SecureRandom();
    private final int initSize;
    private final Set<String> memo = new HashSet<>();
    
    public UniqueIdGenerator(int size) {
        this.initSize = size;
        generate();
    }
    
    private void generate() {
        int dups = 0;
        while (memo.size() < initSize) {
            String code = Formatter.padZeros(Integer.toString(random.nextInt(RANGE), 36), CODE_LENGTH);
            
            if (memo.contains(code)) {
                dups++;
            } else {
                memo.add(code);
            }
        }
        System.out.println("Duplicates occurred: " + dups);
    }
    
    public String getNext() {
        String code = memo.iterator().next();
        memo.remove(code);
        return code;
    }   
}
  1. Use a sequence with random start and random increments.
class RandomSequencer {
    private static final int CODE_LENGTH = 5;
    private Random random = new SecureRandom();
    private int start = random.nextInt(100_000);
    
    public String getNext() {
        String code = Formatter.padZeros(Integer.toString(start, 36), CODE_LENGTH);
        start += random.nextInt(300) + 1;
        
        return code;
    }
}

Update Padding of zeros may be implemented in a variety of ways:

class Formatter {
    private static String[] pads = {"", "0", "00", "000", "0000"};
    public static String padZeros(String str, int maxLength) {
        if (str.length() >= maxLength) {
            return str;
        }
        return pads[maxLength - str.length()] + str;
    }

    private static final String ZEROS = "0000";
    public static String padZeros2(String str, int maxLength) {
        if (str.length() >= maxLength) {
            return str;
        }
        return ZEROS.substring(0, maxLength - str.length()) + str;
    }

    public static String padZeros3(String str, int maxLength) {
        if (str.length() >= maxLength) {
            return str;
        }
        return String.format("%1$" + maxLength + "s", str).replace(" ", "0");
    }
}
Nowhere Man
  • 19,170
  • 9
  • 17
  • 42
-1

Since you mentioned Oracle in your question, would you consider a PL/SQL solution?

  1. Create a database table to hold your sequence IDs.
create table UTR (
  BANK_CODE     number(4)
 ,TXN_DATE_STR  char(5)
 ,SEQUENCE_ID   char(5)
 ,USED_FLAG     char(1)
 ,constraint USED_FLAG_VALID check (USED_FLAG in ('N', 'Y'))
 ,constraint UTR_PK primary key (BANK_CODE, TXN_DATE_STR, SEQUENCE_ID)
)
  1. Create a PL/SQL procedure to populate the table.
create or replace procedure POPULATE_UTR
is
  L_COUNT     number(6);
  L_BANK      number(4);
  L_DATE_STR  char(5);
  L_SEQUENCE  varchar2(5);
begin
  L_BANK := 3210;
  select to_char(sysdate, 'YYDDD')
    into L_DATE_STR
    from DUAL;
  L_COUNT := 0;
  while L_COUNT < 200000 loop
    L_SEQUENCE := '';
    for K in 1..5 loop
      L_SEQUENCE := L_SEQUENCE || substr('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789',
                                         mod(abs(dbms_random.random), 62) + 1,
                                         1);
    end loop;
    begin
      insert into UTR values (L_BANK, L_DATE_STR, L_SEQUENCE, 'N');
      L_COUNT := L_COUNT + 1;
    exception
      when dup_val_on_index then
        null; -- ignore.
    end;
  end loop;
end;

Note that the code for generating the sequence ID came from this SO question with the title Generate Upper and Lowercase Alphanumeric Random String in Oracle

Also, the code for obtaining the date string came from this SO question entitled Oracle Julian day of year

Now, since each row in database table UTR contains a unique sequence ID, you can select the first row where USED_FLAG equals N

select SEQUENCE_ID
  from UTR
 where BANK_CODE = 1234 -- i.e. whatever the relevant bank code is
   and TXN_DATE_STR = 'whatever is relevant'
   and USED_FLAG = 'N'
   and rownum < 2

For your information, if you want to select a random row from table UTR instead, refer to this SO question entitled How to get records randomly from the oracle database?

After you use that sequence ID, you update the table and set the USED_FLAG to Y, i.e.

update UTR
   set USED_FLAG = 'Y'
 where BANK_CODE = 1234 -- what you used in the select
   and TXN_DATE_STR = 'what you used in the select'
   and SEQUENCE_ID = 'what was returned by the select'
Abra
  • 19,142
  • 7
  • 29
  • 41