-4

I want to generate unique alphanumeric string of length 28 from two unique alphanumeric strings. Is it mathematically possible to have collision free string from two unique strings?

here is what I did,

ASCII_NUMBER_RANGE_START = 48;
ASCII_ALPHABET_RANGE_START =55;

 for (int i = 0; i < firstArray.length; i++) {
        int tempASCIIValue = (Character.getNumericValue(firstArray[i]) + Character.getNumericValue(secondArray[i])) % 35;
        if (tempASCIIValue <= 9) {
            FINAL_ASCII_VALUE = tempASCIIValue + ASCII_NUMBER_RANGE_START;
        } else {
            FINAL_ASCII_VALUE = tempASCIIValue + ASCII_ALPHABET_RANGE_START;
        }
        combinedArray[i] = (char) FINAL_ASCII_VALUE;
    }
    return new String(combinedArray);
}

In above code, i am not sure whether the resultant string is as uniquely strong as its parent strings.

Note: the generated string to have same length as the parent string

Any help is appreciated. Thanks.

mrtpk
  • 1,398
  • 4
  • 18
  • 38
  • What you have tried ? – Amy Sep 07 '16 at 05:38
  • Let's say, concatenation (unique A, unique B) will always be unique string literal. – mrtpk Sep 07 '16 at 06:15
  • try this link - http://www.javapractices.com/topic/TopicAction.do?Id=56 – mrtpk Sep 07 '16 at 06:21
  • Do you have restrictions on the allowable characters in the input or output? – Salix alba Sep 07 '16 at 09:19
  • @Salixalba only that input and output should be alphanumeric, i am just concerned that my code will reduce the probability of uniqueness of the resultant string, i hope you understood. – Justin Babu Sep 07 '16 at 12:12
  • There is a basic problem in that you have 62 possible character which means there are P=62^28 possible values for each input and output. So the total possible inputs are P * P possible inputs and P possible outputs. Whatever algorithm you choose there will be two pairs of inputs yielding the same output. – Salix alba Sep 07 '16 at 13:43
  • @Salixalba, just for clarification, Are you saying that there will be **at most** two pairs of inputs yielding same output like `A` `op` `B` and `B` `op` `A` will have same outputs. – mrtpk Sep 07 '16 at 14:05
  • Consider the case of two character strings. There are 65^2 = 4225 possibilities for each string. So 17850625 possible inputs and only 4225 possible outputs. There will be a lot of possible collision. Different algorithms will give different types of collision. – Salix alba Sep 07 '16 at 14:21
  • Hard to answer without knowing what you mean by "unique." No overlap on the set of characters contained by the two strings? The strings can have overlapping subsets of characters but are not identical? All the characters in the input strings and the output must be unique? The character sets are the same, but the orderings must be different? Each result must be one you've never seen before? Also, why did you include the `random` tag--what's random in the code fragment you provided? – pjs Sep 07 '16 at 14:30
  • @pjs i have two randomly generated unique strings (uuid) say A and B , of length 28; I want to combine A and B. like C = A + B. C should also be of length 28 and unique like his parents A and B. – Justin Babu Sep 07 '16 at 18:20
  • @Salixalba I just want to know how bad the collision will be. Thank u for your valuable time. – Justin Babu Sep 07 '16 at 18:26

2 Answers2

1

Given that collisions are unavoidable. We can look at ideas like hash code generation. In a hashtable, you want to generate a hash code for each object. Ideally you want a Perfect hash function, but thats quite tricky to implement.

You might be able to get away with a hash function see for example Best implementation for hashCode method. A simple one with two integer variables is

int generateHashCode(int a,int b) {
    // Start with a non-zero constant. Prime is preferred
    int result = 17;
    // For each field multiply the previous result by a prime and add
    result = 31 * result + a;         
    result = 31 * result + b;         
    return result;
}

For your implementation you could change this to work with character. If you are happy to lose one character giving 26 + 26 + 9 = 64 possibilities for each character. This means you can use 6 bits for each character and 168 bits for the whole input, that can be fit into 6 integer. Then just run the generateHashCode() method on each pair of integers.

Community
  • 1
  • 1
Salix alba
  • 7,536
  • 2
  • 32
  • 38
0

You can add both of your string uids (28 digit alphanumeric) using StringBuilder/StringBuffer and put the concatinated String into any implementation of Set.The set implementation will filter out the duplicate elements if any.

Here is the sample Code :

import java.util.LinkedHashSet;

public class Delete1 {
    public static void main(String[] args) {
        LinkedHashSet<String> impl=new LinkedHashSet<String>();
        for (int i = 0; i < 5; i++) {
            String uid1="qwertyuiop1234567890~!@#$%^&";
            String uid2="qwertyuiop1234567890~!@#$%^&";
            StringBuilder builder=new StringBuilder();
            builder.append(uid1);
            builder.append(uid2);
            impl.add(builder.toString());
        }
        for (String value : impl) {
            System.out.println(value);
        }
    }
}

Though the loop is iterating 5 times but the output is

qwertyuiop1234567890~!@#$%^&qwertyuiop1234567890~!@#$%^&

You can add your loop variable to create unique id.

Sayak Choudhury
  • 167
  • 1
  • 1
  • 13