3

Consider this example of finding anagrams

aabc
abca

They both are anagrams, I am looking for a way so that their hash generated by characters is same and unique.

The uniqueness is important so that no two different string aabc and xyaq generate the same hash

I have no idea about this, but throwing up here to learn what I need to look up for

daydreamer
  • 87,243
  • 191
  • 450
  • 722
  • 1
    What you're seeking is a perfect hash function (assuming that `"aabc".equals("abca")` in a finite `Set` and together represent a distinct element in the `Set`.). Your problem definition needs to be finite for anyone to propose a solution. – Deepak Bala Feb 17 '15 at 16:09
  • 1
    Hashes are essentially never going to be _unique._ Either work out a way that that is acceptable, or find some non-hashing approach. – Louis Wasserman Feb 17 '15 at 17:51
  • 1
    [this is an idea](http://stackoverflow.com/a/18798966/4316015) – Neha Agrawal Feb 18 '15 at 10:44

1 Answers1

3

Sort the characters in the string and use that as the hash. Strings will have the same hash if they are anagrams of each other:

String anagramHash(String str) {
    char[] chars = str.toCharArray();
    Arrays.sort(chars);
    return new String(chars);
}

This probably won't work if you have code points which aren't on the BMP (http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html).

Alternatively, generate a histogram and use that as the hash.

Adrian Leonhard
  • 7,040
  • 2
  • 24
  • 38