-1

I wrote up a hash function using the folding method so that anagrams i.e. "ape" &"pea" would hash to the same value. So far most strings I put into it work. But on some occasions I get Number Format Exceptions.

For instance when I pass the string "abalone" with a table size of 109 the exception pops up while the string "abalon" does not.

private static int Hash(String theString,int theTableSize){
    //ignore case and remove all non-alphanumeric characters
    String temp = theString.toLowerCase();
    temp = temp.replaceAll("[^a-zA-Z0-9]", "");

    //sort to count # and type of characters when hashing, NOT alphabetical order
    char[] arr = temp.toCharArray();
    Arrays.sort(arr);
    temp = new String(arr);

    //Folding Method for Hash
    String str_A = temp.substring(0, temp.length()/2);
    String str_B = temp.substring(temp.length()/2, temp.length());

    System.out.println(str_A + " " + str_B );

    return (folding(str_A) + folding(str_B)) % theTableSize;
}

private static int folding(String substring){
    int x = 0;
    for(int i = 0; i < substring.length(); i++){
        int tchar = substring.charAt(i);
        String schar = Integer.toString(tchar);
        System.out.println(schar);
        x = Integer.parseInt(x + schar) ;
        x = Math.abs(x);
    }
    return x;
}

Is there something that I am missing?

s d
  • 301
  • 2
  • 5
  • 13

2 Answers2

2

The problem seems to be the line

x = Integer.parseInt(x + schar);

You're concatenating strings here, so the argument x + schar could well be longer than the maximum size of an int.

Dawood ibn Kareem
  • 77,785
  • 15
  • 98
  • 110
  • OK, what am I missing? The `Hash` function sorts the characters fairly early on. How is that _not_ going to work for anagrams? – Solomon Slow Dec 05 '16 at 03:29
  • So use a long instead? As for anagrams my line of thought was to make anagrams homogenized to just what alphanumeric character's present so 'bacc' and 'acC b' would both come out as 'abcc' then hashed. – s d Dec 05 '16 at 03:30
  • Oh, I missed that bit. I will delete the last sentence above - the bit that talks about the anagrams. Sorry. – Dawood ibn Kareem Dec 05 '16 at 03:31
0

A java int is 32 bits. the number you try to parse is 979897108111, which is over 32 bits.

Roberto Attias
  • 1,883
  • 1
  • 11
  • 21