
I was reading a coding book, trying to learn more about Java, and came across this problem.

The question asks: "Given two strings, write a method to decide if one is a permutation of the other."

After mulling it over for a minute or so, I decided to go with a HashMap solution. My logic was that add, remove, and search are all O(1), so it would be a fast solution. My code is as follows:

    public static boolean isPermutation(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        for (int x = 0; x < a.length(); x++) {
            char letter = a.charAt(x);
            if (!map.containsKey(letter)) {
                map.put(letter, 1);
            }
            else {
                int val = map.get(letter) + 1;
                map.put(letter, val);
            }
        }
        for (int y = 0; y < b.length(); y++) {
            char letter = b.charAt(y);
            if (!map.containsKey(letter)) {
                return false;
            }
            else {
                int val = map.remove(letter) - 1;
                if (val > 0) {
                    map.put(letter, val);
                }
            }
        }
        return true;
    }
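For what it's worth, on Java 8+ the counting loop can be condensed with `Map.merge`. This is only a sketch of the same algorithm, not a correction:

```java
import java.util.HashMap;

public class PermutationCheck {
    // Same algorithm as above; merge() replaces the containsKey()/put pair,
    // and the second loop decrements with a single get() per character.
    public static boolean isPermutation(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        HashMap<Character, Integer> map = new HashMap<>();
        for (int x = 0; x < a.length(); x++) {
            map.merge(a.charAt(x), 1, Integer::sum); // insert 1, or add 1 to the count
        }
        for (int y = 0; y < b.length(); y++) {
            char letter = b.charAt(y);
            Integer val = map.get(letter);
            if (val == null) {
                return false;        // b has a character a doesn't
            }
            if (val == 1) {
                map.remove(letter);  // count exhausted
            } else {
                map.put(letter, val - 1);
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isPermutation("listen", "silent")); // true
        System.out.println(isPermutation("hello", "world"));   // false
    }
}
```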

The book, however, uses an array as its answer.

    public boolean permutation(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        int[] letters = new int[256];
        char[] s_array = s.toCharArray();
        for (char c : s_array) {
            letters[c]++;
        }
        for (int i = 0; i < t.length(); i++) {
            int c = (int) t.charAt(i);
            if (--letters[c] < 0) {
                return false;
            }
        }
        return true;
    }

I have three questions.

First, I'd like to know whether my implementation is less efficient than the book's - and if so, what the inefficiencies are, and whether they can be corrected so that the Hashmap implementation is better (or at least equal) to the given array implementation.

Second, I understand that my HashMap uses autoboxing to convert from char to Character. Is there a significant slowdown that comes with autoboxing?

And third, in my code I tried to avoid using the HashMap's remove() method. My logic was that, while remove() should theoretically be O(1), using put() to overwrite the existing value would be more efficient, since replacing in place should be less costly than removing and then re-adding. Is my logic correct? Is it something that I should be doing?

Thank you very much!

  • "HashMap uses an array underneath so it can never be faster than using an array correctly": http://stackoverflow.com/questions/6462055/hashmap-vs-array-performance – paulsm4 Nov 01 '15 at 03:54
  • And yes, autoboxing does incur overhead. It can also have several other pitfalls: https://effective-java.com/2010/05/the-advantages-and-traps-of-autoboxing/ – paulsm4 Nov 01 '15 at 03:58
  • Note that the book's answer will not work if the strings can include any Unicode character (including letters from alphabets other than the usual A-Z, a-z, and Western European alphabets), since it only allocates 256 for the array. To fix this, `letters` needs to be initialized to `new int[65536]`. In similar problems where the number of possible input elements is too large to use an array, a `HashMap` is the way to go. – ajb Nov 01 '15 at 05:24
  • @ajb that was part of my logic for using a Hashmap, I thought I'd save space on the string. Thank you! – user5511719 Nov 01 '15 at 06:19
  • There's a typo in the "book" example: `--letters[c] < e` should be `--letters[c] < 0`. – Stephen C Nov 01 '15 at 11:05
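ajb's suggested fix can be sketched directly: widening the count array to 65536 covers every `char` value (the basic multilingual plane), at the cost of a larger allocation per call:

```java
public class UnicodePermutation {
    // The book's method with the count array widened to one slot per char
    // value (0..65535); still limited to the basic multilingual plane.
    public static boolean permutation(String s, String t) {
        if (s.length() != t.length()) {
            return false;
        }
        int[] letters = new int[65536];
        for (char c : s.toCharArray()) {
            letters[c]++;
        }
        for (int i = 0; i < t.length(); i++) {
            if (--letters[t.charAt(i)] < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(permutation("héllo", "olléh")); // true
        System.out.println(permutation("héllo", "hello")); // false
    }
}
```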

1 Answer


First an observation: Big Oh notation is not a measure of performance. Rather, it is an indication of how an algorithm is going to scale as the variables (e.g. N) tend to infinity.

First, I'd like to know whether my implementation is less efficient than the book's ...

Benchmark them! Seriously, it is too difficult to say which approach will be faster simply by inspecting the code.

Your benchmarking needs to take account of the fact that the relative performance will depend on the inputs; e.g. measure with a range of different string lengths.
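A crude harness along these lines would do for a first look (a sketch using `System.nanoTime`; a framework like JMH would give more trustworthy numbers):

```java
import java.util.HashMap;
import java.util.Random;

public class PermBenchmark {
    // The book's array-counting approach (with the comparison written as < 0)
    static boolean arrayVersion(String s, String t) {
        if (s.length() != t.length()) return false;
        int[] letters = new int[256];
        for (char c : s.toCharArray()) letters[c]++;
        for (int i = 0; i < t.length(); i++) {
            if (--letters[t.charAt(i)] < 0) return false;
        }
        return true;
    }

    // The HashMap-counting approach from the question, lightly condensed
    static boolean mapVersion(String a, String b) {
        if (a.length() != b.length()) return false;
        HashMap<Character, Integer> map = new HashMap<>();
        for (int x = 0; x < a.length(); x++) {
            map.merge(a.charAt(x), 1, Integer::sum);
        }
        for (int y = 0; y < b.length(); y++) {
            char letter = b.charAt(y);
            Integer val = map.get(letter);
            if (val == null) return false;
            if (val == 1) map.remove(letter); else map.put(letter, val - 1);
        }
        return true;
    }

    // Average nanoseconds per call; checking the result keeps the JIT honest
    static long time(java.util.function.BooleanSupplier task, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            if (!task.getAsBoolean()) throw new AssertionError();
        }
        return (System.nanoTime() - start) / reps;
    }

    public static void main(String[] args) {
        for (int len : new int[] {10, 1_000, 100_000}) {
            char[] chars = new char[len];
            Random rnd = new Random(42);
            for (int i = 0; i < len; i++) chars[i] = (char) ('a' + rnd.nextInt(26));
            String a = new String(chars);
            String b = new StringBuilder(a).reverse().toString(); // a permutation of a

            time(() -> arrayVersion(a, b), 100); // warm-up before measuring
            time(() -> mapVersion(a, b), 100);
            System.out.printf("len=%-7d array: %8d ns/op   map: %8d ns/op%n",
                    len, time(() -> arrayVersion(a, b), 100),
                    time(() -> mapVersion(a, b), 100));
        }
    }
}
```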

... and if so, what the inefficiencies are ...

This is what profiling is for. It will tell you how much time is spent in each method, and some profilers can measure down to the level of individual lines.

... and whether they can be corrected so that the Hashmap implementation is better (or at least equal) to the given array implementation.

That is for you to figure out ... once you have benchmarked and profiled.

Second, I understand that my Hashmap uses autoboxing to convert from Character to char. Is there a significant slowdown that comes with autoboxing?

There is definitely a slowdown. It should be measurable (or estimable). For example, if you profile your version you can see how much time is spent in Character methods.

Will it be significant? Hard to predict!
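One detail worth knowing when you measure: autoboxing a `char` goes through `Character.valueOf`, which is required to cache the values `\u0000`–`\u007F`, so ASCII keys reuse shared objects while anything above that range typically allocates. A quick sketch:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        // Character.valueOf caches '\u0000'..'\u007f', so boxing an ASCII
        // char returns a shared object; other chars get a fresh allocation.
        Character a = 'x', b = 'x';
        System.out.println(a == b); // true: both refer to the cached Character
        Character c = 'é', d = 'é';
        System.out.println(c == d); // false on OpenJDK: two separate objects
    }
}
```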

And third, in my code I tried to avoid using the Hashmap's remove() function. ... Is my logic correct?

Yes. But once again you can verify this by benchmarking and / or profiling.
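If you want a single map operation per character, `HashMap.compute` (Java 8+) folds the lookup and the update or removal into one call; a sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class DecrementDemo {
    // Decrement a character's count, dropping the entry when it reaches zero.
    // compute() does the lookup and the update/removal in one map operation,
    // instead of a containsKey()/get() followed by put() or remove().
    static void decrement(Map<Character, Integer> map, char key) {
        map.compute(key, (k, v) -> (v == null || v == 1) ? null : v - 1);
    }

    public static void main(String[] args) {
        Map<Character, Integer> map = new HashMap<>();
        map.put('a', 2);
        decrement(map, 'a');
        System.out.println(map.get('a'));         // 1
        decrement(map, 'a');
        System.out.println(map.containsKey('a')); // false
    }
}
```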


Having said all of that, my *educated guess* is that your solution might work better for small strings, and the book solution will definitely work better for long enough strings. It comes down to two things:

  • The book solution preallocates an array of 256 ints ... 1024 bytes. That is likely to take longer than allocating an empty HashMap.

  • The per-character costs of autoboxing, map lookup and map insertion or update are likely to be significantly greater than the equivalent costs in the book solution.

Stephen C