2

I have a program which I found online which basically tells whether the String contains all unique characters, below is the code

private static boolean areCharsUnique(String str) {
        if (str.length() > 256)
            return false;
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            int val = str.charAt(i) - 'a';
            if ((checker & (1 << val)) > 0) {
                return false;
            }
            checker |= (1 << val);
        }
        return true;
    }

I am baffled by this line of code if ((checker & (1 << val)) > 0) and also

checker |= (1 << val);

I know that << is a left shift operator, but how exactly left shifting helps in the above situation?

In short how does the above program work?

Arif Nadeem
  • 8,524
  • 7
  • 47
  • 78

4 Answers4

7

This code works under the assumption that ASCII character set has consecutive character in its mapping so that a == 97, b == 98, and so on.

Starting from this you can calculate a delta distance from the a character, eg 'e' - 'a' = 5. This distance is used to set (through checker |= (1 << val)) a bit in a integer number (which has 32 bits).

So if a 'e' character is found then bit at index 5 is set to 1 for checker.

This is done for every char by ensuring that you never find a bit that has been already set previously (through if (checker & (1 << val)) > 0)).

This works only for lowercase characters (even because int has 32 bits). An HashSet<Character> would be surely better.

Jack
  • 131,802
  • 30
  • 241
  • 343
2

Each character is converted to a numeric index from 0 (== 'a') to 26 (== 'z') and beyond. For each of those indexes, the corresponding bit in an integer value (== 'checker') is set. If the bit for that index is already set, you can determine that that character already was contained in the given string.

Note that this algorithm only works for lowercase strings, as for the uppercase characters, the value will overflow and give unreliable results. One fix for this would be to convert the 'checker' type from int to long.

jawi
  • 360
  • 1
  • 7
2

We convert each character to a numeric index from 0 ('a') to 25 ('z') here

 int val = str.charAt(i) - 'a';

Here we look at binary presentation of checker for example, if checker = 001 this means we already have char 'a' counted, because first bit is set to 1.

In this row we check if bit related to current letter is set. 1<<val represent 00100..00, where 1 is set in val-th position from left. Others digits are zeros. About binary operation you may read at TopCoder

if ((checker & (1 << val)) > 0) {...}

This line checker |= (1 << val); set bit at position val from left to 1 in checker.

EDIT: Here you may assume checker like bool arr[32],

and

int val = str.charAt(i) - 'a'; the same as if(arr[str.charAt(i) - 'a'] == true) ...

and checker |= (1 << val); is equal to arr[val] = true.

So, at begging you have all elements of arr set to zeros. val - is integer representation of letters from 'a' to 'z'.

ivan.mylyanyk
  • 2,051
  • 4
  • 30
  • 37
1

This code store the presence of each lowercase character as a single bit in a 32-bit integer.

1 << val computes a number with only that bit set.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • I tried debugging that piece of code, the value of both checker and val change after every i'th step, so where exactly is the presence of lowercase character stored? – Arif Nadeem Jan 14 '13 at 19:16
  • 1
    It's stored as a bitmap, where each bit represents a character offset from ASCII 'a'. So, rather than the literal character "b", being stored, the program sets the 2nd bit of @checker@ to 1. Think of it as similar to a substitution cipher, where a=0, b=1, c=2, etc. As others have pointed out, though, this solution is unworkable for anything other than a limited range of ASCII lowercase characters. – Palpatim Jan 14 '13 at 19:20
  • yes I am aware of that, thank you very much for this vivid explanation. – Arif Nadeem Jan 14 '13 at 19:23