-1

How would I best implement a Java function which accepts a String parameter and returns true or false depending on whether the String is comprised of all different characters or not.

This is an interview question. The implementation should use bit masking to store the occurrence of characters and no other auxiliary data structures.

Note: the requirement to use bit masking makes this question different from:

As requested, here's my effort so far. Please re-open the question if possible now:

public static boolean hasAllUniqueCharactersUsingBitMasking(String string) {
  final int length = string.length();
  final int nBitsToStoreAllUnicodeCharacters =
      (int) Math.ceil(Math.log(Character.MAX_CODE_POINT) / Math.log(2d));
  BitSet bitSet = new BitSet(nBitsToStoreAllUnicodeCharacters);
  for (int i = 0; i < length; i++) {
    char c = string.charAt(i);
    if (bitSet.get(c)) {
      return false;
    }
    bitSet.set(c);
  }
  return true;
}
Community
  • 1
  • 1
Robottinosino
  • 10,384
  • 17
  • 59
  • 97
  • This is an answer to an interview question. – Robottinosino Aug 15 '12 at 19:27
  • 1
    Forgive me, but what is the question? – Brian J Aug 15 '12 at 19:29
  • You can add all the characters to a BitSet and you can check the size matches the length or if any of the bits are already set. – Peter Lawrey Aug 15 '12 at 19:38
  • I rephrased the question. Reopen? – Robottinosino Aug 15 '12 at 19:40
  • Robottinosino, this kind of question is not a great fit for SO, where people come with their code and ask for help with specific problems they have. You are just rephrasing an interview question and asking for a full-blown solution, while providing no evidence of research on your part. – Marko Topolnik Aug 15 '12 at 19:43
  • Let me put in my own implementation then, showing the research I have done on my part. – Robottinosino Aug 15 '12 at 19:43
  • 1
    You've got my vote. I don't see why the complex calculation for the size of the bitset---a simple `new BitSet()` would have been just fine. – Marko Topolnik Aug 15 '12 at 19:49
  • and mine too...Good edit. Always show code. We are programmers after all :P – David Kroukamp Aug 15 '12 at 19:49
  • If `BitSet` is indeed allowed, then you've already got your solution. If not, then basically you'd reimplement it with your own code. Otavio's solution is on the right track. I think you are well-served for this question, closed or not. – Marko Topolnik Aug 15 '12 at 19:52
  • @MarkoTopolnik because this way the BitSet() won't be resizing itself: it's already the right size. Less operations behind the curtains. – Robottinosino Aug 15 '12 at 19:54
  • But then you'd need `new BitSet(Character.MAX_CODE_POINT)`, again without the hefty calculation. Each distinct character must have its own slot in the bitset and you are calculating the number of bits in the binary representation of `Character.MAX_CODE_POINT`. – Marko Topolnik Aug 15 '12 at 19:58
  • BTW since `Character.MAX_CODE_POINT` is an `int`, you shouldn't resort to floating-point calculations to find out your number: it's just a matter of some, no pun intended, *bit twiddling* :) – Marko Topolnik Aug 15 '12 at 20:02
  • @MarkoTopolnik are you sure? It seems to me like you would be wasting a lot of bits that way... I do the calculations to extract log2(n). – Robottinosino Aug 15 '12 at 20:02
  • Use a smaller example to reason about it: the string can only contain the 26 lowercase chars. Implement your logic using 5 bits of space. – Marko Topolnik Aug 15 '12 at 20:05

3 Answers3

4

According to Cracking the Coding Interview:

public static boolean isUniqueChars(String str) {
  if (str.length() > 256) { 
    return false;
  }
  int checker = 0;
  for (int i = 0; i < str.length(); ++i) {
    int val = str.charAt(i) - ‘a’;
    if ((checker & (1 << val)) > 0) return false;
    checker |= (1 << val);
  }
  return true;
}
Piyush Mattoo
  • 15,454
  • 6
  • 47
  • 56
Otavio Macedo
  • 1,542
  • 1
  • 13
  • 34
  • 1
    This makes a strong assumption about the range of valid characters... Unicode is much bigger than that. – Robottinosino Aug 15 '12 at 19:30
  • 4
    ...which does in no way mean that it wasn't the intention of the interviewers. Looks to me like a legit answer to yet another silly interview question. – Marko Topolnik Aug 15 '12 at 19:31
  • But anyway, this approach can fairly easily be extended into a wider range using a `long[]` and just a tad bit more logic. – Marko Topolnik Aug 15 '12 at 19:33
  • Actually, since the question says we can use an "auxiliary data structure using bit masks," I would interpret this to mean that a BitSet is permissible. One index per code point, and there you go. – yshavit Aug 15 '12 at 19:37
  • @yshavit I would assume that by "data structure" they don't allow a full-blown class. The wording seems to aim for an array. – Marko Topolnik Aug 15 '12 at 19:40
  • @MarkoTopolnik Probably true. Of course, BitSet is just a glorified `long[]`, so I one could write a poor man's BitSet using just the raw `long[]` (which is I think what you were getting at). – yshavit Aug 15 '12 at 19:51
  • @yshavit Yes, exactly. OP would have to either *use* `BitSet` or *implement* it from scratch, since bitset is the structure the question is aiming for. – Marko Topolnik Aug 15 '12 at 19:55
  • I think the question is not so trivial guys. At least for me. Lots of ways to get it wrong. – Robottinosino Aug 15 '12 at 20:03
  • @Robottinosino Didn't say it was trivial -- but that's one way of breaking the problem down. You write a pair of methods with a signature of something like: `boolean isSet(int bitIndex, long[] bits)` and another like: `long[] set(int bitIndex, long[] bits)` (the latter sets the bits, and returns either the same `long[]` it was passed in, or a new one if it needed to extend the array to get to that index). Each of those is going to do some arithmetic to get the `long` that holds the indexed bit (64 bits to a long) and then some masking to get at the bit within that long. – yshavit Aug 15 '12 at 22:54
0

Here is another solution using BitSet class in Java. This program will work for an ASCII Character Set String and have no limitation that the string has to be lower case or upper case. Also using BitSet class makes it easy to use and understand Bitwise operations.

public static boolean isUniqueStringUsingBitVectorClass(String s) {

        final int ASCII_CHARACTER_SET_SIZE = 256;

        final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE);

        // if more than  256 ASCII characters then there can't be unique characters
        if(s.length() > 256) {
            return false;
        }

        //this will be used to keep the location of each character in String
        final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE);

        for(int i = 0; i < s.length(); i++) {

            int charVal = s.charAt(i);
            charBitLocation.set(charVal);

            if(tracker.intersects(charBitLocation)) {
                return false;
            }

            tracker.or(charBitLocation);

            charBitLocation.clear(); //clear the individual character tracker for next iteration in the loop

        }

        return true;

    }
Prabhash Rathore
  • 709
  • 10
  • 18
0
public static boolean isUniqueCharsBitSet(String str) {

if (str.length() > 256) {
        return false;
    }

    BitSet set = new BitSet();
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if (!(set.get(val))) {
            set.set(val);
        } else {
            return false;
        }

    }
    return true;

}