0

I was trying to write some functional programming code (using lambdas and streams from Java 8) to test if a string has unique characters in it (if it does, return true, if it does not, return false). A common way to do this using vanilla Java is with a data structure like a set, i.e.:

public static boolean oldSchoolMethod(String str) {
    Set<String> set = new HashSet<>();

    for(int i=0; i<str.length(); i++) {
        if(!set.add(str.charAt(i) + "")) return false;
    }
    return true;
}

The set returns true if the character/object can be added to the set (because it did not exist there previously). It returns false if it cannot (it exists in the set already, duplicated value, and cannot be added). This makes it easy to break out the loop and detect if you have a duplicate, without needing to iterate through all length N characters of the string.

I know in Java 8 streams you cannot break out a stream. Is there anyway way to capture the return value of an intermediate stream operation, like adding to the set's return value (true or false) and send that value to the next stage of the pipeline (another intermediate operation or terminal stream operation)? i.e.

Arrays.stream(myInputString.split(""))
    .forEach( i -> {
       set.add(i) // need to capture whether this returns "true" or "false" and use that value later in 
                  // the pipeline or is this bad/not possible?
    });

One of the other ways I thought of solving this problem, is to just use distinct() and collect the results into a new string and if it is the same length as the original string, than you know it is unique, else if there are different lengths, some characters got filtered out for not being distinct, thus you know it is not unique when comparing lengths. The only issue I see here is that you have to iterate through all length N chars of the string, where the "old school" method best-case scenario could be done in almost constant time O(1), since it is breaking out the loop and returning as soon as it finds 1 duplicated character:

public static boolean java8StreamMethod(String str) {
    String result = Arrays.stream(str.split(""))
            .distinct()
            .collect(Collectors.joining());

    return result.length() == str.length();
}
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
ennth
  • 1,698
  • 5
  • 31
  • 63
  • 1
    What is your question? – Eritrean Jan 13 '21 at 09:26
  • 1
    Does this answer your question? [How to check if exists any duplicate in Java 8 Streams?](https://stackoverflow.com/questions/30053487/how-to-check-if-exists-any-duplicate-in-java-8-streams) – Hulk Jan 13 '21 at 09:39
  • Your "other way" is no different from the first attempt (except that it works), it too will loop through all characters before stream processing will continue and you would have the opportunity to investigate a set of booleans to see if one or more are false. Unless you are dealing with hundreds of thousands of characters here, you are micro-optimising. Write code that works, is readable and can be maintained easily. – Gimby Jan 13 '21 at 09:45
  • This is a classic example of an X/Y problem by the way. You want to solve a problem and have chosen a solution yourself and ask a question about the solution (X). It's better to ask a question about the problem instead (Y). ChiefGokhlayeh's seemingly fine answer deals with the problem, not your solution.That makes the answer kind of wrong because it does not answer your question... but probably it actually does, you just didn't ask the right question. – Gimby Jan 13 '21 at 09:47
  • @Gimby as noted in the proposed duplicate I linked, the Set-version ("`oldSchoolMethod`") is actually the best solution. This problem just doesn't translate well to streams (this is true for many algorithms that cannot treat elements independently). All stream versions require stateful predicates to be short-circuiting, defeating much of the benefits of streams. – Hulk Jan 13 '21 at 09:50

3 Answers3

4

Your solutions are all performing unnecessary string operations.

E.g. instead of using a Set<String>, you can use a Set<Character>:

public static boolean betterOldSchoolMethod(String str) {
    Set<Character> set = new HashSet<>();
    for(int i=0; i<str.length(); i++) {
        if(!set.add(str.charAt(i))) return false;
    }
    return true;
}

But even the boxing from char to Character is avoidable.

public static boolean evenBetterOldSchoolMethod(String str) {
    BitSet set = new BitSet();
    for(int i=0; i<str.length(); i++) {
        if(set.get(str.charAt(i))) return false;
        set.set(str.charAt(i));
    }
    return true;
}

Likewise, for the Stream variant, you can use str.chars() instead of Arrays.stream(str.split("")). Further, you can use count() instead of collecting all elements to a string via collect(Collectors.joining()), just to call length() on it.

Fixing both issues yields the solution:

public static boolean newMethod(String str) {
    return str.chars().distinct().count() == str.length();
}

This is simple, but lacks short-circuiting. Further, the performance characteristics of distinct() are implementation-dependent. In OpenJDK, it uses an ordinary HashSet under the hood, rather than BitSet or such alike.

Holger
  • 285,553
  • 42
  • 434
  • 765
3

This code might work for you:

public class Test {

    public static void main(String[] args) {

        String myInputString = "hellowrd";

        HashSet<String> set = new HashSet<>();
        Optional<String> duplicateChar =Arrays.stream(myInputString.split("")).
                filter(num-> !set.add(num)).findFirst();
        if(duplicateChar.isPresent()){
            System.out.println("Not unique");
        }else{
            System.out.println("Unique");
        }

    }
}

Here using findFirst() I am able to find the first duplicate element. So that we don't need to continue on iterating rest of the characters.

Joby Wilson Mathews
  • 10,528
  • 6
  • 54
  • 53
  • 1
    Also, using `anyMatch` would eliminate the need for the optional and `isPresent()` check. But still, this won't beat the loop - at best it can be equivalent. – Hulk Jan 13 '21 at 10:03
  • the shortest stream version I know is Holgers `return list.stream().allMatch(new HashSet<>()::add)` https://stackoverflow.com/a/30053822/2513200, but it still has the same problems (stateful predicate) - see the cautionary comments by Stuart Marks. – Hulk Jan 13 '21 at 10:07
  • ah! I think findFirst() implementated in this way is what I was looking for – ennth Jan 14 '21 at 16:06
1

What about just mapping to a boolean?

Arrays.stream(myInputString.split(""))
   .map(set::add)
   .<...>

That would solve your concrete issue, I guess, but it's not a very nice solution because the closures in stream chains should not have side-effects (that is exactly the point of functional programming...).

Sometimes the classic for-loop is still the better choice for certain problems ;-)

Stefan Winkler
  • 3,871
  • 1
  • 18
  • 35
  • 1
    Yep - no way to solve this with streams without resorting to stateful predicates - better to stick with the loop. However, mapping to a `Boolean` is rarely useful. `anyMatch` or `allMatch` are usually the way to go. – Hulk Jan 13 '21 at 10:16
  • is there any way to break out the stream when you add a duplicate to the set with this method? – ennth Jan 14 '21 at 16:06
  • Well If you add a filter.findAny or anyMatch after the map, you can break out. – Stefan Winkler Jan 15 '21 at 18:41