Determine if a list composed of anagram elements in Java 8

Question

I want to determine if a list is anagram or not using Java 8.

Example input:

"cat", "cta", "act", "atc", "tac", "tca"

I have written the following function that does the job but I am wondering if there is a better and elegant way to do this.

boolean isAnagram(String[] list) {
    long count = Stream.of(list)
            .map(String::toCharArray)
            .map(arr -> {
                Arrays.sort(arr);
                return arr;
            })
            .map(String::valueOf)
            .distinct()
            .count();
    return count == 1;

}

It seems I can't sort char array with Stream.sorted() method so that's why I used a second map operator. If there is some way that I can operate directly on char stream instead of Stream of char array, that would also help.

tobias_k · Accepted Answer · 2018-10-30T12:02:25.317

10

Instead of creating and sorting a char[] or int[], which can not be done inline and thus "breaks" the stream, you could get a Stream of the chars in the Strings and sort those before converting them to arrays. Note that this is an IntSteam, though, and String.valueOf(int[]) will include the array's memory address, which is not very useful here, so better use Arrays.toString in this case.

boolean anagrams = Stream.of(words)
        .map(String::chars).map(IntStream::sorted)
        .map(IntStream::toArray).map(Arrays::toString)
        .distinct().count() == 1;

Of course, you can also use map(s -> Arrays.toString(s.chars().sorted().toArray())) instead of the series of four maps. Not sure if there's a (significant) difference in speed, it's probably mainly a matter of taste.

Also, you could use IntBuffer.wrap to make the arrays comparable, which should be considerably faster than Arrays.toString (thanks to Holger in comments).

boolean anagrams = Stream.of(words)
        .map(s -> IntBuffer.wrap(s.chars().sorted().toArray()))
        .distinct().count() == 1;

edited Oct 30 '18 at 12:02

answered Oct 29 '18 at 15:10

tobias_k

81,265
12
120
179

1

Nice answer, +1 from me. Although I would personally split the mapping of the words and list itself, instead of using four maps. [It still does exactly the same and outputs the same results, though.](https://tio.run/##lVDLTsMwELznK1Y52Yha4lwFiQ/glGPFYWuH4uBH5N0UVVW@PdiOgEocEHsZeXZ3dsYjnnE3mvd1tX6KiWHMhJrZOnW3b35xxGlAX1raIRE8ow3XBmCaj85qIEbOcI7WgM8t0XOy4XR4AZRlDOCb@IjJEHRwbTVyew@tZiyAur6QdQHGDTS2y74KHGN0AwbAgKeEvkj0m6n4KqqorHOllMepcrB7hKeU8EKK42ah8kq/YSLxs/FnKcofMph/rXCst4WUN9aMJbZB842S0nEOmYCug4ctbX8hHnK0mdWUXbML4iu4LBNLs6zrJw) – Kevin Cruijssen Oct 29 '18 at 15:30
2

@Michael you get it for free when you wrap the array, `CharBuffer.wrap(array)` in case of `char[]`, `IntBuffer.wrap(array)` in case of `int[]`. No need for an expensive conversion to a `String`… – Holger Oct 30 '18 at 10:29
Generally, this is a typical example of method reference overuse. A single `.map(s -> Arrays.toString(s.chars().sorted().toArray())` or `.map(s -> IntBuffer.wrap(s.chars().sorted().toArray())` (see my previous comment), is much better to read… – Holger Oct 30 '18 at 10:32
@Holger Yes, you mentioned that. I added that in the last paragraph, but I guess it was a bis ambiguous. The "not sure whether faster" part was just about the map vs. lambda bit. Reworded that last part. – tobias_k Oct 30 '18 at 12:03
Yes, that’s a much better wording. I’d also assume that there is no significant speed difference between multiple `map` steps with method references and a single `map` step (compare with [this Q&A](https://stackoverflow.com/q/24054773/2711488)). I just think that the single lambda expression is more readable here. – Holger Oct 30 '18 at 12:46

score 8 · Answer 2 · answered Oct 30 '18 at 10:48

I would not deal with counting distinct values, as that’s not what you are interested in. What you want to know, is whether all elements are equal according to a special equality rule.

So when we create a method to convert a String to a canonical key (i.e. all characters sorted)

private CharBuffer canonical(String s) {
    char[] array = s.toCharArray();
    Arrays.sort(array);
    return CharBuffer.wrap(array);
}

we can simply check whether all subsequent elements are equal to the first one:

boolean isAnagram(String[] list) {
    if(list.length == 0) return false;
    return Arrays.stream(list, 1, list.length)
        .map(this::canonical)
        .allMatch(canonical(list[0])::equals);
}

Note that for method references of the form expression::name, the expression is evaluate once and the result captured, so canonical(list[0]) is evaluated only once for the entire stream operation and only equals invoked for every element.

Of course, you can also use the Stream API to create the canonical keys:

private IntBuffer canonical(String s) {
    return IntBuffer.wrap(s.chars().sorted().toArray());
}

^{(the isAnagram method does not need any change)}

Note that CharBuffer and IntBuffer can be used as lightweight wrappers around arrays, like in this answer, and implement equals and hashCode appropriately, based on the actual array contents.

Looks good. However, I do think that it's strange to return false if the list is empty, but return true if the list has one element. — Michael, Oct 30 '18 at 11:11
@Michael that’s a choice, the author has to make. You may consider returning `true` instead or deliberately throw an exception, as the question whether an empty collection consists of anagrams is pointless, however, I decided to mimic the behavior of the OP’s original code, as checking whether the distinct count is one will lead to `false` for zero elements. — Holger, Oct 30 '18 at 11:16

score 4 · Answer 3 · answered Oct 29 '18 at 15:15

4

I wouldn't sort the char array, as sorting is O(NlogN), which is not necessary here.

All we need is, for each word of the list, to count the occurrences of each character. For this, we're collecting each word's characters to a Map<Integer, Long>, with the keys being each character and the value being its count.

Then, we check that, for all the words in the array argument, we have the same count of characters, i.e. the same map:

return Arrays.stream(list)
    .map(word -> word.chars()
            .boxed().collect(Collectors.grouping(c -> c, Collectors.counting()))
    .distinct()
    .count() == 1;

answered Oct 29 '18 at 15:15

fps

33,623
8
55
110

1

While this is true, I wonder whether this is any faster, or even slower, provided that this method presumably presents much greater overhead (?) and I'd think that all the strings are relatively small (say, not exceeding a few dozens of characters at most). – tobias_k Oct 29 '18 at 15:24
If I were not to use Streams I would implement something like this using HashMaps. @tobias_k in many languages most words consists of small number of letters which reduces `O(NlogN)` to be nearly negligible as you mentioned. Now I wonder what is the memory imprint of both methods and whether if one is advantageous to other in terms of speed. – Limonkufu Oct 29 '18 at 15:35
@tobias_k for short strings you are right, but for these, it probably doesn't matter anyway. – Peter Lawrey Oct 29 '18 at 15:35
@Limonkufu I think Tobias and Peter's comments are valid. Maybe a `Map` has more overhead than sorting small arrays in place. Anyways, if you are after performance, you shouldn't be using streams at all, i.e. a traditional iterative approach would be faster, I believe... – fps Oct 29 '18 at 16:19
@tobias_k Agreed, you have a valid point. I think that ultimately, it will all depend on the input. Maybe my solution is better for longer strings, while yours works better for the general case... – fps Oct 29 '18 at 16:22
@FedericoPeraltaSchaffner ow about [this](https://stackoverflow.com/a/53056255/1059372)? – Eugene Oct 30 '18 at 01:56
@Eugene your `BitSet` approach can not handle duplicate characters. But anyway, what’s often overlooked, is that the actual goal is not counting the number of distinct values, so with [a simpler approach](https://stackoverflow.com/a/53062631/2711488), it can be done even short-circuiting. – Holger Oct 30 '18 at 10:53

score 2 · Answer 4 · answered Oct 30 '18 at 02:18

Alternatively an updated version of your implementation that could work would be:

boolean isAnagram(String[] list) {
    return Stream.of(list) // Stream<String>
            .map(String::toCharArray) // Stream<char[]>
            .peek(Arrays::sort) // sort 
            .map(String::valueOf) // Stream<String>
            .distinct() //distinct
            .count() == 1;
}

Eugene · Answer 5 · 2018-10-30T01:55:42.587

Or may be with a BitSet:

  System.out.println(stream.map(String::chars)
        .map(x -> {
            BitSet bitSet = new BitSet();
            x.forEach(bitSet::set);
            return bitSet;
        })
        .collect(Collector.of(
            BitSet::new,
            BitSet::xor,
            (left, right) -> {
                left.xor(right);
                return left;
            }
        ))
        .cardinality() == 0);

Determine if a list composed of anagram elements in Java 8

5 Answers5

Linked