4

I have a method that takes a number of lists, each containing the lines of a book. I combine them into a stream and then iterate over the lines to split each one on every non-letter character (\\P{L}).

Is there a way to avoid the for-each loop and process this within a stream?

private List<String> getWordList(List<String>... lists) {
        List<String> wordList = new ArrayList<>();

        Stream<String> combinedStream = Stream.of(lists)
                .flatMap(Collection::stream);
        List<String> combinedLists = combinedStream.collect(Collectors.toList());

        for (String line: combinedLists) {
            wordList.addAll(Arrays.asList(line.split("\\P{L}")));
        }

        return wordList;
}
Andronicus
Hermann Stahl

3 Answers

11

Once you have the stream, you can simply apply flatMap again and return the result:

return combinedStream
        .flatMap(str -> Arrays.stream(str.split("\\P{L}")))
        .collect(Collectors.toList());

To put it all together:

private List<String> getWordList(List<String>... lists) {
    return Stream.of(lists)
        .flatMap(Collection::stream)
        .flatMap(str -> Arrays.stream(str.split("\\P{L}")))
        .collect(Collectors.toList());
}
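
For readers who want to run the method above, here is a minimal self-contained sketch. The wrapper class `WordListDemo`, the sample lines, and the `@SafeVarargs` annotation are assumptions added for illustration; they are not part of the original answer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class, used only to make the snippet runnable.
final class WordListDemo {

    @SafeVarargs // the varargs array is only read, never stored or modified
    static List<String> getWordList(List<String>... lists) {
        return Stream.of(lists)                  // Stream<List<String>>
                .flatMap(Collection::stream)     // Stream<String>, one element per line
                .flatMap(line -> Arrays.stream(line.split("\\P{L}"))) // one element per token
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data, invented for this example.
        List<String> bookOne = Arrays.asList("Hello, world");
        List<String> bookTwo = Arrays.asList("foo bar");
        // prints [Hello, , world, foo, bar]
        System.out.println(getWordList(bookOne, bookTwo));
    }
}
```

Note the empty string between "Hello" and "world": the comma and the space are two separate delimiters, so the split yields an empty token between them. The original loop behaves the same way, so the stream version preserves its output.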
Andronicus
4

You don't need to introduce so many variables:

private List<String> getWordList(List<String>... lists) {

    return Stream.of(lists) // Stream<List<String>>
                 .flatMap(Collection::stream) // Stream<String>
                 .flatMap(Pattern.compile("\\P{L}")::splitAsStream) // Stream<String>
                 .collect(toList()); // List<String>, with a static import of Collectors.toList
}

As Holger pointed out, .flatMap(Pattern.compile("\\P{L}")::splitAsStream) should be preferred over .flatMap(s -> Arrays.stream(s.split("\\P{L}"))): it avoids allocating an intermediate array and recompiling the pattern for each element of the stream.
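
A minimal runnable sketch of this variant follows, with the pattern compiled once into a constant. The class name `SplitAsStreamDemo`, the constant name `NON_LETTER_CHARS`, the sample lines, and the `@SafeVarargs` annotation are illustrative assumptions rather than part of the answer.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical wrapper class, used only to make the snippet runnable.
final class SplitAsStreamDemo {

    // Compiled once and reused for every line (constant name is an assumption).
    private static final Pattern NON_LETTER_CHARS = Pattern.compile("\\P{L}");

    @SafeVarargs // the varargs array is only read, never stored or modified
    static List<String> getWordList(List<String>... lists) {
        return Stream.of(lists)                          // Stream<List<String>>
                .flatMap(Collection::stream)             // Stream<String>, one element per line
                .flatMap(NON_LETTER_CHARS::splitAsStream) // tokens streamed without an intermediate array
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample data, invented for this example.
        List<String> first = Arrays.asList("To be, or not");
        List<String> second = Arrays.asList("to be");
        // prints [To, be, , or, not, to, be]
        System.out.println(getWordList(first, second));
    }
}
```

The lambda form `.flatMap(s -> NON_LETTER_CHARS.splitAsStream(s))` should be equivalent if the method reference reads awkwardly.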

davidxxx
  • 3
    As explained [in this answer](https://stackoverflow.com/a/40933002/2711488), it’s recommended to use `.flatMap(Pattern.compile("\\P{L}")::splitAsStream)`; this avoids recompiling the pattern for every stream element and does not populate a potentially large intermediate array. – Holger Mar 04 '19 at 11:13
  • @Holger Thank you very much for this reference. I had not come across this Java 8 method before. I don't much like the method reference in this case (`Pattern.splitAsStream(String)` is not a method I commonly use, at least), but according to your comment in the other post it is required, and I get it. We could also compile the pattern outside the stream, but that is not great either. – davidxxx Mar 04 '19 at 16:34
  • Well, you can also move the pattern to a constant like `static final Pattern NON_LETTER_CHARS = Pattern.compile("\\P{L}");` and then use either `.flatMap(NON_LETTER_CHARS::splitAsStream)` or `.flatMap(s -> NON_LETTER_CHARS.splitAsStream(s))`. In the end, you also have to know about the regex engine when using `s.split("\\P{L}")`. – Holger Mar 04 '19 at 16:39
  • Indeed, that is what I referred to in my comment edit, sorry. It is one way, but it is not great either. – davidxxx Mar 04 '19 at 16:41
1

You can combine all the lists and use flatMap to get the result:

private List<String> getWordList(List<String>... lists) {
    return Stream.of(lists)
    .flatMap(Collection::stream)
    .flatMap(str -> Arrays.stream(str.split("\\P{L}")))
    .collect(Collectors.toList());
}
Eklavya
  • 1
    `str` is not a `String` here but a `List`, so `List.split()` cannot compile. You are missing an intermediate operation: https://stackoverflow.com/a/54969149/270371 – davidxxx Mar 03 '19 at 13:12