Java 8 Streams modify collection values

Question

I'm using the stream API. Once the relevant data has been filtered, I'd like to edit the data being collected. Here is the code so far:

  String wordUp = word.substring(0,1).toUpperCase() + word.substring(1);
  String wordDown = word.toLowerCase();

  ArrayList<String> text = Files.lines(path)
        .parallel() // Perform filtering in parallel
        .filter(s -> s.contains(wordUp) || s.contains(wordDown) &&  Arrays.asList(s.split(" ")).contains(word))
        .sequential()
        .collect(Collectors.toCollection(ArrayList::new));

Edit: The code below is awful and I am trying to avoid it. (It also does not entirely work. It was done at 4am, please excuse it.)

    for (int i = 0; i < text.size(); i++) {
        String set = "";
        List temp = Arrays.asList(text.get(i).split(" "));
        int wordPos = temp.indexOf(word);

        List<String> com1 = (wordPos >= limit) ? temp.subList(wordPos - limit, wordPos) : new ArrayList<String>();
        List<String> com2 = (wordPos + limit < text.get(i).length() -1) ? temp.subList(wordPos + 1, wordPos + limit) : new ArrayList<String>();
        for (String s: com1)
            set += s + " ";
        for (String s: com2)
            set += s + " ";
        text.set(i, set);
    }

It's looking for a particular word in a text file. Once a line has been filtered in, I'd like to collect only a portion of it each time: a number of words on either side of the keyword being searched for.

eg:

keyword = "the" limit = 1

It would find: "Early in the morning a cow jumped over a fence."

It should then return: "in the morning"

P.S. Any suggested speed improvements will be up-voted.

To modify elements, use the [map](http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) method of the stream. — fracz, Mar 09 '15 at 12:36
And also, what should happen if the `keyword` is the first in the sentence and `limit` is `1`? — Konstantin Yovkov, Mar 09 '15 at 12:36
What's the difference between `wordUp`, `wordDown` and `word`? — Eran, Mar 09 '15 at 12:37
Does the `limit` have something to do with the fact that you want to return one word before and after the keyword? I can see no other potential use... — luk2302, Mar 09 '15 at 12:39
@kocko the code should do a boundary check, then take as many on either side as possible. I will edit the code to show how the limit is used; however, just bear in mind this was done at 4am and is terrible. I know this is not the best solution, as I am new to this. (That's why I'm here ^^,) — Warosaurus, Mar 09 '15 at 13:00
@WojciechFrącz - Thank you, I'll have a look at the map function. — Warosaurus, Mar 09 '15 at 13:02
There is no sense in calling `.parallel()` and `.sequential()` on the same stream. A stream is either parallel or sequential. Note that `collect` works flawlessly with parallel streams. Further, your condition `x || y && z` looks suspicious; mind the operator precedence. But it’s not clear what it is supposed to do anyway. — Holger, Mar 09 '15 at 13:38
Coding at 4am? Not too bad. At least you weren't drunk: http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered/185181#185181 — Stuart Marks, Mar 09 '15 at 15:24
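
To illustrate the operator precedence point raised in the comments: in Java, `&&` binds tighter than `||`, so `x || y && z` is evaluated as `x || (y && z)`. The following is only a minimal sketch with hypothetical names (`PrecedenceDemo`, `unparenthesized`, `grouped`); it does not claim to know which grouping the question intended.

```java
class PrecedenceDemo {
    // Same shape as the filter condition in the question: x || y && z.
    // Java parses this as x || (y && z).
    static boolean unparenthesized(boolean x, boolean y, boolean z) {
        return x || y && z;
    }

    // Explicit grouping, for comparison only (hypothetical alternative).
    static boolean grouped(boolean x, boolean y, boolean z) {
        return (x || y) && z;
    }

    public static void main(String[] args) {
        // x is true, so the unparenthesized form is true regardless of z.
        System.out.println(unparenthesized(true, false, false)); // true
        System.out.println(grouped(true, false, false));         // false
    }
}
```

If the whole-word check is meant to apply in every case, explicit parentheses make that intent visible to the reader.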

Holger · Accepted Answer · 2015-11-13T10:05:42.580

There are two different tasks you should think about. First, convert a file into a list of words:

List<String> words = Files.lines(path)
    .flatMap(Pattern.compile(" ")::splitAsStream)
    .collect(Collectors.toList());

This uses your initial idea of splitting at space characters. This might be sufficient for simple tasks; however, you should study the documentation of BreakIterator to understand the difference between this simple approach and real, sophisticated word boundary splitting.
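
As a rough sketch of what splitting with `BreakIterator` could look like (the class and method names `WordSplitSketch` and `words` are made up for illustration, and keeping only tokens that contain a letter or digit is an assumption about what should count as a word):

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class WordSplitSketch {
    // Splits text at the word boundaries reported by BreakIterator and
    // keeps only tokens containing at least one letter or digit, which
    // drops whitespace and stand-alone punctuation tokens.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Early in the morning a cow jumped over a fence.", Locale.ENGLISH));
    }
}
```

Unlike splitting at spaces, which leaves the last token as `fence.`, this approach treats the trailing period as a separate token, so it can be filtered out.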

Second, if you have a list of words, your task is to find matches of your word and convert sequences of items around the match into a single match String by joining the words using a single space character as delimiter:

List<String> matches=IntStream.range(0, words.size())
    // find matches
    .filter(ix->words.get(ix).matches(word))
    // create subLists around the matches
    .mapToObj(ix->words.subList(Math.max(0, ix-1), Math.min(ix+2, words.size())))
    // reconvert lists into phrases (join with a single space)
    .map(list->String.join(" ", list))
    // collect into a list of matches; here, you can use a different
    // terminal operation, like forEach(System.out::println), as well
    .collect(Collectors.toList());
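
For completeness, here is a self-contained sketch of the second step with the question's `limit` as a parameter instead of the hard-coded `1`. The names `ContextSketch` and `context` are hypothetical, and it uses `equals` rather than `matches`, on the assumption that the keyword is a literal word and not a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ContextSketch {
    // Returns, for every occurrence of word, up to limit words on either
    // side of it, joined with single spaces. Math.max/Math.min clamp the
    // window at the start and end of the list.
    static List<String> context(List<String> words, String word, int limit) {
        return IntStream.range(0, words.size())
            .filter(ix -> words.get(ix).equals(word))
            .mapToObj(ix -> words.subList(Math.max(0, ix - limit),
                                          Math.min(ix + limit + 1, words.size())))
            .map(list -> String.join(" ", list))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "Early in the morning a cow jumped over a fence.".split(" "));
        System.out.println(context(words, "the", 1));   // [in the morning]
        System.out.println(context(words, "Early", 1)); // [Early in]
    }
}
```

With the example from the question (keyword "the", limit 1) this yields "in the morning". A keyword at the very start of the text simply gets a shorter window.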

This answer is fantastic; it is elegant and exactly the sort of answer I am looking for. Thank you so much. I like that it avoids any issues that might arise when selecting words by line. I will have a look at the link you suggested. One more thing: do you maybe have a link or advice as to how I could find the time complexity of something like this? — Warosaurus, Mar 09 '15 at 14:26
I could be wrong, but I assume the time complexity for both is `O(n)` since they go through all `n` elements in the stream. However, together would they be `O(n) + O(n) = O(2n)`? — Warosaurus, Mar 09 '15 at 14:39
