
I'm using the stream API. Once the relevant lines have been filtered, I'd like to edit the data being collected. Here is the code so far:

  String wordUp = word.substring(0,1).toUpperCase() + word.substring(1);
  String wordDown = word.toLowerCase();

  ArrayList<String> text = Files.lines(path)
        .parallel() // Perform filtering in parallel
        .filter(s -> s.contains(wordUp) || s.contains(wordDown) &&  Arrays.asList(s.split(" ")).contains(word))
        .sequential()
        .collect(Collectors.toCollection(ArrayList::new));

Edit: The code below is awful and I am trying to avoid it. (It also does not entirely work. It was done at 4am, please excuse it.)

    for (int i = 0; i < text.size(); i++) {
        String set = "";
        List temp = Arrays.asList(text.get(i).split(" "));
        int wordPos = temp.indexOf(word);

        List<String> com1 = (wordPos >= limit) ? temp.subList(wordPos - limit, wordPos) : new ArrayList<String>();
        List<String> com2 = (wordPos + limit < text.get(i).length() -1) ? temp.subList(wordPos + 1, wordPos + limit) : new ArrayList<String>();
        for (String s: com1)
            set += s + " ";
        for (String s: com2)
            set += s + " ";
        text.set(i, set);
    }

The code looks for a particular word in a text file. Once a line has passed the filter, I'd like to collect only a portion of it: a number of words on either side of the keyword being searched for.

eg:

keyword = "the" limit = 1

It would find: "Early in the morning a cow jumped over a fence."

It should then return: "in the morning"

P.S. Any suggested speed improvements will be up-voted.

Stuart Marks
Warosaurus
  • I don't see how you use this `limit` in your code... – Konstantin Yovkov Mar 09 '15 at 12:35
  • To modify elements, use [map](http://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#map-java.util.function.Function-) method of the stream. – fracz Mar 09 '15 at 12:36
  • And also, what should happen if the `keyword` is the first in the sentence and `limit` is `1`? – Konstantin Yovkov Mar 09 '15 at 12:36
  • What's the difference between `wordUp`, `wordDown` and `word`? – Eran Mar 09 '15 at 12:37
  • Has the `limit` something to do with the fact that you want to return one word before and after the keyword? i can see no other potential use... – luk2302 Mar 09 '15 at 12:39
  • @kocko the code should do a boundary check and then take as many words on either side as possible. I will edit the code to show how the limit is used; just bear in mind this was done at 4am and is terrible. I know this is not the best solution, as I am new to this. (That's why I'm here ^^,) – Warosaurus Mar 09 '15 at 13:00
  • @WojciechFrącz - Thank you, I'll have a look at the map function. – Warosaurus Mar 09 '15 at 13:02
  • There is no sense in calling `.parallel()` and `.sequential()` on the same stream. A stream is either parallel or sequential. Note that `collect` works flawlessly with parallel streams. Further, your condition `x || y && z` looks suspicious; mind the operator precedence. But it’s not clear what it is supposed to do anyway. – Holger Mar 09 '15 at 13:38
  • Coding at 4am? Not too bad. At least you weren't drunk: http://stackoverflow.com/questions/184618/what-is-the-best-comment-in-source-code-you-have-ever-encountered/185181#185181 – Stuart Marks Mar 09 '15 at 15:24
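
The precedence remark in the comments can be checked with a small sketch. In Java, `&&` binds tighter than `||`, so `x || y && z` is parsed as `x || (y && z)`. The class and method names below are hypothetical, and the grouped variant is only one possible reading of what the filter was meant to do, not something the question confirms.

```java
public class PrecedenceDemo {

    // The condition exactly as written in the question's filter:
    // Java parses it as x || (y && z).
    static boolean asWritten(boolean x, boolean y, boolean z) {
        return x || y && z;
    }

    // One possible intended reading (an assumption): (x || y) && z.
    static boolean grouped(boolean x, boolean y, boolean z) {
        return (x || y) && z;
    }

    public static void main(String[] args) {
        // x = line contains wordUp, y = line contains wordDown,
        // z = line contains the exact word as a space-separated token.
        System.out.println(asWritten(true, false, false)); // true
        System.out.println(grouped(true, false, false));   // false
    }
}
```

With the condition as written, a line containing `wordUp` passes the filter even when the exact-token check fails.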

1 Answer


There are two different tasks you should think about. First, convert a file into a list of words:

    List<String> words = Files.lines(path)
        .flatMap(Pattern.compile(" ")::splitAsStream)
        .collect(Collectors.toList());

This uses your initial idea of splitting at space characters. This might be sufficient for simple tasks; however, you should study the documentation of BreakIterator to understand the difference between this simple approach and real, sophisticated word-boundary splitting.
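
To make the difference concrete, here is a minimal sketch that compares splitting at spaces with `java.text.BreakIterator`. The class name `WordSplitDemo`, the helper `words`, the choice of `Locale.ENGLISH`, and the rule of keeping only segments that contain a letter or digit are all assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSplitDemo {

    // Splits text at BreakIterator word boundaries and keeps only the
    // segments that contain at least one letter or digit, which drops
    // whitespace and punctuation segments.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Early in the morning a cow jumped over a fence.";
        // Splitting at spaces leaves the full stop attached to the last token.
        System.out.println(String.join("|", line.split(" ")));
        // BreakIterator reports the full stop as its own segment, which is filtered out here.
        System.out.println(String.join("|", words(line)));
    }
}
```

With the space-based split, a search for `fence` would miss the token `fence.`; the boundary-based split avoids that at the cost of a little more code.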

Second, if you have a list of words, your task is to find matches of your word and convert sequences of items around the match into a single match String by joining the words using a single space character as delimiter:

    List<String> matches=IntStream.range(0, words.size())
        // find matches
        .filter(ix->words.get(ix).matches(word))
        // create subLists around the matches
        .mapToObj(ix->words.subList(Math.max(0, ix-1), Math.min(ix+2, words.size())))
        // reconvert lists into phrases (join with a single space)
        .map(list->String.join(" ", list))
        // collect into a list of matches; here, you can use a different
        // terminal operation, like forEach(System.out::println), as well
        .collect(Collectors.toList());
Holger
  • This answer is fantastic: it is elegant and exactly the sort of answer I am looking for. Thank you so much. I like that it avoids any issues that might arise when selecting words by line. I will have a look at the link you suggested. One more thing: do you maybe have a link or advice as to how I could find the time complexity of something like this? – Warosaurus Mar 09 '15 at 14:26
  • I could be wrong, but I assume the time complexity for both is `O(n)` since they go through all `n` elements in the stream. However, together would they be `O(n) + O(n) = O(2n)`? – Warosaurus Mar 09 '15 at 14:39
  • @Warosaurus Technically, `O(2n) = O(n)` ;-) – Alexis C. Mar 09 '15 at 15:03