Java 8 Streams: How to read lines between two lines specified by line content

Question

The input for my problem is the following file:

Input.txt

#START_OF_TEST_CASES

#DATA
key1:VA1
key2:VA2
key3:VA3
key4:VA4
key5:VA5
#DEND

#ENTRIES:
1{key1}{key1}{key3}
2{key2}{key2}{key1}
3{key3}{key1}{key2}
#EEND

I want to read this file and build a HashMap from the lines between #DATA and #DEND. The key is the part to the left of the ':' and the value is the part to the right. This can be done with a plain loop, but I want to do it using the Java 8 Stream API.

Streams is not the answer to everything. This is a great example where Streams is not the right tool for the job. You can do it by writing your own collector or your own stateful filter, but you'd end up with more (and more complex) code than a normal loop, so why do it? It would also fail badly if the stream is parallel, *yikes*! — Andreas, Dec 20 '17 at 18:37
Java 9: `Files.lines(path) .dropWhile(s -> !s.equals("#DATA")) .skip(1) .takeWhile(s -> !s.equals("#DEND")) …`, Java 8: wait for Java 9. Regarding how to split and store into a `Map`, there are already several questions with answers. — Holger, Dec 20 '17 at 18:49
OK, here is a Java 8 workaround to get the stream: `Arrays.stream(new Scanner(path) .findWithinHorizon("(?<=\\R#DATA\\R)(.|\\R)*(?=\\R#DEND\\R)", 0).split("\\R")) …` — Holger, Dec 20 '17 at 18:57
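
For comparison, the plain loop that the question and the first comment allude to could look roughly like the sketch below. The class name `SectionLoop` and the method `readSection` are made up for illustration, and the sample lines are a shortened copy of Input.txt. In practice the lines would probably come from `Files.readAllLines(path)`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original thread.
class SectionLoop {
    // Collects "key:value" lines found between the #DATA and #DEND marker lines.
    static Map<String, String> readSection(Iterable<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        boolean inSection = false;
        for (String line : lines) {
            if (line.equals("#DATA")) {
                inSection = true;                 // start collecting after this line
            } else if (inSection && line.equals("#DEND")) {
                break;                            // end marker reached, stop reading
            } else if (inSection) {
                int colon = line.indexOf(':');
                if (colon > 0) {                  // ignore lines without a key before ':'
                    map.put(line.substring(0, colon), line.substring(colon + 1));
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "#START_OF_TEST_CASES", "", "#DATA", "key1:VA1", "key2:VA2", "#DEND",
            "", "#ENTRIES:", "1{key1}{key1}{key3}", "#EEND");
        System.out.println(readSection(lines)); // prints {key1=VA1, key2=VA2}
    }
}
```

The loop carries its state in a single boolean, which is exactly the kind of state a Java 8 stream pipeline has no built-in operation for.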

score 3 · Answer 1 · answered Dec 21 '17 at 10:26

Unfortunately, Java 8 streams do not support such extraction of elements in-between two matches. In Java 9, you could use

Map<String,String> map;
try(Stream<String> stream = Files.lines(path)) {
    map = stream
        .dropWhile(s -> !s.equals("#DATA")).skip(1)
        .takeWhile(s -> !s.equals("#DEND"))
        .filter(Pattern.compile("^[^#].*:").asPredicate())
        .map(item -> item.split(":", 2))
        .collect(Collectors.toMap(parts->parts[0], parts->parts[1]));
}
// use the map
map.forEach((k,v)->System.out.println(k+" -> "+v));

dropWhile will drop all elements before the first matching element, skip(1) will skip the matching element, takeWhile effectively removes all elements after the first element matching the end criteria.

The next filter step using the pattern ^[^#].*: will skip all lines starting with # or not containing a :. The remaining steps are straight-forward. When specifying a limit of 2 to split, it will not search for subsequent :s after encountering the first :.
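
To see these steps in isolation, here is a small sketch that runs the same pipeline on an in-memory list instead of a file (Java 9 or later). The class `DropTakeDemo`, the method `between`, and the sample lines, including the `url` entry, are invented for the demonstration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class DropTakeDemo {
    static Map<String, String> between(List<String> lines) {
        return lines.stream()
            .dropWhile(s -> !s.equals("#DATA"))   // discard everything before the start marker
            .skip(1)                              // discard the start marker itself
            .takeWhile(s -> !s.equals("#DEND"))   // stop at the end marker
            .filter(Pattern.compile("^[^#].*:").asPredicate())
            .map(s -> s.split(":", 2))            // split at the first ':' only
            .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "before:ignored", "#DATA", "key1:VA1", "#comment:skipped",
            "url:http://example.org", "noColon", "#DEND", "after:ignored");
        // TreeMap is used only to get a predictable print order.
        System.out.println(new TreeMap<>(between(lines)));
        // prints {key1=VA1, url=http://example.org}

        // Without the limit, the value would be cut at its own ':'.
        System.out.println(Arrays.toString("url:http://example.org".split(":")));
        // prints [url, http, //example.org]
    }
}
```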

Under Java 8, extracting the part between the two matches can be implemented with a Scanner before the stream operation:

String part;
try(Scanner s = new Scanner(path)) {
    part = s.findWithinHorizon("(?<=\\R#DATA\\R)(.|\\R)*(?=\\R#DEND\\R)", 0);
}
Map<String,String> map = Pattern.compile("\\R").splitAsStream(part)
    .filter(Pattern.compile("^[^#].*:").asPredicate())
    .map(item -> item.split(":", 2))
    .collect(Collectors.toMap(parts->parts[0], parts->parts[1]));
// use the map
map.forEach((k,v)->System.out.println(k+" -> "+v));
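
One detail worth guarding against: `findWithinHorizon` returns `null` when nothing matches, which would make `splitAsStream(part)` fail with a `NullPointerException`. The sketch below wraps the same regex in a hypothetical `parse` method that reads from a `String` instead of a file and returns an empty map in that case. Note also that the lookbehind expects a line break in front of `#DATA`, so the pattern presumably finds nothing if `#DATA` is the very first line of the input.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original answer.
class ScannerSectionDemo {
    private static final String SECTION = "(?<=\\R#DATA\\R)(.|\\R)*(?=\\R#DEND\\R)";

    static Map<String, String> parse(String text) {
        String part;
        try (Scanner s = new Scanner(text)) {
            part = s.findWithinHorizon(SECTION, 0); // horizon 0 = search the whole input
        }
        if (part == null) {
            return Collections.emptyMap();          // markers missing: nothing to parse
        }
        return Pattern.compile("\\R").splitAsStream(part)
            .filter(Pattern.compile("^[^#].*:").asPredicate())
            .map(item -> item.split(":", 2))
            .collect(Collectors.toMap(parts -> parts[0], parts -> parts[1]));
    }

    public static void main(String[] args) {
        String text = "#START_OF_TEST_CASES\n\n#DATA\nkey1:VA1\nkey2:VA2\n#DEND\n\n#ENTRIES:\n#EEND\n";
        System.out.println(parse(text).get("key2"));              // prints VA2
        System.out.println(parse("no markers here\n").isEmpty()); // prints true
    }
}
```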

I would go with a `findAll` addition in java-9 that could be applicable here I think; and that regex... a couple of lookarounds, a `\\R` introduced in java-8... beautiful overall! — Eugene, Dec 21 '17 at 10:51
@Eugene: when there can be multiple occurrences of this section, `findAll()` is the way to go. And for Java 8, [this answer](https://stackoverflow.com/a/42978216/2711488) provides the `findAll` operation. — Holger, Dec 21 '17 at 11:56

score 0 · Answer 2 · answered Dec 21 '17 at 03:06

The lines between #DATA and #DEND all contain ':', so I came up with the following solution:

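    File file = new File("Input.txt");
    // Files.lines keeps the file open, so close the stream with try-with-resources
    try (Stream<String> lines = Files.lines(file.toPath())) {
        Map<String,String> map = lines
                .filter(line -> line.contains(":"))   // keep only lines with a colon
                .map(line -> line.split(":"))
                .filter(arr -> arr.length > 1)        // drops "#ENTRIES:" (nothing after the colon)
                .collect(Collectors.toMap(parts->parts[0], parts->parts[1]));
        System.out.println(map.values());
    } catch (IOException e) {
        e.printStackTrace();
    }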

The code above first keeps only the lines containing a colon ':' and splits each of them at the colon. It then keeps only the arrays with more than one element, because "#ENTRIES:" in Input.txt also contains a colon but, unlike the data lines, has no characters after it. The remaining pairs are collected into the map.
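
The length check works because `String.split` with no limit drops trailing empty strings. The small sketch below shows this (the class `SplitDemo` and its `parts` helper are made up for illustration). Keep in mind that this approach looks at every line of the file, so a `key:value` line outside the #DATA block would end up in the map as well.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original answer.
class SplitDemo {
    // Number of pieces produced by splitting a line at ':' without a limit.
    static int parts(String line) {
        return line.split(":").length;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("key1:VA1".split(":")));  // prints [key1, VA1]
        // The trailing empty string is removed, leaving a single element.
        System.out.println(Arrays.toString("#ENTRIES:".split(":"))); // prints [#ENTRIES]
        // A negative limit keeps trailing empty strings.
        System.out.println("#ENTRIES:".split(":", -1).length);       // prints 2
    }
}
```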

answered Dec 21 '17 at 03:06

Sunil


If you knew the answer, why did you ask the question without your code? – janith1024 Dec 21 '17 at 04:01
Looks more like a stunt to answer one's own question. However, adding an additional condition, `.filter(ele -> ele.contains(":") && !ele.contains("#"))`, will get rid of the second filter in your answer – Jeevan Varughese Dec 21 '17 at 04:30
I did not know the answer but I came up with this answer based on the answer of [@fbokovikov](https://stackoverflow.com/users/4545552/fbokovikov). Thanks to his [answer](https://stackoverflow.com/a/47912850/2704051) – Sunil Dec 21 '17 at 05:39
@Sunil If you chose my answer as the correct one, mark it as correct and vote for him please – yanefedor Dec 21 '17 at 07:43
[@fbokovikov](https://stackoverflow.com/users/4545552/fbokovikov) I did not find it right but it helped me to get my solution. – Sunil Dec 22 '17 at 07:31

score -1 · Answer 3 · answered Dec 20 '17 at 19:08

The following code solves your problem:

    List<String> lines = Files.readAllLines(Paths.get("Input.txt"));
    int from = lines.indexOf("#DATA") + 1;
    int to  = lines.indexOf("#DEND");
    Map<String, String> map = lines.stream()
            .skip(from)
            .limit(to - from)
            .map(s -> s.split(":"))
            .collect(Collectors.toMap(pair -> pair[0], pair -> pair[1]));

answered Dec 20 '17 at 19:08

yanefedor


If you have a `List` and indices, the straight-forward approach is `lines.subList(from, to).stream()` – Holger Dec 21 '17 at 08:34
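
A sketch of that `subList` variant, run on an in-memory list, might look as follows. The class `SubListDemo` and the method `parse` are invented for the example. It assumes both markers are present and in order, and that every line between them contains a colon. Otherwise `subList` or the array access would throw.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original thread.
class SubListDemo {
    static Map<String, String> parse(List<String> lines) {
        int from = lines.indexOf("#DATA") + 1;  // first line after the start marker
        int to = lines.indexOf("#DEND");        // end marker, exclusive
        return lines.subList(from, to).stream()
            .map(s -> s.split(":", 2))
            .collect(Collectors.toMap(pair -> pair[0], pair -> pair[1]));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "#START_OF_TEST_CASES", "", "#DATA", "key1:VA1", "key2:VA2", "#DEND", "#ENTRIES:");
        Map<String, String> map = parse(lines);
        System.out.println(map.get("key1")); // prints VA1
        System.out.println(map.size());      // prints 2
    }
}
```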
