I had a document that looked like the following:
data.txt
100, "some text"
101, "more text"
102, "even more text"
I processed it with a regex and produced a list of processed objects as follows:
Stream<String> lines = Files.lines(Paths.get("data.txt"));
Pattern regex = Pattern.compile("(\\d{1,3}),\\s*(.*)");
List<MyClass> result =
    lines.map(regex::matcher)
         .filter(Matcher::find)
         .map(m -> new MyClass(Integer.parseInt(m.group(1)), m.group(2))) // MyClass(int id, String text)
         .collect(Collectors.toList());
This returns a list of processed MyClass objects. It can even run in parallel, and everything works fine.
The problem is that I now have this:
data2.txt
101, "some text
the text continues in the next line
and maybe in the next"
102, "for a random
number
of lines"
103, "until the new pattern of new id comma appears"
So I somehow need to join the lines being read from the stream until a new match appears (something like a buffer?).
I tried to collect the strings first and then collect them into MyClass instances, but without success, because I cannot actually split a stream. Reduce also comes to mind for concatenating lines, but that would just concatenate everything into one string; I cannot reduce and produce a new stream of joined lines.
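The only workaround I have so far is a plain pre-pass that merges continuation lines into complete records before streaming them. This is just my own sketch (the name joinRecords and the class are mine, not from any library), and it is exactly the kind of imperative buffering I was hoping streams could replace:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class JoinLines {
    // A line starts a new record if it begins with a 1-3 digit id followed by a comma
    private static final Pattern RECORD_START = Pattern.compile("^\\d{1,3},.*");

    // Merges continuation lines into the preceding record, joining them with a space
    static List<String> joinRecords(List<String> rawLines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : rawLines) {
            if (RECORD_START.matcher(line).matches()) {
                if (current != null) {
                    records.add(current.toString()); // previous record is complete
                }
                current = new StringBuilder(line);   // start a new record
            } else if (current != null) {
                current.append(' ').append(line);    // continuation of the current record
            }
        }
        if (current != null) {
            records.add(current.toString());         // flush the last record
        }
        return records;
    }
}
```

The resulting list could then feed the existing map/filter/collect pipeline, but I would rather express the whole thing as a single stream operation.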
Any ideas on how to solve this with Java 8 streams?