I have a piece of code that splits a file into chunks, starting a new chunk whenever it finds a start record.

List<StringBuilder> list = new ArrayList<>();
StringBuilder jc = null;
try (BufferedReader br = Files.newBufferedReader(Paths.get(""))) {
    for (String line = br.readLine(); line != null; line = br.readLine()) {
        if (line.startsWith("REQ00")) {
            jc = new StringBuilder();
            list.add(jc);
        }
        jc.append(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}

Is there any way to convert this code to the Java 8 Stream style?

Stefano R.

2 Answers

Use the right tool for the job. With Scanner, it’s as simple as

List<String> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
    s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
    while(s.hasNext()) list.add(s.next());
} catch (IOException e) {
    e.printStackTrace();
}
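
To see what the zero-width lookahead delimiter does, here is a minimal sketch that applies the same Scanner setup to an in-memory string instead of a file. The class name ScannerChunkDemo, the method split, and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerChunkDemo {
    // Zero-width delimiter: matches at the start of every line that begins
    // with REQ00, without consuming any characters of that line.
    static final Pattern START = Pattern.compile("^(?=REQ00)", Pattern.MULTILINE);

    // Splits the text into chunks, each beginning at a start record.
    // Line breaks inside a chunk are retained, as in the snippet above.
    static List<String> split(String text) {
        List<String> chunks = new ArrayList<>();
        try (Scanner s = new Scanner(text)) {
            s.useDelimiter(START);
            while (s.hasNext()) chunks.add(s.next());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two start records.
        String sample = "REQ001\na\nb\nREQ002\nc\n";
        for (String chunk : split(sample)) {
            System.out.println("[" + chunk + "]");
        }
    }
}
```

Because the delimiter consumes nothing, the REQ00 prefix stays at the front of each chunk. Any text that precedes the first start record would come out as a chunk of its own.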

Now your code has the special requirements of creating StringBuilders and not retaining the line breaks. So the extended version is:

List<StringBuilder> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
    s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
    while(s.hasNext()) list.add(new StringBuilder(s.next().replaceAll("\\R", "")));
} catch (IOException e) {
    e.printStackTrace();
}

A more efficient variant is

List<StringBuilder> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
    s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
    while(s.hasNext()) list.add(toStringBuilderWithoutLinebreaks(s.next()));
} catch (IOException e) {
    e.printStackTrace();
}

…

static final Pattern LINE_BREAK = Pattern.compile("\\R");
static StringBuilder toStringBuilderWithoutLinebreaks(String s) {
    Matcher m = LINE_BREAK.matcher(s);
    if(!m.find()) return new StringBuilder(s);
    StringBuilder sb = new StringBuilder(s.length());
    int last = 0;
    do { sb.append(s, last, m.start()); last = m.end(); } while(m.find());
    return sb.append(s, last, s.length());
}
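
As a quick sanity check, the following sketch compares the helper with the simpler replaceAll("\\R", "") variant on a few inputs. The helper is copied from above only to keep the example self-contained; the class name LinebreakHelperDemo and the sample strings are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinebreakHelperDemo {
    static final Pattern LINE_BREAK = Pattern.compile("\\R");

    // Same logic as the helper above: copy the segments between line breaks
    // into a single StringBuilder, skipping the line breaks themselves.
    static StringBuilder toStringBuilderWithoutLinebreaks(String s) {
        Matcher m = LINE_BREAK.matcher(s);
        if (!m.find()) return new StringBuilder(s);
        StringBuilder sb = new StringBuilder(s.length());
        int last = 0;
        do { sb.append(s, last, m.start()); last = m.end(); } while (m.find());
        return sb.append(s, last, s.length());
    }

    public static void main(String[] args) {
        // Hypothetical inputs: mixed line terminators and one without any.
        String[] samples = { "REQ001\r\na\nb", "REQ002", "\nREQ003\n" };
        for (String s : samples) {
            String viaHelper = toStringBuilderWithoutLinebreaks(s).toString();
            String viaRegex = s.replaceAll("\\R", "");
            System.out.println(viaHelper.equals(viaRegex));
        }
    }
}
```

The helper avoids the intermediate String that replaceAll creates before the result is wrapped in a StringBuilder, and it reuses one precompiled Pattern.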

Starting with Java 9, you can also use a Stream operation for it:

List<StringBuilder> list;
try(Scanner s = new Scanner(Paths.get(path))) {
    list = s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE))
            .tokens()
            .map(string -> toStringBuilderWithoutLinebreaks(string))
            .collect(Collectors.toList());
} catch (IOException e) {
    e.printStackTrace();
    list = List.of();
}
Holger
  • Because it’s capable of processing text across line boundaries. When processing a stream of lines for your task, you need to work across multiple stream elements. In contrast, the `Scanner` produces multi-line elements in the first place, each ranging from one occurrence of your delimiter to the next. If you didn’t have the requirement of eliminating the line breaks (at least, your original code eliminates them), the strings produced by the scanner would already be the final result, which is much more efficient than splitting the text into lines and joining them again afterwards. – Holger Mar 23 '18 at 09:42
Map<Integer, String> chunks = Files.lines(Paths.get("")).collect(
    Collectors.groupingBy(
        new Function<String, Integer>(){
            Integer lastKey = 0;
            public Integer apply(String s){
                if(s.startsWith("REQ00")){
                    lastKey = lastKey+1;
                }
                return lastKey;
            }
        }, Collectors.joining()));

I just used Collectors.joining(), which creates a String instead of a StringBuilder. It could be replaced with a collector that accumulates into a StringBuilder, or the strings could be converted to StringBuilders afterwards.
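
A collector that accumulates into a StringBuilder could look roughly like the sketch below. It is built with Collector.of; the class name BuilderCollectorDemo and the method name toStringBuilder are hypothetical and not part of the JDK.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class BuilderCollectorDemo {
    // Hypothetical helper: a collector that appends every element to one
    // StringBuilder. It could take the place of Collectors.joining() as
    // the downstream collector of groupingBy.
    static Collector<CharSequence, StringBuilder, StringBuilder> toStringBuilder() {
        return Collector.of(
                StringBuilder::new,              // supplier: fresh container
                (sb, s) -> sb.append(s),         // accumulator: add one element
                (left, right) -> left.append(right)); // combiner: merge partial results
    }

    public static void main(String[] args) {
        // Made-up sample lines, collected directly without grouping.
        StringBuilder sb = Stream.of("REQ001", "a", "b").collect(toStringBuilder());
        System.out.println(sb);
    }
}
```

This only changes the result type. It does not address how the classifier function assigns lines to chunks.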

matt
  • This assumes that the function is evaluated in the right order, which is not guaranteed… – Holger Mar 23 '18 at 09:20
  • Would the evaluation of the function ever be ordered? If the stream is not "unordered" or concurrent, and the collector is not labelled as concurrent, it seems like the function will be called in order. I am trying to follow your answer from [here](https://stackoverflow.com/questions/29216588/how-to-ensure-order-of-processing-in-java8-streams) and the subsequent docs you linked, but it is not clear to me. – matt Mar 23 '18 at 09:37
  • The order of the function evaluation is the *processing order*. It is never guaranteed. For an ordered stream, the *encounter order* is maintained, which means that the final result will reflect it. This only works if the functions produce the right result regardless of the order in which they are evaluated. Your code likely produces the intended result in a sequential evaluation (though it’s not guaranteed), but will break for sure in a parallel evaluation (well, almost for sure, as even that is not guaranteed). – Holger Mar 23 '18 at 09:45