I'm trying to use Java 8 lambda expressions and streams to parse some logs. I have one giant log file that contains run after run, and I want to split it into separate collections, one per run. I do not know in advance how many runs the log contains. To exercise my very weak lambda-expression muscles, I'd like to do it in one pass through the list.

Here is my current implementation:

    List<String> lines = readLines(fileDirectory);

    Pattern runStartPattern = Pattern.compile("INFO: \\d\\d:\\d\\d:\\d\\d: Starting");

    LinkedList<List<String>> testRuns = new LinkedList<>();

    List<String> currentTestRun = new LinkedList<>(); // In case log starts in middle of run
    testRuns.add(currentTestRun);

    for(String line:lines){
        if(runStartPattern.matcher(line).find()){
            currentTestRun = new ArrayList<>();
            testRuns.add(currentTestRun);
        }
        currentTestRun.add(line);
    }
    if(testRuns.getFirst().size()==0){ // In case log starts at a run
        testRuns.removeFirst();
    }

Basically something like TomekRekawek's solution here but with an unknown partition size to begin with.

Carlos Bribiescas
  • What is the problem you are having? You have described what you want to do, not the difficulty you are having trying to do it. – Zéychin Nov 13 '14 at 16:58
  • I don't see a way to do this same thing with a stream. I don't know how in a stream to mark the next element as needing to create a new `List` – Carlos Bribiescas Nov 13 '14 at 16:59
  • Maybe this will help: http://stackoverflow.com/questions/29095967/splitting-list-into-sublists-along-elements/29096777#29096777 – Alexis C. Mar 28 '15 at 21:21

1 Answer

The Stream API has no standard way to do this easily, but my StreamEx library has a groupRuns method that handles it:

    List<List<String>> testLines = StreamEx.of(lines)
            .groupRuns((a, b) -> !runStartPattern.matcher(b).find())
            .toList();

It groups the input elements based on a predicate applied to each pair of adjacent elements. Here, two adjacent lines stay in the same group unless the second line matches runStartPattern. This works correctly whether or not the log starts in the middle of a run. It also works well with parallel streams.
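If adding StreamEx is not an option, roughly the same grouping can be sketched with a custom collector from the plain JDK. This is only an illustration of the idea, not how StreamEx implements groupRuns; the names RunSplitter, splittingAt and splitRuns are hypothetical, and the sample log lines are made up for the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collector;

// Hypothetical helper class; a sketch, not part of StreamEx or the JDK.
class RunSplitter {

    // Collector that opens a new sublist whenever a line matches runStart.
    // The very first line always opens a sublist, so a log that begins
    // mid-run still gets a leading group and no empty group is ever created.
    static Collector<String, List<List<String>>, List<List<String>>> splittingAt(Pattern runStart) {
        return Collector.of(
                ArrayList::new,
                (runs, line) -> {
                    if (runs.isEmpty() || runStart.matcher(line).find()) {
                        runs.add(new ArrayList<>());
                    }
                    runs.get(runs.size() - 1).add(line);
                },
                (left, right) -> {
                    // Only used by parallel streams: glue two partial results.
                    if (left.isEmpty()) return right;
                    if (right.isEmpty()) return left;
                    List<String> firstRight = right.get(0);
                    if (runStart.matcher(firstRight.get(0)).find()) {
                        // Right part begins with a real run start: keep it separate.
                        left.addAll(right);
                    } else {
                        // Right part begins mid-run: it continues left's last run.
                        left.get(left.size() - 1).addAll(firstRight);
                        left.addAll(right.subList(1, right.size()));
                    }
                    return left;
                });
    }

    static List<List<String>> splitRuns(List<String> lines, Pattern runStart) {
        return lines.stream().collect(splittingAt(runStart));
    }

    public static void main(String[] args) {
        Pattern runStartPattern = Pattern.compile("INFO: \\d\\d:\\d\\d:\\d\\d: Starting");
        // Made-up sample input that starts in the middle of a run.
        List<String> lines = Arrays.asList(
                "tail of an earlier run",
                "INFO: 10:00:00: Starting",
                "step a",
                "INFO: 10:05:00: Starting",
                "step b");
        for (List<String> run : splitRuns(lines, runStartPattern)) {
            System.out.println(run);
        }
    }
}
```

Unlike the loop in the question, this needs no cleanup step at the end, because a sublist is only created when there is a line to put in it.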

Tagir Valeev
  • Btw, did you add it to your API in response to my question? – Carlos Bribiescas Jul 17 '15 at 13:25
  • @CarlosBribiescas, nope, I've just found your question. The groupRuns feature is useful in [other](http://stackoverflow.com/a/31026610/4856258) places [as well](http://stackoverflow.com/a/31406013/4856258). It's part of the partial reduction family along with `collapse`, `runLengths` and `intervalMap` methods which combine several adjacent elements into one and internally use the same mechanism. – Tagir Valeev Jul 17 '15 at 15:37