
I have the following file format:

Text1
+ continuation of Text1
+ more continuation of Text1 
Text2
+ continuation of Text2
+ more continuation of Text2
+ even more continuation of Text2

Continuations are marked by "\n+ " (newline, plus character, space, as a three-character string). A text can have any number of continuation lines, including 0.

I want the following output (each is a line printed with .forEach):

Text1 continuation of Text1 more continuation of Text1 
Text2 continuation of Text2 more continuation of Text2 even more continuation of Text2
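Put differently, the conversion amounts to replacing every "\n+ " with a single space and then splitting on the remaining line breaks. A minimal sketch of that idea, assuming the whole text fits in memory and uses "\n" line endings (the class and method names are made up for illustration):

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class JoinContinuations {
    // Assumes '\n' line endings: replace each "\n+ " separator by one space,
    // then split what is left on the remaining '\n' characters.
    static List<String> join(String text) {
        return Pattern.compile("\n")
                .splitAsStream(text.replace("\n+ ", " "))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "Text1\n+ continuation of Text1\n+ more continuation of Text1 \n"
                + "Text2\n+ continuation of Text2\n+ more continuation of Text2\n"
                + "+ even more continuation of Text2";
        join(input).forEach(System.out::println);
    }
}
```

Running main prints the two lines shown above. With "\r\n" line endings the literal "\n+ " would not match and the separator would have to be adjusted.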

I would like to use only Java streams to do the conversion, preferably with collect. Is there an elegant way to do this?

EDIT:

Another, more realistic example:

Lorem ipsum dolor sit amet, consectetur 
+ adipiscing elit, sed do eiusmod tempor incididunt 
+ ut labore et dolore magna aliqua. Ut enim ad minim veniam, 
+ quis nostrud exercitation ullamco laboris nisi ut aliquip ex 
+ ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit 
+ esse cillum dolore eu fugiat nulla pariatur. Excepteur sint 
+ occaecat cupidatat non proident, sunt in culpa qui officia 
+ deserunt mollit anim id est laborum.

The expected result is two lines:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. 
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Ole V.V.
TFuto
  • Is it correct to assume that the first line is always the first separator and that the next lines do not *contain* this separator? I mean, does `continuation of Text1` really contain `Text1`, or was that just to make the example clear? – Eugene Mar 22 '17 at 20:01
  • No, that was just for clarification. – TFuto Mar 22 '17 at 20:05
  • 1) So, the first line is always the separator? y/n 2) The separator is not contained in the next lines? y/n :) – Eugene Mar 22 '17 at 20:07
  • 1. No, the first line is not a separator. It is arbitrary text. 2. Concatenation is only indicated by "\n+ ", i.e. newline, plus char and space. I have added another example. – TFuto Mar 22 '17 at 20:09
  • It might be possible, but not well suited to streams, which can't reference earlier elements and preferably have stateless mappings. An old-school loop is the better choice here. – Bohemian Mar 22 '17 at 20:14
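For comparison, a sketch of the plain loop Bohemian suggests, assuming the lines are already available as a List<String> (the names LoopJoin and join are hypothetical). It keeps the space after "+" so the output matches the expected result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoopJoin {
    // A line starting with "+ " continues the previous logical line;
    // any other line starts a new one.
    static List<String> join(List<String> lines) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null;                   // logical line being assembled
        for (String line : lines) {
            if (line.startsWith("+ ") && current != null) {
                current.append(line, 1, line.length()); // drop '+', keep the space
            } else {
                if (current != null) {
                    result.add(current.toString());
                }
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            result.add(current.toString());             // flush the last logical line
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Text1", "+ continuation of Text1", "Text2", "+ continuation of Text2");
        join(lines).forEach(System.out::println);
    }
}
```

The state (the line being assembled) lives in an ordinary local variable, which is exactly what a stateless stream pipeline cannot hold.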

2 Answers


In Java 9, you could use

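import java.util.Scanner;
import java.util.regex.Pattern;

// One logical line: non-line-break chars (\V), possibly joined by "line break + '+'" (\R\+)
static final Pattern LINE_WITH_CONTINUATION = Pattern.compile("(\\V|\\R\\+)+");

try(Scanner s = new Scanner(file)) {
    s.findAll(LINE_WITH_CONTINUATION)   // Java 9+: Stream<MatchResult>
        // drop each "line break + '+'" marker; the space after '+' stays as separator
        .map(m -> m.group().replaceAll("\\R\\+", ""))
        .forEach(System.out::println);
}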
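To see what the pattern matches without a file, it can be applied to a String with Matcher.results(), which is also new in Java 9. This sketch reuses the pattern above; the class and helper names are illustrative assumptions:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexDemo {
    // Same pattern as above: runs of non-vertical-whitespace characters (\V)
    // and "line break followed by '+'" sequences (\R\+).
    static final Pattern LINE_WITH_CONTINUATION = Pattern.compile("(\\V|\\R\\+)+");

    static List<String> join(String text) {
        return LINE_WITH_CONTINUATION.matcher(text)
                .results()                                    // Java 9+
                .map(m -> m.group().replaceAll("\\R\\+", "")) // keep the space after '+'
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        join("Text1\n+ continuation of Text1\nText2\n+ continuation of Text2")
                .forEach(System.out::println);
    }
}
```

A line break not followed by "+" cannot be consumed by either alternative, so each match stops there and the next logical line becomes a separate match.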


Since Java 8 lacks the Scanner.findAll(Pattern) method, you can add a custom implementation of the operation as a workaround:

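import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public static Stream<MatchResult> findAll(Scanner s, Pattern pattern) {
    return StreamSupport.stream(new Spliterators.AbstractSpliterator<MatchResult>(
            1000, Spliterator.ORDERED|Spliterator.NONNULL) {
        @Override
        public boolean tryAdvance(Consumer<? super MatchResult> action) {
            // horizon 0 = search the whole remaining input, like Java 9's findAll
            if(s.findWithinHorizon(pattern, 0)!=null) {
                action.accept(s.match());
                return true;
            }
            else return false;
        }
    }, false);
}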

which can be used like

try(Scanner s = new Scanner(file)) {
    findAll(s, LINE_WITH_CONTINUATION)
        .map(m -> m.group().replaceAll("\\R\\+", ""))
        .forEach(System.out::println);
}

which will make the future migration easy.
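Putting the pieces together, here is a self-contained Java 8 sketch that runs the findAll workaround on a Scanner over a String instead of a file. The class name and the join helper are hypothetical, and it passes Long.MAX_VALUE (the conventional "unknown size" estimate) instead of 1000:

```java
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class Java8FindAllDemo {
    static final Pattern LINE_WITH_CONTINUATION = Pattern.compile("(\\V|\\R\\+)+");

    // Java 8 stand-in for Scanner.findAll(Pattern)
    static Stream<MatchResult> findAll(Scanner s, Pattern pattern) {
        return StreamSupport.stream(new Spliterators.AbstractSpliterator<MatchResult>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super MatchResult> action) {
                if (s.findWithinHorizon(pattern, 0) != null) {
                    action.accept(s.match());
                    return true;
                }
                return false;
            }
        }, false);
    }

    // Hypothetical helper: collects the logical lines of an in-memory text
    static List<String> join(String text) {
        try (Scanner s = new Scanner(text)) {
            return findAll(s, LINE_WITH_CONTINUATION)
                    .map(m -> m.group().replaceAll("\\R\\+", ""))
                    .collect(Collectors.toList()); // terminal op runs before close
        }
    }

    public static void main(String[] args) {
        join("Text1\n+ continuation of Text1\nText2").forEach(System.out::println);
    }
}
```

Because the stream is lazy, the terminal operation has to run inside the try block, before the Scanner is closed.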

Holger

Assuming that you run this sequentially only and really want to use streams:

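import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

List<String> result;
// try-with-resources closes the file handle held by Files.lines (which throws IOException)
try (Stream<String> lines = Files.lines(Paths.get("YourPath"))) {
    result = lines.collect(() -> new ArrayList<>(), (list, line) -> {
        int listSize = list.size();
        if (line.startsWith("+ ") && listSize > 0) {
            // continuation: append to the previous line, keeping the space after '+'
            list.set(listSize - 1, list.get(listSize - 1) + line.substring(1));
        } else {
            list.add(line);
        }
    }, (left, right) -> {
        throw new RuntimeException("Not for parallel processing");
    });
}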
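The same accumulator can be tried on an in-memory Stream<String>, which avoids needing a file while experimenting. This is a sketch with hypothetical names; like the answer it is meant for sequential streams only, and it treats a "+ " line that arrives before any other line as an ordinary line instead of failing on index -1:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class CollectDemo {
    // Sequential only: the combiner refuses to merge partial results.
    static List<String> join(Stream<String> lines) {
        return lines.collect(ArrayList::new, (list, line) -> {
            int last = list.size() - 1;
            if (line.startsWith("+ ") && last >= 0) {
                // continuation: append to the previous line, keeping the space after '+'
                list.set(last, list.get(last) + line.substring(1));
            } else {
                list.add(line);             // start a new logical line
            }
        }, (left, right) -> {
            throw new UnsupportedOperationException("Not for parallel processing");
        });
    }

    public static void main(String[] args) {
        join(Stream.of("Text1", "+ continuation of Text1", "Text2"))
                .forEach(System.out::println);
    }
}
```

A real combiner is hard to write here because a parallel chunk may begin with a "+ " line whose owner sits at the end of the previous chunk.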
Eugene