7

Java 8 can create a Stream from the lines of a file, in which case forEach steps through the file line by line. I have a text file with the following format:

bunch of lines with text
$$$$
bunch of lines with text
$$$$

I need to get each set of lines that goes before $$$$ into a single element in the Stream.

In other words, I need a Stream of Strings. Each string contains the content that goes before $$$$.

What is the best way (with minimum overhead) to do this?

lochi
  • Take a look at this question: http://stackoverflow.com/questions/32290278/picking-elements-of-a-list-until-condition-is-met-with-java-8-lambdas or also this one: http://stackoverflow.com/questions/20746429/limit-a-stream-by-a-predicate – Michael Lang Oct 10 '16 at 07:00
  • It does not answer my question.. – lochi Oct 10 '16 at 07:21
  • Does it have to use Streams? – F. Lumnitz Oct 10 '16 at 07:56
  • Yes. There is a way to do this by creating a spliterator from an iterator. I want to avoid that. – lochi Oct 10 '16 at 07:59
  • you need to create a custom `predicate` –  Oct 10 '16 at 08:53
  • You could probably customize [Maurice Naftalin's `LineSpliterator`](https://github.com/mauricen/masteringlambdas/blob/252e193a39eb2d1338158f824dc14d1daace70f9/src/main/java/org/masteringlambdas/ch5/LineSpliterator.java) to split on `\n$$$$\n`. Note that this [inspired the Java 9 implementation of `Files.lines()`](https://bugs.openjdk.java.net/browse/JDK-8072773). I thought I had seen it on SO but could not find it. – Didier L Oct 10 '16 at 13:24
  • Looks similar to [this one](http://stackoverflow.com/a/26465398/2711488) – Holger Oct 10 '16 at 16:26

5 Answers

2

I couldn't come up with a solution that processes the lines lazily. I'm not sure if this is possible.

My solution produces an ArrayList. If you have to use a Stream, simply call stream() on it.

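import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

public class DelimitedFile {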
    public static void main(String[] args) throws IOException {
        List<String> lines = lines(Paths.get("delimited.txt"), "$$$$");
        for (int i = 0; i < lines.size(); i++) {
            System.out.printf("%d:%n%s%n", i, lines.get(i));
        }
    }

    public static List<String> lines(Path path, String delimiter) throws IOException {
        return Files.lines(path)
                .collect(ArrayList::new, new BiConsumer<ArrayList<String>, String>() {
                    boolean add = true;

                    @Override
                    public void accept(ArrayList<String> lines, String line) {
                        if (delimiter.equals(line)) {
                            add = true;
                        } else {
                            if (add) {
                                lines.add(line);
                                add = false;
                            } else {
                                int i = lines.size() - 1;
                                lines.set(i, lines.get(i) + '\n' + line);
                            }
                        }
                    }
                }, ArrayList::addAll);
    }
}

File content:

bunch of lines with text
bunch of lines with text2
bunch of lines with text3
$$$$
2bunch of lines with text
2bunch of lines with text2
$$$$
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4
$$$$

Output:

0:
bunch of lines with text
bunch of lines with text2
bunch of lines with text3
1:
2bunch of lines with text
2bunch of lines with text2
2:
3bunch of lines with text
3bunch of lines with text2
3bunch of lines with text3
3bunch of lines with text4

Edit:

I've finally come up with a solution which lazily generates the Stream:

public static Stream<String> lines(Path path, String delimiter) throws IOException {
    Stream<String> lines = Files.lines(path);
    Iterator<String> iterator = lines.iterator();
    return StreamSupport.stream(Spliterators.spliteratorUnknownSize(new Iterator<String>() {
        String nextLine;

        @Override
        public boolean hasNext() {
            if (nextLine != null) {
                return true;
            }
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (!delimiter.equals(line)) {
                    nextLine = line;
                    return true;
                }
            }
            lines.close();
            return false;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            StringBuilder sb = new StringBuilder(nextLine);
            nextLine = null;
            while (iterator.hasNext()) {
                String line = iterator.next();
                if (delimiter.equals(line)) {
                    break;
                }
                sb.append('\n').append(line);
            }
            return sb.toString();
        }
    }, Spliterator.ORDERED | Spliterator.NONNULL | Spliterator.IMMUTABLE), false);
}

Coincidentally, this is very similar to the implementation of BufferedReader.lines() (which Files.lines(Path) uses internally). Skipping both of these methods and calling Files.newBufferedReader(Path) and BufferedReader.readLine() directly may have less overhead.
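As a rough illustration of that last suggestion, here is a minimal sketch that reads with BufferedReader.readLine() directly instead of wrapping the iterator of Files.lines(). The class name DelimitedRecords and the method name records are made up for this example, and it assumes that empty records (two delimiter lines in a row) should be skipped, as the iterator above does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper, not part of the answer above.
final class DelimitedRecords {

    /** Opens the file (UTF-8) and streams its records lazily. */
    static Stream<String> records(Path path, String delimiter) throws IOException {
        return records(Files.newBufferedReader(path), delimiter);
    }

    /** Streams records from any reader; closing the stream closes the reader. */
    static Stream<String> records(BufferedReader reader, String delimiter) {
        Iterator<String> iterator = new Iterator<String>() {
            private String pending;   // record read ahead by hasNext(), or null
            private boolean exhausted;

            @Override
            public boolean hasNext() {
                if (pending == null && !exhausted) {
                    pending = readRecord();
                }
                return pending != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String record = pending;
                pending = null;
                return record;
            }

            // Reads lines up to the next delimiter line or the end of input.
            private String readRecord() {
                try {
                    StringBuilder sb = null;
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (delimiter.equals(line)) {
                            if (sb != null) {
                                return sb.toString();
                            }
                            continue; // assumption: skip empty records
                        }
                        if (sb == null) {
                            sb = new StringBuilder(line);
                        } else {
                            sb.append('\n').append(line);
                        }
                    }
                    exhausted = true;
                    return sb == null ? null : sb.toString();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED | Spliterator.NONNULL),
                false)
            .onClose(() -> {
                try {
                    reader.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
    }
}
```

Because the reader is only closed through onClose, the returned stream is best used in a try-with-resources block, e.g. `try (Stream<String> s = DelimitedRecords.records(path, "$$$$")) { s.forEach(System.out::println); }`.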

xehpuk
  • This works. This is similar to what I mentioned in my fourth comment under the question. Can you please delete the ArrayList based answer and include the best performant version of your second code so that I can accept your answer. – lochi Oct 10 '16 at 14:25
1

You can use a Scanner as an iterator and create the stream from it:

private static Stream<String> recordStreamOf(Readable source) {
    Scanner scanner = new Scanner(source);
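    // useDelimiter(String) takes a regex, and a bare '$' is an anchor there, so quote the literal
    scanner.useDelimiter(Pattern.quote("$$$$"));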
    return StreamSupport
        .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
        .onClose(scanner::close);
}

This will preserve the newlines in the chunks for further filtering or splitting.
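One thing to watch for with this approach: Scanner.useDelimiter(String) interprets its argument as a regular expression, so a literal such as $$$$ has to be quoted (for example with Pattern.quote). The following is a small sketch of how the method could be exercised and how the preserved line breaks might be cleaned up afterwards. The class name ScannerRecords and the helper trimmedRecords are invented for this example, and trimming assumes that leading and trailing whitespace in a record is not significant.

```java
import java.io.StringReader;
import java.util.List;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical demo class, not part of the answer above.
final class ScannerRecords {

    /** Streams the text between occurrences of a literal delimiter. */
    static Stream<String> recordStreamOf(Readable source, String literalDelimiter) {
        Scanner scanner = new Scanner(source);
        // Quote the delimiter so that regex metacharacters such as '$' are taken literally.
        scanner.useDelimiter(Pattern.quote(literalDelimiter));
        return StreamSupport
            .stream(Spliterators.spliteratorUnknownSize(scanner, Spliterator.ORDERED | Spliterator.NONNULL), false)
            .onClose(scanner::close);
    }

    /** Collects the records of an in-memory text, dropping surrounding whitespace and blank records. */
    static List<String> trimmedRecords(String text) {
        try (Stream<String> records = recordStreamOf(new StringReader(text), "$$$$")) {
            return records
                .map(String::trim)            // assumption: surrounding whitespace is not significant
                .filter(s -> !s.isEmpty())    // drops the remainder after the last delimiter
                .collect(Collectors.toList());
        }
    }
}
```

Note that a literal delimiter also matches $$$$ in the middle of a line, which the line-based answers do not; whether that matters depends on the data.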

ArtGod
0

You could try

    List<String> list = new ArrayList<>();
    try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
        list = stream
                .filter(line -> !line.equals("$$$$"))
                .collect(Collectors.toList());
    } catch (IOException e) {
        e.printStackTrace();
    }
Isukthar
  • That does not combine the lines between the "$$$$" lines into a single element. Rather, it removes these delimiters, leaving you clueless afterwards. – f1sh Oct 10 '16 at 07:48
  • I realised it afterwards but I'm not able to remove my answer. You can concatenate the lines and split with $$$$. – Isukthar Oct 10 '16 at 07:49
  • @Isukthar you should be able to remove it by using the link at the bottom left of your answer – Didier L Oct 10 '16 at 12:55
  • I got the message "An error has occurred - please retry your request." when i click on delete. – Isukthar Oct 10 '16 at 12:59
  • It might be a temporary issue, otherwise don't hesitate to ask for help on [meta]. – Didier L Oct 10 '16 at 13:32
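The "concatenate the lines and split with $$$$" idea from the comments could look roughly like the sketch below. It is not lazy: the whole file is held in memory, so it only suits files that fit comfortably. The class name SplitWholeFile is made up for this example, and the pattern assumes that the delimiter sits on a line of its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper illustrating the comment above.
final class SplitWholeFile {

    // A line consisting only of $$$$, together with the line breaks around it.
    private static final Pattern DELIMITER_LINE =
        Pattern.compile("(?m)\\R?^\\Q$$$$\\E$\\R?");

    /** Splits already loaded text into records. */
    static Stream<String> records(String content) {
        return DELIMITER_LINE.splitAsStream(content);
    }

    /** Reads the whole file as UTF-8 and splits it into records. */
    static Stream<String> records(Path path) throws IOException {
        return records(new String(Files.readAllBytes(path), StandardCharsets.UTF_8));
    }
}
```

Compared with the line-by-line approaches, this trades memory for brevity, so it probably does not meet the "minimum overhead" goal for large files.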
0

A similar, shorter answer already exists, but the following is type-safe and needs no extra state:

    Path path = Paths.get("... .txt");
    try {
        List<StringBuilder> glist = Files.lines(path, StandardCharsets.UTF_8)
                .collect(() -> new ArrayList<StringBuilder>(),
                        (list, line) -> {
                            if (list.isEmpty() || list.get(list.size() - 1).toString().endsWith("$$$$\n")) {
                                list.add(new StringBuilder());
                            }
                            list.get(list.size() - 1).append(line).append('\n');
                        },
                        (list1, list2) -> {
                            if (!list1.isEmpty() && !list1.get(list1.size() - 1).toString().endsWith("$$$$\n")
                                    && !list2.isEmpty()) {
                                // Merge last of list1 and first of list2:
                                list1.get(list1.size() - 1).append(list2.remove(0).toString());
                            }
                            list1.addAll(list2);
                        });
        glist.forEach(sb -> System.out.printf("------------------%n%s%n", sb));
    } catch (IOException ex) {
        Logger.getLogger(App.class.getName()).log(Level.SEVERE, null, ex);
    }

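.endsWith("$$$$\n") also accepts a line that merely ends in $$$$. To require a line that is exactly $$$$, it would be better to use matches. Because String.matches has to match the whole string, the pattern must also cover the text before the delimiter line:

.matches("(?s)(.*\n)?\\$\\$\\$\\$\n")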
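A caveat when swapping endsWith for a regex: String.matches and Matcher.matches succeed only if the entire input matches, so a pattern meant to test "ends with a delimiter line" has to account for everything before that line. Here is a small sketch of such a check; the class name DelimiterCheck and its method are invented for illustration, and the pattern is one possible choice rather than the only one.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the end-of-chunk test discussed above.
final class DelimiterCheck {

    // Whole-input pattern: an optional prefix ending in '\n', then a line that is exactly $$$$.
    // (?s) lets '.' match line breaks so the prefix can span several lines.
    private static final Pattern ENDS_WITH_DELIMITER_LINE =
        Pattern.compile("(?s)(.*\\n)?\\$\\$\\$\\$\\n");

    /** Returns true if the last line of the chunk is exactly the delimiter. */
    static boolean endsWithDelimiterLine(CharSequence chunk) {
        return ENDS_WITH_DELIMITER_LINE.matcher(chunk).matches();
    }
}
```

Since the method accepts a CharSequence, a StringBuilder can be passed directly, which avoids the toString() call on every accumulated line.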
Joop Eggen
0

Here is a solution based on this previous work:

public class ChunkSpliterator extends Spliterators.AbstractSpliterator<List<String>> {
    private final Spliterator<String> source;
    private final Predicate<String> delimiter;
    private final Consumer<String> getChunk;
    private List<String> current;

    ChunkSpliterator(Spliterator<String> lineSpliterator, Predicate<String> mark) {
        super(lineSpliterator.estimateSize(), ORDERED|NONNULL);
        source=lineSpliterator;
        delimiter=mark;
        getChunk=s -> {
            if(current==null) current=new ArrayList<>();
            current.add(s);
        };
    }
    public boolean tryAdvance(Consumer<? super List<String>> action) {
        while(current==null || !delimiter.test(current.get(current.size()-1)))
            if(!source.tryAdvance(getChunk)) return lastChunk(action);
        current.remove(current.size()-1);
        action.accept(current);
        current=null;
        return true;
    }
    private boolean lastChunk(Consumer<? super List<String>> action) {
        if(current==null) return false;
        action.accept(current);
        current=null;
        return true;
    }

    public static Stream<List<String>> toChunks(
        Stream<String> lines, Predicate<String> splitAt, boolean parallel) {
        return StreamSupport.stream(
            new ChunkSpliterator(lines.spliterator(), splitAt),
            parallel);
    }
}

which you can use like this:

try(Stream<String> lines=Files.lines(pathToYourFile)) {
    ChunkSpliterator.toChunks(
        lines,
        Pattern.compile("^\\Q$$$$\\E$").asPredicate(),
        false)
    /* chain your stream operations, e.g.
    .forEach(s -> { s.forEach(System.out::print); System.out.println(); })
     */;
}
Community
Holger