Spliterator skipping portions of text

Question

I am facing a problem with the streams' dropWhile and takeWhile methods: the spliterator skips every other portion of the text (the even-numbered ones in the sample below). What should be done to process all portions of the text? My methods are:

void read(Path filePath) {
    try {
        Stream<String> lines = Files.lines(filePath);
        while (true) {
            Spliterator<String> spliterator = lines.dropWhile(line -> !line.startsWith("FAYSAL:")).spliterator();
            Stream<String> portion = fetchNextPortion(spliterator);
            if(spliterator.estimateSize() == 0)
                break;
            portion.forEach(System.out::println);
            lines = StreamSupport.stream(spliterator, false);
        }
        lines.close();
    }
    catch (IOException e) {
        e.printStackTrace();
    }
}

private Stream<String> fetchNextPortion(Spliterator<String> spliterator) {
    return StreamSupport.stream(spliterator, false)
            .filter(this::isValidReportName)
            .peek(System.out::println)
            .findFirst()
            .map( first -> Stream.concat(Stream.of(first),
                    StreamSupport.stream(spliterator, false).takeWhile(line -> !line.startsWith("FAYSAL:")))).orElse(Stream.empty());
}

Sample input is:

FAYSAL: 1
Some text here
Some text here
FAYSAL: 2
Some text here
Some text here
FAYSAL: 3
Some text here
Some text here
FAYSAL: 4
Some text here
Some text here

It will skip FAYSAL: 2 and FAYSAL: 4

Both `dropWhile` and `takeWhile` read like they can result in the same strange behaviour, so why not use `filter`? — Tom, Aug 05 '19 at 20:26
@Tom `filter` will create a new problem. It will filter every line and will skip in-between lines between two tags. — malware, Aug 06 '19 at 03:45
There is no guarantee that you can reuse a `Spliterator` after processing a Stream based on it. Most notably, the line rejected by the `filter` has unavoidably been consumed already. But in principle, any number of subsequent elements could have been consumed already. You should describe what you actually want to do (see also [What is the XY problem?](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378)). But it looks like a variant of [this](https://stackoverflow.com/a/57361379/2711488). In short: if you want to process multi-line artifacts, use `Scanner`. — Holger, Aug 06 '19 at 07:34
@Holger The problem is that `dropWhile` and `takeWhile` are skipping `FAYSAL: 2 Some text here Some text here` and `FAYSAL: 4 Some text here Some text here` and so on. The suggested link is quite relevant, but I want to solve the current behaviour of these APIs. — malware, Aug 06 '19 at 12:50
I already explained it, you are constructing multiple streams off a single spliterator, which has no guaranteed behavior at all. So with this approach, it is unsolvable. I also explained the behavior of the current implementation. With `takeWhile(line -> !line.startsWith("FAYSAL:"))`, you are already consuming the next line starting with `"FAYSAL:"`. The stream will stop processing as requested, but it had to fetch the line from the spliterator, to find out that it doesn't match. So the next stream constructed from the same spliterator can't see that line. — Holger, Aug 06 '19 at 14:07
If I understand your problem correctly, you might want to have a look at [my answer to a similar question](https://stackoverflow.com/a/49551622/7653073). That one was looking for a way to cut a stream into chunks, too. — Malte Hartwig, Aug 07 '19 at 01:10
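
The effect described in the comments above can be reproduced in isolation. The following is a minimal sketch, assuming the behaviour of current OpenJDK implementations (the API does not guarantee any particular result when a spliterator is reused after a stream has run on it). The class name `SharedSpliteratorDemo`, its helper methods and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class SharedSpliteratorDemo {

    // Runs takeWhile on a stream over the given spliterator and returns the accepted lines.
    static List<String> takePortion(Spliterator<String> source) {
        return StreamSupport.stream(source, false)
                .takeWhile(line -> !line.startsWith("FAYSAL:"))
                .collect(Collectors.toList());
    }

    // Collects whatever the same spliterator still has to offer afterwards.
    static List<String> remaining(Spliterator<String> source) {
        return StreamSupport.stream(source, false).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Spliterator<String> source = Arrays.asList(
                "Some text here", "Some text here", "FAYSAL: 2", "Some text here").spliterator();

        System.out.println("taken:     " + takePortion(source));
        System.out.println("remaining: " + remaining(source));
    }
}
```

On OpenJDK, the second list should no longer contain `FAYSAL: 2`: `takeWhile` had to pull that line from the spliterator in order to test it, and a stream has no way of pushing it back.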

geisterbot007 · Answer 1 · 2019-08-06T06:41:08.280

What should be done to process all portions of text?

You could choose a different approach.

Your code produced a StackOverflowError on my machine after displaying your problem (there is also a call to fetchNextChunk but a method called fetchNextPartition, so I wasn't sure about that either). Instead of trying to debug it, I came up with a different way of splitting the input. Because my approach keeps the whole string in memory, it might not be suitable for larger files. I might work out a version with Streams later.

Base assumption: You want to split your input text into portions, each portion starting with a string that starts with "FAYSAL:".

The idea is similar to your approach, but it is not based on Spliterators and doesn't use dropWhile either. Instead, it finds the first string starting with "FAYSAL:" (I assumed that this is what isValidReportName does; the code for that method wasn't in the question) and takes every line up to, but not including, the next portion start. The found first element is inserted at the front of the collected lines, and the resulting list is added to a list of portions that can be used later. The number of lines collected is then removed from the front of the original list.

Full code:

import java.util.*;
import java.util.stream.Collectors;

class Main {

    public static void main(String[] args) {
        Main m = new Main();
        System.out.println(m.partitionTextByStringStart(m.getString()));
    }

    private List<List<String>> partitionTextByStringStart(String text) {
        List<List<String>> partitions = new ArrayList<>();
        List<String> lines = Arrays.asList(text.split("\n"));

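        while (!lines.isEmpty()) {
            // Header of the current portion; assumes the remaining input starts with a "FAYSAL:" line
            String first = lines.stream().filter(this::isValidReportName).findFirst().orElse("This is prolly bad");
            // Body: everything after the header, up to (not including) the next header.
            // toCollection(ArrayList::new) guarantees a mutable list for the add(0, ...) below.
            List<String> part = lines.stream().skip(1).takeWhile(l -> !l.startsWith("FAYSAL:")).collect(Collectors.toCollection(ArrayList::new));
            part.add(0, first);

            partitions.add(part);
            // Drop the lines just consumed and continue with the rest
            lines = lines.subList(part.size(), lines.size());
        }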

        return partitions;
    }

    private boolean isValidReportName(String x) {
        return x.startsWith("FAYSAL:");
    }

    private String getString() {
        return "FAYSAL: 1\n" +
                "Some text here1\n" +
                "Some text here1\n" +
                "FAYSAL: 2\n" +
                "Some text here2\n" +
                "Some text here2\n" +
                "FAYSAL: 3\n" +
                "Some text here3\n" +
                "Some text here3\n" +
                "FAYSAL: 4\n" +
                "Some text here4\n" +
                "Some text here4";
    }

}

(Note: I used a static string here instead of file reading to make a full code example; you can adapt your code accordingly)
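
For inputs that do not fit comfortably in memory, the same split can be done in a single pass that holds only one portion at a time. This is a sketch under the same base assumption, not part of the approach above: `PortionReader` and `forEachPortion` are names invented for this example, and lines that appear before the first "FAYSAL:" line are simply ignored.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class PortionReader {

    // Walks the lines once and hands each completed portion to the given action.
    static void forEachPortion(Iterator<String> lines, String marker, Consumer<List<String>> action) {
        List<String> current = null; // stays null until the first marker line is seen
        while (lines.hasNext()) {
            String line = lines.next();
            if (line.startsWith(marker)) {
                if (current != null) {
                    action.accept(current); // the previous portion is complete
                }
                current = new ArrayList<>();
            }
            if (current != null) {
                current.add(line);
            }
        }
        if (current != null) {
            action.accept(current); // the last portion has no marker after it
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of(
                "FAYSAL: 1", "Some text here1", "Some text here1",
                "FAYSAL: 2", "Some text here2", "Some text here2");
        forEachPortion(input.iterator(), "FAYSAL:", System.out::println);
    }
}
```

To apply it to a file, `Files.lines(path)` can be opened in a try-with-resources block and its `iterator()` passed in.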

EDIT: After some research I found that grouping elements in a stream is surprisingly easy with a library called StreamEx (Github) (Maven). In this answer I found a note about the StreamEx#groupRuns function, which does exactly that:

private Stream<Stream<String>> partitionStreamByStringStart(Stream<String> lineStream) {
    return StreamEx.of(lineStream).groupRuns((l1, l2) -> !l2.startsWith("FAYSAL:")).map(Collection::stream);
}

To see it working, you can add

System.out.println(m.partitionStreamByStringStart(m.getStream()).map(
    s -> s.collect(Collectors.toList())
).collect(Collectors.toList()));

to the main function and

private Stream<String> getStream() {
    return Stream.of(getString().split("\n"));
}

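somewhere in the Main class of the full code example above. These snippets additionally need imports for java.util.stream.Stream and one.util.streamex.StreamEx.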
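
Without an extra dependency, the `Scanner` route suggested in the comments on the question can also produce a lazy stream of portions. A minimal sketch, assuming Java 9 or later for `Scanner.tokens()` and assuming that a line break directly followed by "FAYSAL:" is the only portion boundary; the class name `ScannerPortions` and the method `portions` are hypothetical:

```java
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ScannerPortions {

    // A line break counts as a delimiter only when the next line starts with "FAYSAL:".
    private static final Pattern PORTION_BREAK = Pattern.compile("\\R(?=FAYSAL:)");

    // Each element of the returned stream is one whole portion, inner line breaks included.
    // Closing the stream also closes the scanner.
    static Stream<String> portions(Scanner scanner) {
        return scanner.useDelimiter(PORTION_BREAK).tokens();
    }

    public static void main(String[] args) {
        String text = "FAYSAL: 1\nSome text here1\nFAYSAL: 2\nSome text here2\nFAYSAL: 3\nSome text here3";
        try (Stream<String> stream = portions(new Scanner(text))) {
            List<String> result = stream.collect(Collectors.toList());
            result.forEach(p -> System.out.println("--- portion ---\n" + p));
        }
    }
}
```

For a file, `new Scanner(path)` (which throws `IOException`) can take the place of `new Scanner(text)`. Any text before the first "FAYSAL:" line comes out as its own leading element, so it may need to be filtered out.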
