I want to remove the lines of a CSV file that contain the wrong date, while keeping the header line. I want to do all of this using Java 8 streams.

At first I came up with this:

try (Stream<String> linesUnfiltered = Files.lines(f.toPath(), StandardCharsets.UTF_8)) {
    Stream<String> firstLine = linesUnfiltered.limit(1);
    Stream<String> linesFiltered = linesUnfiltered
            .filter(e -> e.contains(sdfFileContent.format(fileDate)));
    Stream<String> result = Stream.concat(firstLine, linesFiltered);
    Files.write(f.toPath(), (Iterable<String>) result::iterator);
}

But this throws `java.lang.IllegalStateException: stream has already been operated upon or closed`, because `linesUnfiltered` is reused. The suggestion on the web is to use a `Supplier<Stream<String>>`, but my understanding is that the supplier would read the file again for each `supplier.get()` call, which is not very efficient.
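A minimal sketch that reproduces the exception without any file, assuming an in-memory stream behaves the same way as the one from `Files.lines` (the class and method names are made up for illustration). The second intermediate operation on the same stream object fails as soon as it is called, before any terminal operation runs:

```java
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if attaching a second operation to the same stream object fails.
    static boolean reuseFails() {
        Stream<String> lines = Stream.of("date,value", "2020-02-07,1");
        Stream<String> first = lines.limit(1); // first operation links the stream
        try {
            // Second operation on the already linked stream object
            lines.filter(s -> s.contains("2020-02-07"));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("reuse fails: " + reuseFails());
    }
}
```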

That's why I am asking whether there is a more efficient way than this. I am fairly certain that it should be possible to perform both operations on the same stream...

EDIT:

It is NOT a duplicate, as the first item should not be removed. It should only be excluded from the filtering step but still be present in the result stream.

Naman
XtremeBaumer
  • Use the `skip(1)` operator on the stream. – Ravindra Ranwala Feb 07 '20 at 10:42
  • But `skip(1)` removes the first line from the result as well, which I don't want. – XtremeBaumer Feb 07 '20 at 10:43
  • You basically want to do a stateful operation, which is something that clashes with wanting to use streams. You want special treatment for the first element, which means you'll have to resort to some kind of hack (they're ugly, like keeping a mutable boolean to check whether you're dealing with the first line or not). – Kayaman Feb 07 '20 at 10:47
  • What about splitting the stream into 2 new streams where one only contains the first line and the second contains everything else? – XtremeBaumer Feb 07 '20 at 10:55
  • @XtremeBaumer you can't really [split streams](https://stackoverflow.com/questions/19940319/can-you-split-a-stream-into-two-streams) – Kayaman Feb 07 '20 at 10:57

2 Answers

You can use a reader and call its `readLine` method to consume the header, then filter the result of `lines()`, which continues after the line already consumed from the same reader:

try (BufferedReader reader = Files.newBufferedReader(f.toPath(), 
                                  StandardCharsets.UTF_8)) {

    Stream<String> firstLine = Stream.of(reader.readLine());
    Stream<String> linesFiltered = reader.lines()
            .filter(e -> e.contains(sdfFileContent.format(fileDate)));
    Stream<String> result = Stream.concat(firstLine, linesFiltered);

    ...
}
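A self-contained sketch of this approach, under a few assumptions that are not in the answer: the class and method names are hypothetical, `needle` stands in for `sdfFileContent.format(fileDate)`, the demo data is made up, and the result is collected into a list so the file is only rewritten after the reader has been closed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HeaderFilter {

    // Reads the header with readLine(), then filters only the remaining lines.
    static List<String> headerPlusMatching(BufferedReader reader, String needle)
            throws IOException {
        List<String> result = new ArrayList<>();
        String header = reader.readLine();
        if (header == null) {
            return result; // empty input: nothing to keep
        }
        result.add(header);
        // lines() continues after the line already consumed by readLine()
        reader.lines()
              .filter(line -> line.contains(needle))
              .forEach(result::add);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Made-up demo data in a temporary file
        Path file = Files.createTempFile("demo", ".csv");
        Files.write(file, Arrays.asList(
                "date,value", "2020-02-07,1", "2020-02-06,2", "2020-02-07,3"),
                StandardCharsets.UTF_8);

        List<String> kept;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            kept = headerPlusMatching(reader, "2020-02-07");
        }
        // Assumption: write only after reading has finished and the reader is closed
        Files.write(file, kept, StandardCharsets.UTF_8);

        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);
    }
}
```

Collecting first costs memory proportional to the number of kept lines. The benefit is that the file is never truncated for writing while it is still being read lazily.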
ernest_k

You can convert the Stream to an Iterator, take the first element, then convert back.

try (Stream<String> linesUnfiltered = Files.lines(f.toPath(), StandardCharsets.UTF_8)) {
    Iterator<String> it = linesUnfiltered.iterator();
    String firstLine = it.next();
    Stream<String> otherLines = StreamSupport.stream(Spliterators.spliteratorUnknownSize(it, 0), false);
    Stream<String> linesFiltered = otherLines
            .filter(e -> e.contains(sdfFileContent.format(fileDate)));
    Stream<String> result = Stream.concat(Stream.of(firstLine), linesFiltered);
    Files.write(f.toPath(), (Iterable<String>) result::iterator);
}
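The same idea can be sketched as a small generic helper. The names `FirstElementExempt` and `filterAfterFirst` are hypothetical. Passing `Spliterator.ORDERED` instead of `0` and handling an empty source are assumptions added here, not part of the answer.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class FirstElementExempt {

    // Passes the first element through unfiltered and applies the predicate to the rest.
    static <T> Stream<T> filterAfterFirst(Stream<T> source, Predicate<? super T> predicate) {
        Iterator<T> it = source.iterator();
        if (!it.hasNext()) {
            return Stream.empty(); // no header to keep
        }
        T first = it.next();
        // Wrap the remaining elements of the iterator in a new sequential stream
        Stream<T> rest = StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
        return Stream.concat(Stream.of(first), rest.filter(predicate));
    }

    public static void main(String[] args) {
        // Made-up in-memory data standing in for the lines of the CSV file
        List<String> kept = filterAfterFirst(
                Stream.of("date,value", "2020-02-07,1", "2020-02-06,2"),
                line -> line.contains("2020-02-07"))
                .collect(Collectors.toList());
        kept.forEach(System.out::println);
    }
}
```

Closing the returned stream does not close `source`, so with `Files.lines` the original stream should still be managed by try-with-resources, as in the answer's code.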
MikeFHay