6

I have a CSV file and the first line contains the headers. So I thought it would be perfect to use Java 8 streams.

    try (Stream<String> stream = Files.lines(csv_file) ){
        stream.skip(1).forEach( line -> handleLine(line) );
    } catch ( IOException ioe ){
        handleError(ioe);
    }

Is it possible to take the first element, analyze it and then call the forEach method? Something like

stream
      .forFirst( line -> handleFirst(line) )
      .skip(1)
      .forEach( line -> handleLine(line) );

ADDITIONALLY: My CSV file contains around 1k lines and I can handle each line in parallel to speed it up. Except the first line. I need the first line to initialize other objects in my project :/ So maybe it is fast to open a BufferedReader, read the first line, close the BufferedReader and then use parallel streams?
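A minimal sketch of that idea, using a temporary sample file in place of the real CSV (the file contents and the concurrent queue are assumptions for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

public class HeaderThenParallel {
    public static void main(String[] args) throws IOException {
        // Hypothetical sample file standing in for the real CSV.
        Path csvFile = Files.createTempFile("data", ".csv");
        Files.write(csvFile, Arrays.asList("id,name", "1,a", "2,b", "3,c"));

        try (BufferedReader br = Files.newBufferedReader(csvFile)) {
            String header = br.readLine();      // read the header row eagerly
            System.out.println("header=" + header);

            // The remaining lines can be processed in parallel; the
            // consumer must be thread-safe, hence the concurrent queue.
            ConcurrentLinkedQueue<String> handled = new ConcurrentLinkedQueue<>();
            br.lines().parallel().forEach(handled::add);
            System.out.println("rows=" + handled.size());
        }
    }
}
```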

Alexis C.
Highchiller
  • What's your goal? – shmosel Nov 30 '16 at 21:03
  • I guess it is a header of a file and needs special treatment, right? – NiVeR Nov 30 '16 at 21:04
  • You should use the stream's iterator. Sometimes the old ways are the best. You can call `next()` to get the iterator's first item, then `forEachRemaining()`. – Sam Nov 30 '16 at 21:08
  • Replace your pseudo `forFirst` with `findFirst().get()`, work with that, then continue with `skip` and `forEach`. – Zircon Nov 30 '16 at 21:09
  • @Zircon `findFirst()` is a terminal operation. You can't use the stream afterwards. – shmosel Nov 30 '16 at 21:10
  • Sometimes, **imperative problems** are better written as imperative code rather than abusing looks-like-functional programming. – Has QUIT--Anony-Mousse Nov 30 '16 at 21:30
  • I need the first line to setup another class (because the first line contains the headers of the following lines). My CSV file has about 1k lines. I'm able to handle all lines parallel as well. That's why I thought it is a good idea to work with streams. – Highchiller Nov 30 '16 at 21:39

3 Answers

8

In general, you can use iterators to do this:

Stream<Item> stream = ... //initialize your stream
Iterator<Item> i = stream.iterator();
handleFirst(i.next());
i.forEachRemaining(item -> handleRest(item));

In your program, it would look something like this:

try (Stream<String> stream = Files.lines(csv_file)){
    Iterator<String> i = stream.iterator();
    handleFirst(i.next());
    i.forEachRemaining(s -> handleRest(s));
}

You may want to add some error checking in case the file is empty (calling `i.next()` on a stream with no lines throws `NoSuchElementException`), but otherwise this should work.
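A minimal sketch of that guard, using hypothetical in-memory lines in place of `Files.lines(csv_file)`:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.stream.Stream;

public class IteratorFirstDemo {
    public static void main(String[] args) {
        // Hypothetical in-memory lines standing in for Files.lines(csv_file).
        try (Stream<String> stream = Arrays.asList("col1,col2", "1,2", "3,4").stream()) {
            Iterator<String> i = stream.iterator();
            if (!i.hasNext()) {
                // Guard against a 0-line file before calling next()
                throw new IllegalStateException("file is empty");
            }
            System.out.println("first=" + i.next());
            i.forEachRemaining(line -> System.out.println("rest=" + line));
        }
    }
}
```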

CodeBlind
  • The stream returned by `Files.lines` should be closed to ensure that the inner buffered reader is closed. The terminal `iterator()` doesn't implicitly close the stream, see http://stackoverflow.com/questions/34072035/why-is-files-lines-and-similar-streams-not-automatically-closed. – Tunaki Dec 01 '16 at 00:05
4

A nice way to do that would be to get a BufferedReader reading your file, for example with the help of Files.newBufferedReader(path). Then you can call readLine() once to retrieve the header row, and lines() to get a Stream<String> of all the other rows:

try (BufferedReader br = Files.newBufferedReader(csv_file)){
    String header = br.readLine();
    // if header is null, the file was empty, you may want to throw an exception
    br.lines().forEach(line -> handleLine(line));
}

This works because the first call to readLine() consumes the first line, so the stream returned by lines(), which reads lazily from the same reader, starts at the second line. The try-with-resources also ensures the buffered reader is closed correctly when processing ends.

Potentially, the stream pipeline could be run in parallel, but for I/O-bound tasks like this one I wouldn't expect any performance improvement, unless the processing of each row is the slower part. Be careful with forEach in that case: it will be run concurrently, so its code needs to be thread-safe. It's unclear what the handleLine method does but, generally, you do not need forEach and might prefer a mutable reduction with collect, which is safe to use on a parallel stream.
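As a sketch, that mutable reduction could look like this (the temporary sample file and the comma split are assumptions for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectParallelDemo {
    public static void main(String[] args) throws IOException {
        // Hypothetical sample file standing in for the real CSV.
        Path csvFile = Files.createTempFile("data", ".csv");
        Files.write(csvFile, Arrays.asList("h1,h2", "a,1", "b,2"));

        try (BufferedReader br = Files.newBufferedReader(csvFile)) {
            String header = br.readLine();  // header row read up front
            // collect() does the thread-safe merging itself, so the pipeline
            // stays correct when run in parallel: no shared mutable state
            // is touched inside the lambdas.
            List<String[]> rows = br.lines()
                                    .parallel()
                                    .map(line -> line.split(","))
                                    .collect(Collectors.toList());
            System.out.println("header=" + header + " rows=" + rows.size());
        }
    }
}
```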

Tunaki
0

I don't think there's a good way of doing it within the stream pipeline, but you can use the stream's iterator for finer control over the iteration:

try (Stream<String> stream = Files.lines(csv_file) ){
    Iterator<String> iter = stream.iterator();
    if (iter.hasNext()) {
        handleFirst(iter.next());
        while (iter.hasNext()) {
            handleLine(iter.next());
        }
    }
} catch ( IOException ioe ){
    handleError(ioe);
}
shmosel