
I want to skip the first line and use the second as the header.

I am using classes from Apache Commons CSV to process a CSV file.

The header of the CSV file is in the second row, not the first (which contains coordinates).

My code looks like this:

static void processFile(final File file) throws IOException {
    final CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';');
    try (FileReader filereader = new FileReader(file);
         CSVParser parser = new CSVParser(filereader, format)) {
        final List<CSVRecord> records = parser.getRecords();
        //stuff
    }
}

I naively thought,

CSVFormat format = CSVFormat.DEFAULT.withFirstRecordAsHeader().withDelimiter(';');

would solve the problem, since it is different from withFirstRowAsHeader, and I thought it would detect that the first row doesn't contain any semicolons and is therefore not a record. It doesn't. I then tried to skip the first line (which CSVFormat seems to think is the header) with

CSVFormat format = CSVFormat.DEFAULT.withSkipHeaderRecord().withFirstRecordAsHeader().withDelimiter(';');

but that also doesn't work. What can I do? What's the difference between withFirstRowAsHeader and withFirstRecordAsHeader?

duffymo
Medusa

9 Answers

If the first line is the header, the correct way to skip it is to use a different CSVFormat:

CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';').withFirstRecordAsHeader();

Update (June 30, 2022):

For 1.9+, use:

CSVFormat format = CSVFormat.DEFAULT.builder()
    .setDelimiter(';')
    .setHeader()
    .setSkipHeaderRecord(true)  // skip header
    .build();
Sully
  • +1 for withFirstRecordAsHeader(), I use it with CSVParser and it skips the header when you iterate over the parser. – keni Aug 21 '18 at 17:42
  • This should be the accepted answer, since it uses the library instead of an ad hoc pure Java solution – jmm Nov 28 '19 at 20:54
  • This should be the accepted answer. Thanks – A MJ Sep 07 '21 at 08:21
  • I think the original question was about the first *two* lines, where the second contains the header. – avandeursen Dec 11 '21 at 16:43
  • For me, it currently shows withDelimiter as deprecated. – Maik Jun 29 '22 at 09:29
  • I think the setHeader() method must be called too: CSVFormat.DEFAULT.builder().setDelimiter(';').setHeader().setSkipHeaderRecord(true).build(); – fdm Aug 12 '22 at 11:39
  • setHeader() will read the first record as the headers. The question says the headers are in the second record. – grigouille Apr 10 '23 at 07:27

You may want to read the first line before passing the reader to the CSVParser:

static void processFile(final File file) throws IOException {
    final CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';');
    try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
        bufferedReader.readLine(); // discard the first line (the coordinates)
        CSVParser parser = new CSVParser(bufferedReader, format);
        final List<CSVRecord> records = parser.getRecords();
        //stuff
    }
}
Arnaud
  • In case of my `,`-separated CSV file, I need to change `CSVFormat.DEFAULT.withDelimiter(';');` to `CSVFormat.DEFAULT.withDelimiter(',');`. Is this correct? – Suresh Jul 30 '18 at 06:00
  • readLine? What if the first record contains "bla\r\nbli"? – grigouille Apr 09 '23 at 17:19
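The concern raised in the last comment can be checked with plain java.io, no Commons CSV required. The sketch below assumes a hypothetical input whose first record has a quoted field containing a CR LF; the class and method names are illustrative only. Because readLine() consumes one physical line rather than one CSV record, whatever is handed to the parser afterwards starts in the middle of that first record.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReadLinePitfall {

    // Returns everything a CSV parser would still see after the first
    // physical line has been consumed with BufferedReader.readLine().
    // The remaining lines are re-joined with '\n' for easy inspection.
    static String remainingAfterReadLine(String csv) {
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            reader.readLine(); // stops at the first CR LF, even inside quotes
            StringBuilder rest = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                rest.append(line).append('\n');
            }
            return rest.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file: the first record's quoted field spans two lines,
        // followed by the header line and one data line.
        String csv = "\"bla\r\nbli\";12.5\r\nname;value\r\nfoo;1\r\n";
        System.out.print(remainingAfterReadLine(csv));
    }
}
```

Since Commons CSV itself understands quoted line breaks, skipping the first record through the parser (for example, by discarding the first element of the record iterator) avoids this problem.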

In version 1.9.0 of org.apache.commons:commons-csv, use (Kotlin):

val format = CSVFormat.Builder.create(CSVFormat.DEFAULT)
        .setHeader()
        .setSkipHeaderRecord(true)
        .build()

val parser = CSVParser.parse(reader, format)
Markus Lenger

You can skip the first record using a stream:

List<CSVRecord> noHeadersLine = records.stream().skip(1).collect(Collectors.toList());
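The stream idea above also fits the layout in the question, where the first line holds coordinates and the second holds the header. The sketch below is a stand-in rather than Commons CSV code: it assumes every row has already been parsed into a List<String>, and the class name and sample data are hypothetical. skip(1) drops the coordinates row, the next row becomes the header, and skip(2) leaves only the data rows.

```java
import java.util.List;
import java.util.stream.Collectors;

final class SecondRowHeader {

    // Header names come from the second row; the first row (coordinates) is skipped.
    static List<String> header(List<List<String>> rows) {
        return rows.stream()
                .skip(1)       // drop the coordinates row
                .findFirst()   // the next row is the header
                .orElseThrow(() -> new IllegalArgumentException("fewer than two rows"));
    }

    // Data rows: everything after the coordinates row and the header row.
    static List<List<String>> data(List<List<String>> rows) {
        return rows.stream().skip(2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical, already-split rows standing in for parsed records.
        List<List<String>> rows = List.of(
                List.of("48.1", "11.6"),
                List.of("name", "value"),
                List.of("foo", "1"),
                List.of("bar", "2"));
        System.out.println(header(rows)); // [name, value]
        System.out.println(data(rows));   // [[foo, 1], [bar, 2]]
    }
}
```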
Frank Why

You can filter it using Java Streams:

parser.getRecords().stream()
     .filter(record -> record.getRecordNumber() != 1) 
     .collect(Collectors.toList());
Musab Qamri

I am assuming your file format looks something like:

<garbage line here>
<header data>
<record data starts here>

For version 1.9.0, use the builder approach given above, but with one addition: consume the first line with a BufferedReader before parsing.

try (BufferedReader bufferedReader = new BufferedReader(new FileReader(fileName))) {
    // consume (and print) the first, non-CSV line
    System.out.println(bufferedReader.readLine());
    CSVFormat format = CSVFormat.Builder.create(CSVFormat.DEFAULT)
            .setHeader()
            .setSkipHeaderRecord(true)
            .build();
    CSVParser parser = CSVParser.parse(bufferedReader, format);
    for (CSVRecord record : parser.getRecords()) {
        // do something with record
    }
}

If you don't skip that first line somehow, the parser will throw an IllegalArgumentException.

You could consume the first line and then pass the reader to the CSVParser. Other than that, there is a method, withIgnoreEmptyLines, which might solve the issue.

Murat Karagöz
  • The problem is that the line isn't empty. But using BufferedReader (which has a readLine method) solved it. – Medusa Aug 24 '17 at 13:17

The .setHeader() method must be called for .setSkipHeaderRecord(true) to take effect.

CSVFormat format = CSVFormat.DEFAULT.builder()
    .setDelimiter(';')
    .setHeader()
    .setSkipHeaderRecord(true)  // skip header
    .build();
fdm

If your first record doesn't contain CR or LF characters (for example, a line break inside a quoted field), you can use the readLine method. Otherwise, you have to read the file twice.

First, get the headers:

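CSVFormat format = CSVFormat.DEFAULT.builder().setDelimiter(';').build(); // adjust to your file
List<String> headers = null;
try (Reader reader = getReader()) { // getReader(): your own method that opens the file
  Iterator<CSVRecord> iter = format.parse(reader).iterator();
  if (iter.hasNext()) iter.next(); // skip the first record
  if (iter.hasNext()) {
    headers = iter.next().toList(); // the second record holds the header names
  }
}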

Then read the file again:

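if (headers == null) {
  throw new IllegalStateException("the file has fewer than two records");
}
try (Reader reader = getReader()) {
  format = format.builder().setHeader(headers.toArray(new String[0])).build();
  Iterator<CSVRecord> iter = format.parse(reader).iterator();
  if (iter.hasNext()) iter.next(); // skip the first record
  if (iter.hasNext()) iter.next(); // skip the header record (names already set)
  while (iter.hasNext()) {
    CSVRecord record = iter.next();
    //do stuff
  }
}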
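Reading the file twice is only necessary if the header names must be handed to CSVFormat. As an alternative (a different technique from the answer above, shown only as a sketch), the header row can be read once and each later record keyed by those names manually. The code assumes records arrive as an Iterator<List<String>>, for example by converting each CSVRecord into a list of its values; SinglePassHeader and toMaps are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SinglePassHeader {

    // Skips the first record, uses the second as header names, and turns every
    // later record into a header -> value map, consuming the iterator once.
    static List<Map<String, String>> toMaps(Iterator<List<String>> records) {
        if (records.hasNext()) {
            records.next(); // discard the first record (coordinates)
        }
        if (!records.hasNext()) {
            return List.of(); // no header row, so no data either
        }
        List<String> header = records.next();
        List<Map<String, String>> result = new ArrayList<>();
        while (records.hasNext()) {
            List<String> values = records.next();
            Map<String, String> row = new LinkedHashMap<>();
            // Pair names and values by position; extra values are dropped,
            // missing ones are simply absent from the map.
            for (int i = 0; i < Math.min(header.size(), values.size()); i++) {
                row.put(header.get(i), values.get(i));
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows: coordinates, header, one data record.
        List<List<String>> rows = List.of(
                List.of("48.1", "11.6"),
                List.of("name", "value"),
                List.of("foo", "1"));
        System.out.println(toMaps(rows.iterator())); // [{name=foo, value=1}]
    }
}
```

A LinkedHashMap keeps the column order of the file, which makes the printed maps easier to compare with the source lines.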
grigouille