Skip first line in csv while creating DataSet

Question

I have a CSV file with data like this:

Date, Time, CO(GT), PT08.S1(CO)
10/03/2004, 18.00.00, 2, 6

I am parsing it to create a Dataset and would like to skip the first line. How can I do that?

score 0 · Accepted Answer · answered Sep 19 '17 at 13:14


Use the header option of the DataFrameReader, for example like this in Java:

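import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// spark is an existing SparkSession; paths holds the CSV file path(s)
Dataset<Row> df = spark
        .read()
        .option("inferSchema", true)
        .option("header", true)   // treat the first line as column names, not data
        .csv(paths);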
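
To make the effect of the header option concrete, here is a minimal plain-Java sketch (no Spark involved) of the behaviour it gives you: the first line is consumed as column names and only the remaining lines become rows. The class `HeaderSkipSketch` and its methods are hypothetical names made up for this illustration, and trimming the cells is a simplification of the sketch rather than a statement about Spark's defaults.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: mimics what a header-aware CSV read does.
class HeaderSkipSketch {

    // The first line supplies the column names.
    static List<String> columnNames(List<String> lines) {
        return splitTrimmed(lines.get(0));
    }

    // Every line after the first is treated as a data row.
    static List<List<String>> dataRows(List<String> lines) {
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            rows.add(splitTrimmed(line));
        }
        return rows;
    }

    // Naive split on commas; does not handle quoted fields.
    private static List<String> splitTrimmed(String line) {
        List<String> cells = new ArrayList<>();
        for (String cell : line.split(",")) {
            cells.add(cell.trim());
        }
        return cells;
    }

    public static void main(String[] args) {
        // Sample data taken from the question.
        List<String> lines = Arrays.asList(
                "Date, Time, CO(GT), PT08.S1(CO)",
                "10/03/2004, 18.00.00, 2, 6");
        System.out.println(columnNames(lines)); // [Date, Time, CO(GT), PT08.S1(CO)]
        System.out.println(dataRows(lines));    // [[10/03/2004, 18.00.00, 2, 6]]
    }
}
```

With the Spark reader you do not write any of this yourself; setting the header option is enough, and the sketch only shows why the first line no longer appears among the rows.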


moe

Can you provide a link that lists everything we can pass to `option` in Spark? – Utkarsh Saraf Sep 19 '17 at 13:20

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L507 – philantrovert Sep 19 '17 at 13:25
It depends on which data source you are using. You can have a look at the SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#data-sources) and the API documentation (https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameReader). However, the options are not listed under `option` but under the respective data source method, in this case `csv`. – moe Sep 19 '17 at 13:28
