
I have a CSV file with data like this:

Date, Time, CO(GT), PT08.S1(CO)
10/03/2004, 18.00.00, 2, 6

I am parsing it to create a Dataset and would like to skip the first (header) line. How can I do this?

Utkarsh Saraf

1 Answer


Use the `header` option of `DataFrameReader`. When it is set to `true`, Spark uses the first line as column names instead of returning it as a data row. For example, in Java:

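import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// "header": use the first line as column names, not as a data row
// "inferSchema": scan the data to guess column types instead of reading everything as strings
Dataset<Row> df = spark
            .read()
            .option("inferSchema", true)
            .option("header", true)
            .csv(paths);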
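To make concrete what `header` does, here is a plain-Java sketch that needs no Spark: the first line supplies the column names and only the remaining lines come back as data rows. It is an illustration only, not how Spark's CSV reader is implemented; the names `HeaderCsvSketch`, `parseWithHeader` and `splitAndTrim` are hypothetical, and it assumes simple CSV without quoted fields or embedded commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java illustration (no Spark) of header=true semantics:
// the first line becomes the column names and is NOT returned as data.
class HeaderCsvSketch {

    // Parses simple CSV lines (assumes no quoted fields) into rows keyed by column name.
    static List<Map<String, String>> parseWithHeader(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // First line: column names
        String[] header = splitAndTrim(lines.get(0));
        // Remaining lines: data rows (the header line is skipped)
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] values = splitAndTrim(line);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                row.put(header[i], values[i]);
            }
            rows.add(row);
        }
        return rows;
    }

    // Splits on commas and trims, because the sample data uses ", " as separator.
    static String[] splitAndTrim(String line) {
        String[] parts = line.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> csv = Arrays.asList(
            "Date, Time, CO(GT), PT08.S1(CO)",
            "10/03/2004, 18.00.00, 2, 6");
        List<Map<String, String>> rows = parseWithHeader(csv);
        System.out.println(rows.size());                // 1 (header is not a data row)
        System.out.println(rows.get(0).get("CO(GT)"));  // 2
    }
}
```

The sketch trims each field because the sample data separates values with `, `. Without trimming, the column names after the first would start with a space; Spark's CSV reader offers the `ignoreLeadingWhiteSpace` and `ignoreTrailingWhiteSpace` options for this situation.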
moe
  • Can you point me to a reference listing all the options that can be passed to `option` in Spark? – Utkarsh Saraf Sep 19 '17 at 13:20
  • https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L507 – philantrovert Sep 19 '17 at 13:25
  • It depends on which data source you are using. You can have a look at the SQL guide (https://spark.apache.org/docs/latest/sql-programming-guide.html#data-sources) and the API documentation (https://spark.apache.org/docs/2.1.0/api/scala/index.html#org.apache.spark.sql.DataFrameReader). Note, however, that the options are not documented under `option` but under the method for the respective data source, in this case `csv`. – moe Sep 19 '17 at 13:28