I am parsing a CSV file that contains rows like this:
2016-10-03, 18.00.00, 2, 6
When reading the file, I create the schema as follows:
StructType schema = DataTypes.createStructType(Arrays.asList(
    DataTypes.createStructField("Date", DataTypes.DateType, false),
    DataTypes.createStructField("Time", DataTypes.TimestampType, false),
    DataTypes.createStructField("CO(GT)", DataTypes.IntegerType, false),
    DataTypes.createStructField("PT08.S1(CO)", DataTypes.IntegerType, false)));
Dataset<Row> df = spark.read().format("csv").schema(schema).load("src/main/resources/AirQualityUCI/sample.csv");
It produces the following error:
Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Unknown Source)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:137)
I suspect this is caused by the time format (the values use dots, as in 18.00.00). How can I convert the values into a specific format, or what should I change in the StructType so that they are interpreted correctly?
The format I expect is hh:mm:ss (24-hour), since that would let me build a timestamp in Spark SQL by concatenating the date and time columns:
2016-10-03, 18:00:00, 2, 6
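To make the intended conversion concrete, here is a minimal sketch in plain Java, outside Spark. The class and method names (TimeColumnSketch, normalizeTime, toDateTime) are hypothetical, and it assumes every time value uses dots as separators and may carry the leading space seen in the sample row. It only illustrates the string-level transformation I am after, not how Spark itself should be configured.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only; not part of Spark.
class TimeColumnSketch {
    // Assumption: the file always writes times as HH.mm.ss (24-hour, dot-separated).
    private static final DateTimeFormatter DOTTED_TIME =
            DateTimeFormatter.ofPattern("HH.mm.ss");
    private static final DateTimeFormatter COLON_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss");

    // Rewrites " 18.00.00" as "18:00:00"; trims the space left after the comma.
    static String normalizeTime(String raw) {
        LocalTime time = LocalTime.parse(raw.trim(), DOTTED_TIME);
        return time.format(COLON_TIME);
    }

    // Combines the Date and Time columns of one row into a single date-time value.
    static LocalDateTime toDateTime(String date, String time) {
        LocalDate d = LocalDate.parse(date.trim());            // ISO format: yyyy-MM-dd
        LocalTime t = LocalTime.parse(time.trim(), DOTTED_TIME);
        return LocalDateTime.of(d, t);
    }

    public static void main(String[] args) {
        System.out.println(normalizeTime(" 18.00.00")); // prints 18:00:00
        System.out.println(toDateTime("2016-10-03", " 18.00.00"));
    }
}
```

Parsing with an explicit pattern, rather than replacing every dot with a colon, means a malformed value fails with a DateTimeParseException instead of passing through silently.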