I'm very new to both Spark and Scala, and am trying to load a CSV that looks like:
A,09:33:57.570
B,09:43:02.577
...
The only temporal type I see in org.apache.spark.sql.types is TimestampType, so I am loading the CSV with:
val schema = StructType(Array(
  StructField("A", StringType, true),
  StructField("time", TimestampType, true)))
val table = spark.read
  .option("header", "false")
  .option("inferSchema", "false")
  .schema(schema)
  .csv("../table.csv")
This seems to work fine until I call an action such as table.show() or table.take(5), at which point I get the following exception:
scala> table.show()
16/10/07 16:32:25 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.IllegalArgumentException
at java.sql.Date.valueOf(Date.java:143)
at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:137)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:287)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:115)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:84)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:125)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:124)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
Is there a preferred way to store time-of-day data in Spark? I have also tried leaving the column as a string and mapping java.time.LocalTime.parse() over each value, but that fails with an error saying there is no Encoder for the type.
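For reference, the LocalTime attempt looked roughly like this (variable names are illustrative, reconstructed from memory):

```scala
import java.time.LocalTime
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Read both columns as plain strings so the CSV parser accepts the time values.
val stringSchema = StructType(Array(
  StructField("A", StringType, true),
  StructField("time", StringType, true)))
val raw = spark.read
  .option("header", "false")
  .schema(stringSchema)
  .csv("../table.csv")

import spark.implicits._
// This is where it fails: Spark cannot find an Encoder for
// java.time.LocalTime, so the Dataset cannot be constructed.
val parsed = raw.map(r => (r.getString(0), LocalTime.parse(r.getString(1))))
```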