I've written a Scala function that converts a time string (HH:mm:ss.SSS) to seconds. It drops the fractional part, keeps only HH:mm:ss, and returns the total number of seconds as an Int. It works fine when I test it in spark-shell.
def hoursToSeconds(a: Any): Int = {
  // Drop everything after the first '.', i.e. the fractional seconds
  val sec = a.toString.split('.')
  // Split the remaining HH:mm:ss into its three fields
  val fields = sec(0).split(':')
  fields(0).toInt * 3600 + fields(1).toInt * 60 + fields(2).toInt
}
print(hoursToSeconds("03:51:21.2550000"))
13881
I need to apply this function to one of the DataFrame columns (Running). I tried the withColumn method, but I get the error "Type mismatch, expected: Column, actual: String". Is there a way to wrap the Scala function in a UDF and then use that UDF in df.withColumn? Any help would be appreciated.
df.printSchema
root
|-- vin: string (nullable = true)
|-- BeginOfDay: string (nullable = true)
|-- Timezone: string (nullable = true)
|-- Version: timestamp (nullable = true)
|-- Running: string (nullable = true)
|-- Idling: string (nullable = true)
|-- Stopped: string (nullable = true)
|-- dlLoadDate: string (nullable = false)
sample running column values.
The call that produces the error:
df.withColumn("running", hoursToSeconds(df("Running")))
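For reference, here is a minimal sketch of the approach being asked about: wrapping the function with `org.apache.spark.sql.functions.udf` so that it accepts a Column. It makes several assumptions that are not in the code above: the parameter type is changed from Any to String (Spark cannot derive a schema for Any), every Running value is non-null and in HH:mm:ss.SSS form, and the names `HoursToSecondsUdf`, `hoursToSecondsUdf`, `RunningSeconds`, and the one-row sample DataFrame are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object HoursToSecondsUdf {
  // Typed variant of the function from the question: String instead of Any,
  // so Spark can infer the UDF's input type.
  // Assumes a non-null, well-formed HH:mm:ss[.fraction] string; a null or
  // malformed value would throw at runtime.
  def hoursToSeconds(time: String): Int = {
    val fields = time.split('.')(0).split(':')
    fields(0).toInt * 3600 + fields(1).toInt * 60 + fields(2).toInt
  }

  // Wrap the plain Scala function so it takes a Column and returns a Column.
  val hoursToSecondsUdf = udf(hoursToSeconds _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("hoursToSeconds")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical one-row stand-in for the real DataFrame.
    val df = Seq(("vin1", "03:51:21.2550000")).toDF("vin", "Running")

    // The UDF is applied to a Column, which is what withColumn expects.
    // "RunningSeconds" is an illustrative name for the new column.
    df.withColumn("RunningSeconds", hoursToSecondsUdf(col("Running"))).show()

    spark.stop()
  }
}
```

In spark-shell the object wrapper and the SparkSession setup are unnecessary: defining the function, the `udf(...)` value, and the `withColumn` call directly should be enough. If Running can contain nulls, the function would need a null check before it is used this way.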