I've written a Scala function that converts a time string (HH:mm:ss.SSS) to seconds. It drops the fractional part, keeps only HH:mm:ss, and returns the total number of seconds as an Int. It works fine when tested in spark-shell.

def hoursToSeconds(a: Any): Int = {
  // Drop the fractional part (everything after the first '.')
  val sec = a.toString.split('.')
  // Split the remaining HH:mm:ss into hours, minutes and seconds
  val fields = sec(0).split(':')
  fields(0).toInt * 3600 + fields(1).toInt * 60 + fields(2).toInt
}

print(hoursToSeconds("03:51:21.2550000"))
13881
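
Since the Running column is nullable, a.toString would throw on a null value, and a malformed string would throw in toInt. A minimal sketch of a more defensive variant is shown here; the name hoursToSecondsOpt and the choice to return Option[Int] are assumptions for illustration, not part of the original function.

```scala
import scala.util.Try

// Hypothetical null-safe variant of hoursToSeconds (illustrative name).
// Returns None for null input or for strings that are not HH:mm:ss[.fraction].
def hoursToSecondsOpt(time: String): Option[Int] =
  Option(time).flatMap { t =>
    // Keep only the part before the first '.', then split on ':'
    t.takeWhile(_ != '.').split(':') match {
      case Array(h, m, s) =>
        // Any non-numeric field makes toInt throw, which Try turns into None
        Try(h.toInt * 3600 + m.toInt * 60 + s.toInt).toOption
      case _ => None
    }
  }

println(hoursToSecondsOpt("03:51:21.2550000"))
```

Returning an Option keeps the failure case explicit instead of raising an exception in the middle of a job.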

I need to apply this function to one of the DataFrame columns (Running). I tried doing this with the withColumn method, but I get the error "Type mismatch, expected: Column, actual: String". Is there a way to wrap the Scala function in a UDF and then use that UDF in df.withColumn? Any help would be appreciated.

df.printSchema
root
 |-- vin: string (nullable = true)
 |-- BeginOfDay: string (nullable = true)
 |-- Timezone: string (nullable = true)
 |-- Version: timestamp (nullable = true)
 |-- Running: string (nullable = true)
 |-- Idling: string (nullable = true)
 |-- Stopped: string (nullable = true)
 |-- dlLoadDate: string (nullable = false)

Sample values of the Running column: [image not available]

df.withColumn("running", hoursToSeconds(df("Running")))
chaitra k

1 Answer

You can create a UDF from the hoursToSeconds function using the following syntax:

import org.apache.spark.sql.functions.{col, udf}

val hoursToSecUdf = udf(hoursToSeconds _)

To apply it to a particular column, use the following syntax:

df.withColumn("TimeInSeconds",hoursToSecUdf(col("running")))
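
Putting the pieces together, the following is a minimal end-to-end sketch rather than a definitive implementation. It assumes a local SparkSession and a small made-up DataFrame with only the vin and Running columns; the sample rows and the app name are hypothetical. The function parameter is typed as String here instead of Any, which is an assumption made so that Spark can infer the UDF's input type directly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Assumption: a local session for demonstration; spark-shell already provides `spark`
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("hoursToSeconds-demo")
  .getOrCreate()
import spark.implicits._

// String-typed version of the question's function
def hoursToSeconds(time: String): Int = {
  val fields = time.split('.')(0).split(':')
  fields(0).toInt * 3600 + fields(1).toInt * 60 + fields(2).toInt
}

// Wrap the plain Scala function so it can be applied to a Column
val hoursToSecUdf = udf(hoursToSeconds _)

// Hypothetical sample data standing in for the real DataFrame
val df = Seq(
  ("VIN1", "03:51:21.2550000"),
  ("VIN2", "00:00:59.0000000")
).toDF("vin", "Running")

val result = df.withColumn("TimeInSeconds", hoursToSecUdf(col("Running")))
result.show(false)
```

The original call failed because withColumn expects a Column expression, while calling hoursToSeconds directly evaluates a plain Scala function on the driver. Note that this sketch does not handle null values in Running, so a null check inside the function may be needed for real data.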
  • can you help and suggest how to handle this https://stackoverflow.com/questions/62036791/while-writing-to-hdfs-path-getting-error-java-io-ioexception-failed-to-rename – BdEngineer May 27 '20 at 06:49