Working with Apache Spark, I have these variables in a strange format called dttm,
displayed as follows:
tpep_pickup_datetime  tpep_dropoff_datetime
<dttm>                <dttm>
2015-01-15 18:05:39   2015-01-15 18:23:42
2015-01-10 19:33:38   2015-01-10 19:53:28
2015-01-10 19:33:38   2015-01-10 19:43:41
2015-01-10 19:33:39   2015-01-10 19:35:31
I would like to calculate the time difference in seconds between tpep_pickup_datetime and tpep_dropoff_datetime, but it doesn't work with the lubridate package. How can I transform these variables into POSIXct format using dplyr?
When I use the following code:
my_df %>%
  mutate(diff_time = difftime(tpep_dropoff_datetime, tpep_pickup_datetime, units = "secs"))
I get this error:
org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'AS' expecting {')', ','}(line 1, pos 121)