I have a column in a SparkDataFrame that contains timestamps in the following format:
Start_1
<chr>
2016/01/01 10:51:15.304
2016/01/01 10:51:15.352
I let Spark infer the schema when reading the file, which yields chr as the data type. I know the conversion would work without the milliseconds, yielding the proper data type and column. However, I need the milliseconds as well, and therefore want to change the data type to timestamp within the existing SparkDataFrame.
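For context, the data is read roughly like this (a sketch; the path and read options are placeholders, not my actual call):

library(SparkR)
sparkR.session()

# Hypothetical read: path and options stand in for the real ones.
dataloan_time <- read.df("path/to/file.csv", source = "csv",
                         header = "true", inferSchema = "true")
printSchema(dataloan_time)  # Start_1 comes back as string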
Here are the approaches I have tried:
First, as.POSIXct, which would work in base R:
dataloan_time$start_ts <- as.POSIXct(dataloan_time$Start_1, format = "%Y/%m/%d %H:%M:%OS")
This doesn't work: as.POSIXct is a base R function and cannot convert a Spark column, so the class is never changed.
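The format string itself seems fine; on a plain character vector the same call works (a minimal base R check, with options(digits.secs = 3) so fractional seconds are printed):

# Base R check on an ordinary character vector, not a Spark column.
options(digits.secs = 3)  # display fractional seconds
as.POSIXct("2016/01/01 10:51:15.304", format = "%Y/%m/%d %H:%M:%OS")
# "2016-01-01 10:51:15.304" in the session's timezone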
A solution mentioned here and on other sites is casting:
dataloan_time <- withColumn(dataloan_time, "start_ts", cast(dataloan_time$Start_1, "timestamp"))
For me, this changes the data type correctly, but the new column contains no data. Here is the result of head() on the new column:
start_ts
<lgl>
NA
NA
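My suspicion is that the format, not the cast, is the problem: as far as I know, Spark's string-to-timestamp cast expects an ISO-style yyyy-MM-dd HH:mm:ss[.SSS] value, so the slash-separated strings come back as NULL/NA. If that is right, rewriting the separators first might make the cast succeed while keeping the milliseconds, along these lines (a sketch I have not verified):

# Sketch: make the string ISO-like, then cast to timestamp.
dataloan_time <- withColumn(dataloan_time, "start_iso",
                            regexp_replace(dataloan_time$Start_1, "/", "-"))
dataloan_time <- withColumn(dataloan_time, "start_ts",
                            cast(dataloan_time$start_iso, "timestamp"))
head(select(dataloan_time, "start_ts"))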
Collecting the data frame and converting it locally is the last option I found, but I'd like to avoid that and do the conversion within the SparkDataFrame. What other solutions are there? Ideally, something that works like the first attempt (as.POSIXct).
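For completeness, the collect-based fallback I'm trying to avoid looks roughly like this (names as above):

# Fallback: pull everything into local R memory, then convert there.
local_df <- collect(dataloan_time)
local_df$start_ts <- as.POSIXct(local_df$Start_1,
                                format = "%Y/%m/%d %H:%M:%OS")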