The Spark DataFrame below has start_t and end_t as Unix timestamps, but the values are stored as doubles and displayed in scientific notation (with an E exponent in them).
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+----------------+
| alt_t| end_t|engine_fuel_rate_t| lat_t|left_max_t|left_min_t| lon_t|plm3_incl_t|right_max_t|right_min_t|road_class_u8|speed_t|sprung_weight_t| start_t|
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+----------------+
|1237.5|1.521956985733E9| 0|-27.7314| 0.0| 0.0|22.9552| 1.5| 0.0| 0.0| 0| 17.4| 198.0| 1.52195698056E9|
|1236.5|1.521956989922E9| 0|-27.7316| 0.0| 0.0|22.9552| -3.3| 0.0| 0.0| 0| 17.6| 156.1|1.521956985733E9|
|1234.5|1.521956995378E9| 0|-27.7318| 0.0| 0.0|22.9552| -2.7| 0.0| 0.0| 0| 11.9| 148.6|1.521956989922E9|
|1230.5|1.521957001498E9|                 0| -27.732|       0.0|       0.0|22.9551|        2.3|        0.0|        0.0|            0|   13.2|          169.1|1.521956995378E9|
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+----------------+
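For reference, the columns really are doubles; the E is just scientific notation in show()'s output, not part of the stored value. A quick check (assuming the DataFrame is named df, as in the code further down):

// Sanity check: start_t and end_t print with an E because they are doubles.
df.select("start_t", "end_t").printSchema()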
Since the column is a double, it cannot be converted directly to a timestamp; Spark throws an error stating the input needs to be a string.
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+-------+
| alt_t| end_t|engine_fuel_rate_t| lat_t|left_max_t|left_min_t| lon_t|plm3_incl_t|right_max_t|right_min_t|road_class_u8|speed_t|sprung_weight_t|start_t|
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+-------+
|1237.5|1.521956985733E9| 0|-27.7314| 0.0| 0.0|22.9552| 1.5| 0.0| 0.0| 0| 17.4| 198.0| null|
|1236.5|1.521956989922E9| 0|-27.7316| 0.0| 0.0|22.9552| -3.3| 0.0| 0.0| 0| 17.6| 156.1| null|
|1234.5|1.521956995378E9|                 0|-27.7318|       0.0|       0.0|22.9552|       -2.7|        0.0|        0.0|            0|   11.9|          148.6|   null|
+------+----------------+------------------+--------+----------+----------+-------+-----------+-----------+-----------+-------------+-------+---------------+-------+
Therefore I used the following code:
%scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Cast the double epoch columns to string so unix_timestamp will accept them.
val df2 = df.withColumn("start_t", df("start_t").cast("string"))
val df3 = df2.withColumn("end_t", df2("end_t").cast("string"))
// Try to parse the string back into a timestamp.
val filteredDF = df3.withColumn("start_t", unix_timestamp($"start_t", "yyyyMMddHHmmss").cast("timestamp"))
filteredDF.show()
I get null in start_t and think it is due to the E (the exponent sign): presumably the cast produces the string "1.521956985733E9", which can never match the yyyyMMddHHmmss pattern. I have tested the same values in pandas (Python); the dates are valid and parse correctly. I know there is a way to change this using precision. I am trying to convert the column to a timestamp in the format yyyy-MM-dd HH:mm:ss, and to have separate columns for just the date and just the time.
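For what it is worth, this is the kind of conversion I think should work in principle, assuming the values really are epoch seconds; as far as I know Spark interprets a numeric-to-timestamp cast as seconds since the Unix epoch, so no string parsing should be needed (the start_ts/start_date/start_time names below are placeholders I made up):

%scala
import org.apache.spark.sql.functions._
import spark.implicits._

// Cast the epoch-seconds doubles straight to timestamp; the cast treats the
// number as seconds since the Unix epoch, so the E notation never matters.
val withTs = df
  .withColumn("start_ts", $"start_t".cast("timestamp"))
  .withColumn("end_ts", $"end_t".cast("timestamp"))
  // Separate columns for just the date and just the time.
  .withColumn("start_date", to_date($"start_ts"))
  .withColumn("start_time", date_format($"start_ts", "HH:mm:ss"))

withTs.select("start_t", "start_ts", "start_date", "start_time").show(false)

Is this the right approach, or is there a cleaner way to handle the precision?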
Note: a similar question was asked but not answered: Scala Spark : Convert Double Column to Date Time Column in dataframe