I have a data frame with a date/time column stored as the binary data type. I need to convert it to an actual date/time (timestamp) data type so I can run SQL window functions and more. Hence, I'm looking for some working examples.
Input dataframe schema:
root
 |-- ce_time: binary (nullable = true)
Sample data:
+-------------------------------------------------------------------------------------+
|ce_time |
+-------------------------------------------------------------------------------------+
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 33 39 2E 32 30 34 36 37 38 35 5A]|
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 34 36 2E 38 32 33 32 34 32 5A] |
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 35 34 2E 34 35 39 30 34 33 37 5A]|
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 35 3A 30 32 2E 35 37 30 38 35 39 36 5A]|
+-------------------------------------------------------------------------------------+
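For anyone who wants to reproduce this, here is a minimal sketch that builds equivalent test data with a binary ce_time column (this is just illustrative test data built from the UTF-8 bytes of the timestamps shown above, not my actual source):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# UTF-8 bytes of the ISO-8601 strings shown above, inferred as BinaryType
samples = [
    "2022-05-02T00:04:39.2046785Z",
    "2022-05-02T00:04:46.823242Z",
    "2022-05-02T00:04:54.4590437Z",
    "2022-05-02T00:05:02.5708596Z",
]
df = spark.createDataFrame([(bytearray(s, "utf-8"),) for s in samples], ["ce_time"])
df.printSchema()            # ce_time: binary (nullable = true)
df.show(truncate=False)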
I can convert the above to a string, and it looks like this (the cast I use is sketched after the table), but I need it as a date/time type, not a string.
+-------------------------------------------------------------------------------------+----------------------------+
|ce_time |ce_time_string |
+-------------------------------------------------------------------------------------+----------------------------+
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 33 39 2E 32 30 34 36 37 38 35 5A]|2022-05-02T00:04:39.2046785Z|
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 34 36 2E 38 32 33 32 34 32 5A] |2022-05-02T00:04:46.823242Z |
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 34 3A 35 34 2E 34 35 39 30 34 33 37 5A]|2022-05-02T00:04:54.4590437Z|
|[32 30 32 32 2D 30 35 2D 30 32 54 30 30 3A 30 35 3A 30 32 2E 35 37 30 38 35 39 36 5A]|2022-05-02T00:05:02.5708596Z|
+-------------------------------------------------------------------------------------+----------------------------+
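This is roughly how I produce the string column above (assuming a plain cast to string; decode with UTF-8 gives the same result for these values):

from pyspark.sql import functions as F

# Cast the binary column to string; Spark interprets the bytes as UTF-8
df_str = df.withColumn("ce_time_string", F.col("ce_time").cast("string"))
# equivalently: F.decode(F.col("ce_time"), "UTF-8")
df_str.show(truncate=False)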
If someone knows how to convert binary to a date/time type in PySpark while keeping the above date/time values exactly the same, please share!
Much appreciated!!