I'm working with a PySpark DataFrame that has a struct-type
column, as shown below:
df.printSchema()
#root
#|-- timeframe: struct (nullable = false)
#| |-- start: timestamp (nullable = true)
#| |-- end: timestamp (nullable = true)
I tried to collect() the end
timestamps of the window in that column
so that I can pass them to a plot:
from pyspark.sql.functions import col, date_format
# method 1
ts1 = [val('timeframe.end') for val in df.select(date_format(col('timeframe.end'),"yyyy-MM-dd")).collect()]
# method 2
ts2 = [val('timeframe.end') for val in df.select('timeframe.end').collect()]
Normally, when the column is not a struct, I follow this answer. In this case I couldn't find a better way than this post and this answer, which both convert the column to arrays, and I'm not sure that is best practice.
I tried the 2 methods shown above without success. They output the following:
print(ts1) #[Row(2021-12-28='timeframe.end')]
print(ts2) #[Row(2021-12-28 00:00:00='timeframe.end')]
The expected outputs are:
print(ts1) #[2021-12-28] just date format
print(ts2) #[2021-12-28 00:00:00] just timestamp format
How can I extract these values from the struct column?