I've seen (here: How to convert Timestamp to Date format in DataFrame?) the way to convert a timestamp in datetype, but,at least for me, it doesn't work.
Here is what I've tried:
# Create dataframe
df_test = spark.createDataFrame([('20170809',), ('20171007',)], ['date',])
# Convert to timestamp
df_test2 = df_test.withColumn('timestamp',func.when((df_test.date.isNull() | (df_test.date == '')) , '0')\
.otherwise(func.unix_timestamp(df_test.date,'yyyyMMdd')))\
# Convert timestamp to date again
df_test2.withColumn('date_again', df_test2['timestamp'].cast(stypes.DateType())).show()
But this returns null in the column date_again
:
+--------+----------+----------+
| date| timestamp|date_again|
+--------+----------+----------+
|20170809|1502229600| null|
|20171007|1507327200| null|
+--------+----------+----------+
Any idea of what's failing?