
How can I change the string type to datetime type for the elements of my nested array (transaction_date)? Here is the Spark DataFrame that I have:

root
 |-- id
 |-- data: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- transaction: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- timestamp: string (nullable = true)
 |    |    |    |    |-- transaction_date: string (nullable = true)

I tried using this code, but it returns an error:

df = df.withColumn("transaction_date", df.data.transaction.transaction_date.cast(TimestampType()))
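
A possible direction (a minimal, untested sketch, assuming Spark 2.4+ where the transform higher-order function is available in SQL): because transaction_date sits inside two levels of arrays, it cannot be cast directly with withColumn; the arrays have to be rebuilt element by element and the cast applied inside the innermost lambda, for example:

from pyspark.sql import functions as F

# Sketch: rebuild both array levels with transform(), casting
# transaction_date inside the innermost lambda. cast(... as timestamp)
# assumes the strings are in a format Spark parses by default; otherwise
# use to_timestamp(t.transaction_date, '<format>') instead.
df = df.withColumn(
    "data",
    F.expr("""
        transform(data, d -> named_struct(
            'transaction', transform(d.transaction, t -> named_struct(
                'timestamp', t.`timestamp`,
                'transaction_date', cast(t.transaction_date as timestamp)
            ))
        ))
    """)
)

On Spark 3.1+, the same rebuild can be written in Python with pyspark.sql.functions.transform and Column.withField instead of an SQL expression.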
Radityo Tody
    You need to add the error. – Vijay Krishna Mar 12 '19 at 19:36
  • Possible duplicate of [PySpark convert struct field inside array to string](https://stackoverflow.com/questions/54343635/pyspark-convert-struct-field-inside-array-to-string) – cronoik Mar 12 '19 at 21:26
  • I think the error in this case has more to do with misunderstanding how we access arrays in a dataframe, which I just observed in a slightly newer question: [In PySpark how to parse an embedded JSON](https://stackoverflow.com/a/55132658/6312602) – Jesse Amano Mar 13 '19 at 00:49

0 Answers