I have a DataFrame with the following schema:
root
|-- id: string (nullable = true)
|-- MATCH_timestamp: array (nullable = true)
| |-- element: long (containsNull = true)
|-- MATCH_n: integer (nullable = true)
|-- PAYMENT_INAPP_timestamp: array (nullable = true)
| |-- element: long (containsNull = true)
|-- PAYMENT_INAPP_cash: array (nullable = true)
| |-- element: string (containsNull = true)
|-- PAYMENT_INAPP_coin: array (nullable = true)
| |-- element: string (containsNull = true)
|-- PAYMENT_INAPP_count: integer (nullable = true)
However, the PAYMENT_INAPP_ columns may be null, because a user might not have paid yet. As the schema shows, PAYMENT_INAPP_timestamp, PAYMENT_INAPP_cash and PAYMENT_INAPP_coin are arrays. I would like to replace the null values in these columns with empty arrays.
I tried the following, but none of it works:
myDf.na.fill(Array.empty[Int], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
myDf.na.fill(Array.empty[Long], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
myDf.na.fill(Array.empty[String], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
or
myDf.na.fill(lit(Array.empty[Int]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
myDf.na.fill(lit(Array.empty[String]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
myDf.na.fill(lit(Array.empty[Long]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
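For context, as far as I can tell `na.fill` only accepts numeric, string and boolean replacement values (or a Map of column name to such values), which would explain why none of the calls above work on array columns. A minimal sketch of one possible workaround follows, assuming Spark 2.2 or later (where `typedLit` is available): `coalesce` each array column with a typed empty-array literal. The object name `FillNullArrays`, the helper `fillEmptyArrays` and the sample rows are hypothetical and only there to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, typedLit}

// Hypothetical wrapper object, used only to make the sketch runnable.
object FillNullArrays {

  // Replace null array values with empty arrays of the matching element type.
  // coalesce returns its first non-null argument, so non-null arrays are kept.
  def fillEmptyArrays(df: DataFrame): DataFrame =
    df
      .withColumn("PAYMENT_INAPP_timestamp",
        coalesce(col("PAYMENT_INAPP_timestamp"), typedLit(Seq.empty[Long])))
      .withColumn("PAYMENT_INAPP_cash",
        coalesce(col("PAYMENT_INAPP_cash"), typedLit(Seq.empty[String])))
      .withColumn("PAYMENT_INAPP_coin",
        coalesce(col("PAYMENT_INAPP_coin"), typedLit(Seq.empty[String])))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-null-arrays")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: user "b" has not paid yet, so the arrays are null.
    val myDf = Seq(
      ("a", Option(Seq(1L, 2L)), Option(Seq("1.0")), Option(Seq("10"))),
      ("b", Option.empty[Seq[Long]], Option.empty[Seq[String]], Option.empty[Seq[String]])
    ).toDF("id", "PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin")

    fillEmptyArrays(myDf).show(false)
    spark.stop()
  }
}
```

Each column gets a literal whose element type matches the schema (long for the timestamps, string for cash and coin), since a single empty array of one type cannot fit all three columns. On older Spark versions without `typedLit`, something like `array().cast("array<long>")` might serve the same purpose, though I have not verified that here.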