I have a DataFrame with the following schema:

root
 |-- id: string (nullable = true)
 |-- MATCH_timestamp: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- MATCH_n: integer (nullable = true)
 |-- PAYMENT_INAPP_timestamp: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- PAYMENT_INAPP_cash: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- PAYMENT_INAPP_coin: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- PAYMENT_INAPP_count: integer (nullable = true)

However, the PAYMENT_INAPP_ columns may be null, since a user might not have paid yet. As the schema shows, PAYMENT_INAPP_timestamp, PAYMENT_INAPP_cash, and PAYMENT_INAPP_coin are arrays. I would like to replace their null values with empty arrays.

I tried the following, but none of it works:

myDf.na.fill(Array.empty[Int], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))

myDf.na.fill(Array.empty[Long], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))

myDf.na.fill(Array.empty[String], Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))

or

myDf.na.fill(lit(Array.empty[Int]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))

myDf.na.fill(lit(Array.empty[String]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))

myDf.na.fill(lit(Array.empty[Long]), Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))
Haha TTpro
    Possible duplicate of [Convert null values to empty array in Spark DataFrame](https://stackoverflow.com/questions/34660867/convert-null-values-to-empty-array-in-spark-dataframe). Specifically, something like `coalesce($"column", array())` should work. – Shaido Aug 19 '19 at 07:24
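
The attempts above most likely fail because `na.fill` only takes scalar fill values (numbers, strings, booleans) or a column-to-value map, so an array is not accepted. The comment's `coalesce` suggestion can be sketched as below. This is a minimal sketch, not a definitive implementation: the helper name `fillEmptyArrays`, the object name, and the sample rows are made up for illustration, and the cast of `array()` to each column's element type is an assumption added so the empty array matches `array<long>` as well as `array<string>`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, coalesce, col}
import org.apache.spark.sql.types.ArrayType

object FillNullArrays {
  // Hypothetical helper: for each named array column, replace null with an
  // empty array whose element type matches the column's element type.
  def fillEmptyArrays(df: DataFrame, names: Seq[String]): DataFrame =
    names.foldLeft(df) { (acc, name) =>
      val emptyArray = acc.schema(name).dataType match {
        case ArrayType(elementType, _) =>
          // array() alone is untyped (or string-typed, depending on the Spark
          // version), so cast it to the column's element type.
          array().cast(ArrayType(elementType, containsNull = true))
        case other =>
          throw new IllegalArgumentException(s"$name is not an array column: $other")
      }
      // coalesce returns the first non-null argument for each row.
      acc.withColumn(name, coalesce(col(name), emptyArray))
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("fill-null-arrays")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: user "b" has not paid yet, so the arrays are null.
    val myDf = Seq(
      ("a", Option(Seq(1L, 2L)), Option(Seq("0.99", "1.99"))),
      ("b", Option.empty[Seq[Long]], Option.empty[Seq[String]])
    ).toDF("id", "PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash")

    val filled = fillEmptyArrays(
      myDf,
      Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash")
    )

    filled.printSchema()
    filled.show(false)
    spark.stop()
  }
}
```

With the schema from the question, the call would be `fillEmptyArrays(myDf, Seq("PAYMENT_INAPP_timestamp", "PAYMENT_INAPP_cash", "PAYMENT_INAPP_coin"))`. Reading the element type from the schema avoids hard-coding `Long` or `String` per column.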

0 Answers