
I have a pyspark dataframe, df

id  alias
1   ["jon", "doe"]
2   null

I am trying to replace the nulls and use an empty list

id  alias
1   ["jon", "doe"]
2   []

I tried using

.fillna('alias', '[]')
.fillna('alias', create_list([])

and answers from Convert null values to empty array in Spark DataFrame

but none of them is syntactically correct.

Swaraj Giri

1 Answer


Because the column types differ, you can't use fillna directly here: fillna only accepts int, float, string, or bool replacement values, not arrays. You can use something like the below instead:

df.show()
+---+----------+
| id|     alias|
+---+----------+
|  1|[jon, doe]|
|  2|      null|
+---+----------+


import pyspark.sql.functions as F

# For every array-typed column, replace nulls with an empty array
# via coalesce; leave all other columns untouched.
df.select([
    F.coalesce(F.col(name), F.array()).alias(name) if dtype.startswith('array')
    else F.col(name)
    for name, dtype in df.dtypes
]).show()
+---+----------+
| id|     alias|
+---+----------+
|  1|[jon, doe]|
|  2|        []|
+---+----------+
Ali Yesilli