I have a DataFrame built from the following data:
val df = List(
  (1, "wwe", List(1, 2, 3)),
  (2, "dsad", List.empty),
  (3, "dfd", null)
).toDF("id", "name", "value")
df.show
+---+----+---------+
| id|name|    value|
+---+----+---------+
|  1| wwe|[1, 2, 3]|
|  2|dsad|       []|
|  3| dfd|     null|
+---+----+---------+
To explode the array column values, I used the following logic:
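import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{array, col, explode, lit, when}
import org.apache.spark.sql.types._

// Explode an array column, substituting a one-element default array when the column is null.
def explodeWithNull(f: StructField): Column = {
  explode(
    when(col(f.name).isNotNull, col(f.name))
      .otherwise(
        f.dataType.asInstanceOf[ArrayType].elementType match {
          case StringType  => array(lit(""))
          case DoubleType  => array(lit(0.0))
          case IntegerType => array(lit(0))
          case _           => array(lit(""))
        }
      )
  )
}

// Apply explodeWithNull to every array-typed column in turn.
def explodeAllArraysColumns(dataframe: DataFrame): DataFrame = {
  val schema: StructType = dataframe.schema
  val arrayFields: Seq[StructField] = schema.filter(f => f.dataType.typeName == "array")
  arrayFields.foldLeft(dataframe) {
    (df: DataFrame, f: StructField) => df.withColumn(f.name, explodeWithNull(f))
  }
}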
explodeAllArraysColumns(df).show
+---+----+-----+
| id|name|value|
+---+----+-----+
|  1| wwe|    1|
|  1| wwe|    2|
|  1| wwe|    3|
|  3| dfd|    0|
+---+----+-----+
Exploding this way drops the row whose array is empty (id 2 in df). Ideally I don't want to lose that row; I want either a null or a default value for that column in the exploded DataFrame. How can I achieve this?
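For reference, one direction that might work is sketched below. It assumes Spark 2.2 or later, where `explode_outer` is available in `org.apache.spark.sql.functions`. The function name `explodeAllArrayColumnsOuter` is hypothetical and only for illustration. Unlike `explode`, `explode_outer` keeps rows whose array is null or empty and emits a null for the exploded value, so the `when`/`otherwise` default is no longer needed to preserve the row.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode_outer}
import org.apache.spark.sql.types.ArrayType

// Hypothetical helper: explode every array column while keeping rows
// whose array is null or empty (they get a null in the exploded column).
// Assumes Spark 2.2+, where explode_outer exists.
def explodeAllArrayColumnsOuter(dataframe: DataFrame): DataFrame = {
  val arrayFields = dataframe.schema.fields.filter(_.dataType.isInstanceOf[ArrayType])
  arrayFields.foldLeft(dataframe) { (df, f) =>
    df.withColumn(f.name, explode_outer(col(f.name)))
  }
}
```

With the `df` above, `explodeAllArrayColumnsOuter(df)` should keep the rows for ids 2 and 3, each with a null in `value`. If a default value is preferred over null, a `coalesce` with a literal could be applied to the exploded column afterwards.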