I have a Dataframe that I am trying to flatten. As part of the process, I want to explode it, so if I have a column of arrays, each value of the array will be used to create a separate row.I know i can use explode function. But, i have a problem, the column contains null value and i use spark 1.6. Here is a example of the type of data and what i want :
My data :
id | ListOfRficAction| RficActionAttachment
_______________________________
1 | Luke | [baseball, soccer]
2 | Lucy | null
and i want
id | ListOfRficAction| RficActionAttachment
_______________________________
1 | Luke | baseball
1 | Luke | soccer
2 | Lucy | null
I am using Spark 1.6 (so i can't use explode_outer function) and i try to use explode but i have the following error :
scala.MatchError: [null] (of class org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema)
i also try :
df.withColumn("likes", explode(
when(col("likes").isNotNull, col("likes"))
// If null explode an array<string> with a single null
.otherwise(array(lit(null).cast("string")))))
but my DataFrame schema is a lot complex (i have string and long), so cast function doesn't work. Here is a part of my schema and the error i have :
|-- RficActionAttachment: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- ActivityFileAutoUpdFlg: string (nullable = true)
| | |-- ActivityFileDate: string (nullable = true)
| | |-- ActivityFileDeferFlg: string (nullable = true)
| | |-- ActivityFileDockReqFlg: string (nullable = true)
| | |-- ActivityFileDockStatFlg: string (nullable = true)
| | |-- ActivityFileExt: string (nullable = true)
| | |-- ActivityFileName: string (nullable = true)
| | |-- ActivityFileRev: string (nullable = true)
| | |-- ActivityFileSize: long (nullable = true)
| | |-- ActivityFileSrcPath: string (nullable = true)
| | |-- ActivityFileSrcType: string (nullable = true)
| | |-- ActivityId: string (nullable = true)
| | |-- AttachmentId: string (nullable = true)
| | |-- Comment: string (nullable = true)
User class threw exception:
org.apache.spark.sql.AnalysisException: cannot resolve 'CASE WHEN isnotnull(ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment) THEN ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment ELSE array(ListOfRficAction.RficAction.ListOfRficActionAttachment.RficActionAttachment)'
due to data type mismatch: THEN and ELSE expressions should all be same type or coercible to a common type;
Have idea of what i can do ?