
I have a schema like this:

root
 |-- CaseNumber: string (nullable = true)
 |-- Interactions: struct (nullable = true)
 |    |-- EmailInteractions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)
 |    |-- PhoneInteractions: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- CreatedOn: string (nullable = true)
 |    |    |    |-- Direction: string (nullable = true)
 |    |-- WebInteractions: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

How can I make it look like this?

root
 |-- CaseNumber: string (nullable = true)
 |-- CreatedOn: string (nullable = true)
 |-- Direction: string (nullable = true)

Any help would be appreciated.

HaiY
  • Does this answer your question? [How to flatten a struct in a Spark dataframe?](https://stackoverflow.com/questions/38753898/how-to-flatten-a-struct-in-a-spark-dataframe) – Goldengenova Jan 27 '20 at 23:28
  • I tried this from the link: `val dl4 = dl3.select($"CaseNumber", dl3.col("Interactions.*"))`, but it raises the error "No such struct field * in EmailInteractions, PhoneInteractions, WebInteractions;", which I think is because I have arrays inside. – HaiY Jan 28 '20 at 01:45

1 Answer


Try this:

dl4=dl3.select([$"CaseNumber",$"Interactions.PhoneInteractions.CreatedOn",$"Interactions.PhoneInteractions.Direction"])
  • Yes, this one works well, thanks. A quick question here: as you can see from the schema above, EmailInteractions is an array, but I could not explode it. I believe the CreatedOn and Direction columns may also be there; is there any way I could check whether those columns also exist under EmailInteractions? – HaiY Jan 30 '20 at 18:59
  • Try this: `var explodedDf3 = dl3.select("EmailInteractions.*","*")` or add those columns back in like this: `val explodeDF2 = explodeDF.withColumn("id", explodeDF("department.id")).withColumn("name", explodeDF("department.name"))` – Goldengenova Jan 30 '20 at 21:13
  • Thanks @Goldengenova, I see where the issue is coming from. I read daily files, and I now realize that in each daily file the schema of those three arrays (EmailInteractions, PhoneInteractions, WebInteractions) changes. For example, in today's file the EmailInteractions array does not have a structure like the one above, whereas PhoneInteractions has the same structure that EmailInteractions had in the one at the top. Do you know how I can ignore errors in a Databricks notebook when exploding, if the expected structure is not found? – HaiY Jan 31 '20 at 02:05
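
Regarding the last two comments (checking whether the nested columns exist, and coping with daily files whose schema varies): one option, sketched below as an assumption rather than something from this thread, is to test whether the nested path resolves before selecting it and fall back when it does not. You can also simply run dl3.printSchema() on a given file to see which fields exist under EmailInteractions. The sketch reuses dl3 and assumes spark.implicits._ is in scope.

import scala.util.Try
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Returns true if the (possibly nested) column path resolves against this DataFrame's schema.
// Spark's analyzer throws an AnalysisException from select when the path does not exist,
// which Try turns into a simple boolean check.
def hasPath(df: DataFrame, path: String): Boolean =
  Try(df.select(col(path))).isSuccess

val result =
  if (hasPath(dl3, "Interactions.PhoneInteractions.CreatedOn"))
    dl3.select($"CaseNumber", explode($"Interactions.PhoneInteractions").as("pi"))
       .select($"CaseNumber", $"pi.CreatedOn", $"pi.Direction")
  else
    dl3.select($"CaseNumber")  // today's file lacks the expected structure, so skip the explode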