I have a DataFrame like the one below.

(Note: the array will contain at most 3 or 4 elements.)

root
 |-- A: string (nullable = true)
 |-- B: string (nullable = true)
 |-- ArrayIDs: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- ArrayID: string (nullable = true)
 |    |    |-- ArrayIDType: string (nullable = true)
 |    |    |-- ArrayIDlength: string (nullable = true)
 |-- C: string (nullable = true)

with values like

+----+----+--------------------------+---+
|A   |B   |ArrayIDs                  |C  |
+----+----+--------------------------+---+
|3.54|null|[[R,C,3],[X,Y,7]]         |111|
|3.64|null|[[3,P,12],[0,P,3],[I,X,5]]|222|
|4.64|Y   |[[L,B,1]]                 |333|
+----+----+--------------------------+---+

I'm expecting output with a schema like this:

root
 |-- A: string (nullable = true)
 |-- B: string (nullable = true)
 |-- ArrayID1: string
 |-- ArrayIDType1: string 
 |-- ArrayIDlength1: string 
 |-- ArrayID2: string
 |-- ArrayIDType2: string 
 |-- ArrayIDlength2: string 
 |-- ArrayID3: string
 |-- ArrayIDType3: string 
 |-- ArrayIDlength3: string 
 |-- C: string (nullable = true)

I tried explode, but that does not give what I expect. Any suggestions?

This is not a duplicate of the question mentioned in the comments; it is about adding new columns from the array in a DataFrame.
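
To make the target layout concrete, here is a minimal sketch of one possible direction: instead of exploding, index into the array by position and alias each struct field to a numbered column. The helper name `flatten_exprs` and the limit of 3 elements are assumptions for illustration only; the block just builds Spark SQL expression strings in plain Python and does not need a Spark session to run.

```python
def flatten_exprs(array_col, fields, max_items):
    """Build Spark SQL expression strings that pull each struct field of the
    first `max_items` array elements into its own numbered column.

    Hypothetical helper for illustration, not part of any Spark API.
    """
    exprs = []
    for i in range(max_items):
        for field in fields:
            # Spark SQL array indexing is 0-based; the output names are 1-based.
            exprs.append(f"{array_col}[{i}].{field} AS {field}{i + 1}")
    return exprs


# Field names taken from the schema in the question.
FIELDS = ["ArrayID", "ArrayIDType", "ArrayIDlength"]

# Assumption: at most 3 elements per array, matching the expected schema.
select_exprs = ["A", "B"] + flatten_exprs("ArrayIDs", FIELDS, 3) + ["C"]

for expr in select_exprs:
    print(expr)
```

With PySpark, this list could presumably be passed as `df.selectExpr(*select_exprs)`, where `df` stands for the DataFrame above. This relies on an out-of-range array index yielding null rather than an error, which holds when ANSI mode is off; under ANSI mode a guarded lookup would be needed instead. Rows with fewer than 3 elements would then get null in the trailing columns.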

KK2486
  • I don't think so. In the other question I can only see how to access a DataFrame with complex types, which is not the case here. – KK2486 Nov 09 '17 at 12:27
  • explode is the best solution for this kind of issue. You have to explode those arrays and group by the other columns. – Sahil Desai Nov 09 '17 at 12:34

0 Answers