
I have a DataFrame with the following structure:

col1|col2|col3|col4
xxxx|yyyy|zzzz|[, 111, por-BR, 2222]

I want the output to look like this:

+----+----+----+-----+
|col1|col2|col3|col4 |
+----+----+----+-----+
|  xx|  yy|  zz| 1111|
|  xx|  yy|  zz| 2222|
+----+----+----+-----+

col4 is an array, and I want each of its elements to appear on a separate row within a single column (col4 itself or a different one).

This is my actual schema:

data1:pyspark.sql.dataframe.DataFrame
    col1:string
    col2:string
    col3:string
    col4:array
        element:struct
            colDept:string

I managed to get this far:

# getItem(i) picks one array element by position, so each call adds a column, not a row
df = df.withColumn("col5", df["col4"].getItem(1)).withColumn("col4", df["col4"].getItem(0))
df.show()

+----+----+----+----+----+
|col1|col2|col3|col4|col5|
+----+----+----+----+----+
|  xx|  yy|  zz|1111|2222|
+----+----+----+----+----+

but I want the result to look like this instead. Can anyone help, please?

+----+----+----+-----+
|col1|col2|col3|col4 |
+----+----+----+-----+
|  xx|  yy|  zz| 1111|
|  xx|  yy|  zz| 2222|
+----+----+----+-----+
user2841795
  • @Wen this is a pyspark question - the right dupe is: [this](https://stackoverflow.com/questions/44436856/explode-array-data-into-rows-in-spark) or [this](https://stackoverflow.com/questions/36186627/dividing-complex-rows-of-dataframe-to-simple-rows-in-pyspark) or [this](https://stackoverflow.com/questions/38210507/explode-in-pyspark/38210742) – pault Jun 26 '19 at 18:56
  • @pault feel free to add them – BENY Jun 26 '19 at 18:57
