
df = spark.sql("select key, name, subjects from table")

Input df returned by the select statement above:

key name    subjects
12  x,y,z   1,2,3
20  a,b     8,7

Expected output df:

12  x 1
12  y 2
12  z 3
20  a 8
20  b 7

I tried converting the columns to lists and then calling explode, but it still throws an error. What is an efficient way to achieve this?

blackbishop
codek
Related to [this question](https://stackoverflow.com/questions/53218931/how-to-unnest-explode-a-column-in-a-pandas-dataframe). – Quang Hoang Feb 04 '21 at 04:42

2 Answers


One way using pandas.DataFrame.apply:

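import pandas as pd

# If the columns are not already split into lists, split them first:
# df["name"] = df["name"].str.split(",")
# df["subjects"] = df["subjects"].str.split(",")

# Explode every column; each list element becomes its own row
new_df = df.apply(pd.Series.explode)
print(new_df)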

Output:

   key name subjects
0   12    x        1
0   12    y        2
0   12    z        3
1   20    a        8
1   20    b        7
Chris

Thanks, Chris. The columns are getting exploded, but I am still facing the error "Cannot reindex from a duplicate axis". Concat with ignore_index is not working either. Is it possible to generate temporary unique indexes, since the key is duplicated during the explode? pandas version: 1.0.5

df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")
new_df = df.apply(pd.Series.explode).reindex()
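
One possible way around the duplicate index, as a minimal sketch rather than a definitive fix: move key into the index before exploding, then call reset_index to get key back as a column with a fresh unique index. The sample frame below is rebuilt by hand from the question's data (an assumption; in practice df would come from the query, already converted to pandas), and it assumes name and subjects hold the same number of items in every row.

```python
import pandas as pd

# Sample data mirroring the question (hand-built here for illustration).
df = pd.DataFrame(
    {"key": [12, 20], "name": ["x,y,z", "a,b"], "subjects": ["1,2,3", "8,7"]}
)

# Split the comma-separated strings into lists.
df["name"] = df["name"].str.split(",")
df["subjects"] = df["subjects"].str.split(",")

# With key as the index, every remaining column explodes to the same
# (duplicated) index, so the exploded columns line up with each other.
# reset_index then restores key as a column and assigns a unique 0..n-1 index.
new_df = df.set_index("key").apply(pd.Series.explode).reset_index()
print(new_df)
```

Note that a bare reindex() with no arguments does not generate new labels; reset_index is the call that does. On pandas 1.3 or later, df.explode(["name", "subjects"], ignore_index=True) should give the same rows in one call, but that is not available on 1.0.5.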
codek