0

Have a dataframe as follows:

df = pd.DataFrame({'Col1': ('x1', 'x2', 'x3'),
              'Col2': ('y1', 'y2', 'y3'),
              'Sets': ({1, 2}, {2, 3, 4}, {5})})

...

    Col1    Col2    Sets
0   x1      y1     {1, 2}
1   x2      y2     {2, 3, 4}
2   x3      y3     {5}

How can the Sets column be melted to get the following output:

pd.DataFrame({'Col1': ('x1', 'x1', 'x2', 'x2', 'x2', 'x3'),
              'Col2': ('y1', 'y1', 'y2', 'y2', 'y2', 'y3'),
              'Col3': (1, 2, 2, 3, 4, 5)})

...

    Col1    Col2    Col3
0   x1      y1      1
1   x1      y1      2
2   x2      y2      2
3   x2      y2      3
4   x2      y2      4
5   x3      y3      5

...

Tried to do the following...

df = df.assign(lens=[len(r) for r in df.Sets]) # get length of each set
df.lens = df.lens.astype('int32') # repeat takes only int32!
df = df.reindex(df.index.repeat(df.lens)) # repeats the rows
df = df.assign(Lists=[list(r) for r in df.Sets]) # converts sets to list

df = df.assign(Loc=df.groupby(df.index).cumcount()) # the list index

...but don't know how to extract the value of the List using Loc ...

Help is welcome.

reservoirinvest
  • 1,463
  • 2
  • 16
  • 32

1 Answers1

1

Create Series by duplicated index, so possible use DataFrame.join for repeating rows:

s = (pd.DataFrame(df.pop('Sets').values.tolist(), index=df.index)
        .stack()
        .rename('Sets')
        .reset_index(level=1, drop=True))

df = df.join(s).reset_index(drop=True)
print (df)

  Col1 Col2  Sets
0   x1   y1   1.0
1   x1   y1   2.0
2   x2   y2   2.0
3   x2   y2   3.0
4   x2   y2   4.0
5   x3   y3   5.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252