0

Consider a pandas dataframe where the column value of some rows is a list of values:

df1 = { 'Name':['Jack','John','Paul','Todd','Chris'],
       'ID':[62,47,55,74,31],
       'Subjects':['A','A',['A','B','C'],['A','B'],'C']}
df1 = pd.DataFrame(df1)

becoming

    Name  ID   Subjects
0   Jack  62          A
1   John  47          A
2   Paul  55  [A, B, C]
3   Todd  74     [A, B]
4  Chris  31          C

I need to transform those rows where df1.Subjects is a list, so that the list is exploded and distributed across row copies, such that dataframe becomes something like:

    Name  ID   Subjects
0   Jack  62          A
1   John  47          A
2   Paul  55          A
3   Todd  74          A
4  Chris  31          C
5   Paul  55          B
6   Paul  55          C
7   Todd  74          B

where the index is not so heavily important, but df1.ID should be preserved when making its row copies.

massimopinto
  • 157
  • 10
  • 1
    [DataFrame.explode](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html#pandas-dataframe-explode) -> `df1 = df1.explode('Subjects')` – Henry Ecker Jul 16 '21 at 22:18

1 Answers1

1

Use explode and merge:

>>> pd.merge(df1.drop(columns='Subjects'),
             df1['Subjects'].explode(),
             left_index=True, right_index=True, how='outer') \
             .reset_index(drop=True)

    Name  ID Subjects
0   Jack  62        A
1   John  47        A
2   Paul  55        A
3   Paul  55        B
4   Paul  55        C
5   Todd  74        A
6   Todd  74        B
7  Chris  31        C
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • this seems to address the question but a more succinct answer appears to be that of @henry-ecker with ```df(explode('column'))``` – massimopinto Jul 20 '21 at 22:12