0

Use pd.DataFrame.explode() method: How to unnest (explode) a column in a pandas DataFrame, into multiple rows

I have a pandas.Dataframe structure with arrays as entries which I would like to disaggregate each of the entries into a long format.

Below is the code to reproduce what I am looking for. StackOverflow is asking me to put more detain into this draft because it is mostly code, but it is mostly code because it allows the reader to reproduce the issue more clearly.

import pandas as pd
import numpy as np

date = '08-30-2022'
ids = ['s1', 's2']

g1 = ['b1', 'b2']
g2 = ['b1', 'b3', 'b4']
g_ls = [g1, g2]

v1 = [2.0, 2.5]
v2 = [3.2, np.nan, 3.7]
v_ls = [v1, v2]

dict = {
    'date': [date] * len(ids),
    'ids': ids,
    'group': g_ls,
    'values': v_ls
}

df_in = pd.DataFrame.from_dict(dict)

dict_out = {
    'date': [date] * 5,
    'ids': ['s1', 's1', 's2', 's2', 's2'],
    'group': ['b1', 'b2', 'b1', 'b3', 'b4'],
    'values': [2.0, 2.5, 3.2, np.nan, 3.7]
}
desired_df = pd.DataFrame.from_dict(dict_out)

Have:

         date ids         group           values
0  08-30-2022  s1      [b1, b2]       [2.0, 2.5]
1  08-30-2022  s2  [b1, b3, b4]  [3.2, nan, 3.7]

Want:

         date ids group  values
0  08-30-2022  s1    b1     2.0
1  08-30-2022  s1    b2     2.5
2  08-30-2022  s2    b1     3.2
3  08-30-2022  s2    b3     NaN
4  08-30-2022  s2    b4     3.7
ktj1989
  • 679
  • 2
  • 8
  • 19

1 Answers1

1

Try with

df = df_in.explode(['group','values'])
Out[173]: 
         date ids group values
0  08-30-2022  s1    b1    2.0
0  08-30-2022  s1    b2    2.5
1  08-30-2022  s2    b1    3.2
1  08-30-2022  s2    b3    NaN
1  08-30-2022  s2    b4    3.7
BENY
  • 317,841
  • 20
  • 164
  • 234