Pandas Dataframe array entries as rows

Question

Use pd.DataFrame.explode() method: How to unnest (explode) a column in a pandas DataFrame, into multiple rows

I have a pandas.Dataframe structure with arrays as entries which I would like to disaggregate each of the entries into a long format.

Below is the code to reproduce what I am looking for. StackOverflow is asking me to put more detain into this draft because it is mostly code, but it is mostly code because it allows the reader to reproduce the issue more clearly.

import pandas as pd
import numpy as np

date = '08-30-2022'
ids = ['s1', 's2']

g1 = ['b1', 'b2']
g2 = ['b1', 'b3', 'b4']
g_ls = [g1, g2]

v1 = [2.0, 2.5]
v2 = [3.2, np.nan, 3.7]
v_ls = [v1, v2]

dict = {
    'date': [date] * len(ids),
    'ids': ids,
    'group': g_ls,
    'values': v_ls
}

df_in = pd.DataFrame.from_dict(dict)

dict_out = {
    'date': [date] * 5,
    'ids': ['s1', 's1', 's2', 's2', 's2'],
    'group': ['b1', 'b2', 'b1', 'b3', 'b4'],
    'values': [2.0, 2.5, 3.2, np.nan, 3.7]
}
desired_df = pd.DataFrame.from_dict(dict_out)

Have:

         date ids         group           values
0  08-30-2022  s1      [b1, b2]       [2.0, 2.5]
1  08-30-2022  s2  [b1, b3, b4]  [3.2, nan, 3.7]

Want:

         date ids group  values
0  08-30-2022  s1    b1     2.0
1  08-30-2022  s1    b2     2.5
2  08-30-2022  s2    b1     3.2
3  08-30-2022  s2    b3     NaN
4  08-30-2022  s2    b4     3.7

check out pandas explode method – ivanp Aug 30 '22 at 21:16 — ivanp, Aug 30 '22 at 21:16

score 1 · Answer 1 · answered Aug 30 '22 at 21:20

1

Try with

df = df_in.explode(['group','values'])
Out[173]: 
         date ids group values
0  08-30-2022  s1    b1    2.0
0  08-30-2022  s1    b2    2.5
1  08-30-2022  s2    b1    3.2
1  08-30-2022  s2    b3    NaN
1  08-30-2022  s2    b4    3.7

answered Aug 30 '22 at 21:20

BENY

317,841
20
164
234

Pandas Dataframe array entries as rows

1 Answers1