Use pd.DataFrame.explode() method: How to unnest (explode) a column in a pandas DataFrame, into multiple rows
I have a pandas.Dataframe structure with arrays as entries which I would like to disaggregate each of the entries into a long format.
Below is the code to reproduce what I am looking for. StackOverflow is asking me to put more detain into this draft because it is mostly code, but it is mostly code because it allows the reader to reproduce the issue more clearly.
import pandas as pd
import numpy as np
date = '08-30-2022'
ids = ['s1', 's2']
g1 = ['b1', 'b2']
g2 = ['b1', 'b3', 'b4']
g_ls = [g1, g2]
v1 = [2.0, 2.5]
v2 = [3.2, np.nan, 3.7]
v_ls = [v1, v2]
dict = {
'date': [date] * len(ids),
'ids': ids,
'group': g_ls,
'values': v_ls
}
df_in = pd.DataFrame.from_dict(dict)
dict_out = {
'date': [date] * 5,
'ids': ['s1', 's1', 's2', 's2', 's2'],
'group': ['b1', 'b2', 'b1', 'b3', 'b4'],
'values': [2.0, 2.5, 3.2, np.nan, 3.7]
}
desired_df = pd.DataFrame.from_dict(dict_out)
Have:
date ids group values
0 08-30-2022 s1 [b1, b2] [2.0, 2.5]
1 08-30-2022 s2 [b1, b3, b4] [3.2, nan, 3.7]
Want:
date ids group values
0 08-30-2022 s1 b1 2.0
1 08-30-2022 s1 b2 2.5
2 08-30-2022 s2 b1 3.2
3 08-30-2022 s2 b3 NaN
4 08-30-2022 s2 b4 3.7