I am trying to flatten the content of a column of a pandas.DataFrame
that contains lists of lists, but I cannot find a proper way to get the correct output.
Unlike another question asked on StackOverflow about the same subject, the focus here is the flattening process inside each row of a pandas.DataFrame.
Here is a toy example:
import pandas as pd
df = pd.DataFrame({ 'recipe': [['olive oil',
'low sodium chicken broth',
'cilantro leaves',
'chile powder',
'fresh thyme'],
['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter'],
['egg',
'whole milk',
'extra-virgin olive oil',
'garlic cloves',
'corn kernels',
'chicken breasts']],
'category': ['A', 'B', 'B']
})
df_grouped = df.groupby('category')['recipe'].apply(lambda x: x.tolist())
df_grouped = df_grouped.reset_index()
df_grouped['recipe'][1]
This produces the following output:
[['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter'], ['egg', 'whole milk', 'extra-virgin olive oil', 'garlic cloves', 'corn kernels', 'chicken breasts']]
My objective is to merge, row by row, every list of words or sentences into a single list. I tried the following code, but it splits every string into individual letters.
join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist)
df_grouped['merged'] = df_grouped['recipe'].apply(lambda x: list(join(x)))
df_grouped['merged']
This produces:
0 [o, l, i, v, e, , o, i, l, l, o, w, , s, o, ...
1 [c, o, c, o, n, u, t, , m, i, l, k, f, r, o, ...
I would like the following output for each row, i.e. a single list containing all the words (shown here for row 1):
['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter', 'egg', 'whole milk', 'extra-virgin olive oil', 'garlic cloves', 'corn kernels', 'chicken breasts']
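For reference, here is a minimal sketch of one possible way to get that output. It assumes each cell of `recipe` holds either a list of lists of strings or an already flat list of strings. The helper name `flatten_one_level` is hypothetical (it is not part of pandas), and the toy data is shortened. Iterating over a string yields its characters, so letters like the ones above are what a generic flattener returns when it is applied to a list that is already flat; the helper therefore keeps strings whole.

```python
import pandas as pd


def flatten_one_level(cell):
    """Hypothetical helper: flatten a list of lists by one level only."""
    flat = []
    for item in cell:
        if isinstance(item, str):
            # Already a word or sentence: keep it whole instead of iterating it.
            flat.append(item)
        else:
            # A nested list: add its elements to the flat result.
            flat.extend(item)
    return flat


# Shortened version of the toy example.
df = pd.DataFrame({
    'recipe': [['olive oil', 'chile powder'],
               ['coconut milk', 'frozen banana'],
               ['egg', 'whole milk']],
    'category': ['A', 'B', 'B'],
})

df_grouped = df.groupby('category')['recipe'].apply(lambda x: x.tolist()).reset_index()
df_grouped['merged'] = df_grouped['recipe'].apply(flatten_one_level)

print(df_grouped['merged'][1])
# ['coconut milk', 'frozen banana', 'egg', 'whole milk']
```

Because strings are treated as atoms, applying the helper a second time to `merged` leaves the lists unchanged rather than splitting them into letters.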