I am trying to flatten the content of a pandas.DataFrame column that contains lists of lists, but I cannot find a proper way to get the correct output.

Unlike another question asked on Stack Overflow about the same subject, the focus here is on the flattening process inside each row of a pandas.DataFrame.

Here is a toy example:

import pandas as pd

df = pd.DataFrame({ 'recipe': [['olive oil',
                            'low sodium chicken broth',
                            'cilantro leaves',
                            'chile powder',
                            'fresh thyme'],
                           ['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter'],
                           ['egg',
                            'whole milk',
                            'extra-virgin olive oil',
                            'garlic cloves',
                            'corn kernels',
                            'chicken breasts']],
                   'category': ['A', 'B', 'B']
                  })
df_grouped = df.groupby('category')['recipe'].apply(lambda x: x.tolist())
df_grouped = df_grouped.reset_index()
df_grouped['recipe'][1]

This produces the following output:

[['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter'], ['egg', 'whole milk', 'extra-virgin olive oil', 'garlic cloves', 'corn kernels', 'chicken breasts']]

My objective is to merge, row by row, every list of words or sentences into a single list. I tried the following code, but it splits every string into individual letters.

join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist)
df_grouped['merged'] = df_grouped['recipe'].apply(lambda x: list(join(x)))

df_grouped['merged']

This produces:

0    [o, l, i, v, e,  , o, i, l, l, o, w,  , s, o, ...
1    [c, o, c, o, n, u, t,  , m, i, l, k, f, r, o, ...

I would like the following output for each row, a single list containing all the words:

['coconut milk', 'frozen banana', 'pure acai puree', 'almond butter', 'egg', 'whole milk', 'extra-virgin olive oil', 'garlic cloves', 'corn kernels', 'chicken breasts']
Michael
  • possible duplicate of [Flattening a shallow list in Python](http://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python) – Marcus Müller Sep 30 '15 at 14:05
  • This is not a duplicate: here the question is about a pandas DataFrame, whereas the possible duplicate is about a single list of lists. – Michael Sep 30 '15 at 17:59

2 Answers

Just change the join to:

join = lambda list_of_lists: (val for sublist in list_of_lists for val in sublist if isinstance(sublist, list))

Here is the output:

In[69]: df_grouped['merged'] = df_grouped['recipe'].apply(lambda x: list(join(x)))
In[70]: df_grouped['merged']
Out[70]: 
0    [olive oil, low sodium chicken broth, cilantro...
1    [coconut milk, frozen banana, pure acai puree,...
Name: merged, dtype: object
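
To see how the two versions of join differ, it may help to run them on plain Python lists, without pandas. This is only a sketch: it assumes the letter splitting in the question came from rows that were already flat lists of strings (for example, after flattening twice), and join_original, join_filtered, nested and flat are names chosen here for illustration.

```python
# Generator from the question: flattens exactly one level of nesting.
join_original = lambda list_of_lists: (
    val for sublist in list_of_lists for val in sublist
)

# Version from this answer: only yields values whose parent is a list.
join_filtered = lambda list_of_lists: (
    val for sublist in list_of_lists for val in sublist
    if isinstance(sublist, list)
)

nested = [['coconut milk', 'frozen banana'], ['egg', 'whole milk']]
flat = ['egg', 'whole milk']  # already flat: each element is a str

# On a list of lists, both versions give the same flat list of strings.
print(list(join_original(nested)))
print(list(join_filtered(nested)))

# On a flat list, each string is iterated character by character.
print(list(join_original(flat)))

# The filtered version skips elements that are not lists.
print(list(join_filtered(flat)))
```

Note that the filtered version does not keep bare strings: for a row that is already flat it yields nothing, so the result for that row is an empty list rather than the original strings.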
Alex

I had a similar situation but with integers inside of the lists instead of strings. Alex's solution was throwing a TypeError: 'int' object is not iterable exception, so I used this function instead:

def concat_lists(x):
    times = []
    try:
        for item in x:
            for time in item:
                times.append(time)
        return times
    except TypeError:
        return x

and applied it like this:

df_grouped['merged'] = df_grouped['recipe'].apply(concat_lists)
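
As a quick check of how this function behaves, here is a small sketch on plain Python lists with made-up integer values. The function is repeated so that the snippet runs on its own; the sample inputs are illustrative only.

```python
def concat_lists(x):
    # Same function as above, repeated so this snippet is self-contained.
    times = []
    try:
        for item in x:
            for time in item:
                times.append(time)
        return times
    except TypeError:
        # Reached when an element of x is not iterable (e.g. an int).
        return x

# A list of lists is flattened by one level.
print(concat_lists([[1, 2], [3, 4]]))

# A flat list of ints raises TypeError internally and is returned unchanged.
print(concat_lists([1, 2, 3]))

# Mixed input is also returned unchanged, not partially flattened.
print(concat_lists([[1, 2], 3]))
```

One caveat: strings are iterable, so a row that is already a flat list of strings would still be split into characters, because no TypeError is raised in that case.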