
I have a pandas DataFrame column that looks a little like this:

Out[67]:
0      ["cheese", "milk...
1      ["yogurt", "cheese...
2      ["cheese", "cream"...
3      ["milk", "cheese"...

Now, ultimately I would like this as a flat list, but while attempting to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as a str rather than a list.

How would I go about flattening this so that I end up with:

["cheese", "milk", "yogurt", "cheese", "cheese"...]

[EDIT] So the answer given below appears to be:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print(l.tolist())

Which is great: question answered, answer accepted. But it strikes me as a rather inelegant solution.
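A shorter route, assuming every cell is a valid Python list literal (as in the sample data), is to parse each string with the standard library's `ast.literal_eval` and chain the resulting lists together. This is a swapped-in technique, not the accepted answer's method; `ast.literal_eval` only evaluates literals, so unlike `eval` it will not run arbitrary code.

```python
import ast
from itertools import chain

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Assumption: each cell is a Python list literal such as "['cheese', 'milk']".
# literal_eval turns that text into a real list.
parsed = s.map(ast.literal_eval)

# Concatenate the per-row lists into one flat list, keeping row order.
flat = list(chain.from_iterable(parsed))
print(flat)
# ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```

Because the strings are parsed rather than split on commas, rows with different numbers of items need no special handling.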

toast
  • Possible duplicate of [python pandas flatten a dataframe to a list](http://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list) – awesoon Mar 01 '16 at 11:57
  • No, it is not a duplicate, because the column's `type` is `string`, not `list` – jezrael Mar 01 '16 at 12:34

3 Answers


You can use `numpy.ndarray.flatten` and then flatten the nested lists with a list comprehension:

print(df)
                  a
0    [cheese, milk]
1  [yogurt, cheese]
2   [cheese, cream]

print(df.a.values)
[[['cheese', 'milk']]
 [['yogurt', 'cheese']]
 [['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
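When the cells really are lists, the standard library's `itertools.chain.from_iterable` does the same job as the nested comprehension. A minimal sketch, assuming a column `a` that holds Python lists (rebuilt here to mirror the answer's `df`); the last lines show why the same loop yields single characters when the cells are strings, as in the question.

```python
from itertools import chain

import pandas as pd

# Column 'a' holds actual Python lists, not their string representations.
df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# Both approaches produce the same flat list.
via_comprehension = [item for sublist in df["a"] for item in sublist]
via_chain = list(chain.from_iterable(df["a"]))
print(via_chain)
# ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']

# If a cell is a string instead, iterating over it walks its characters.
as_strings = pd.Series(["['cheese', 'milk']"])
first_chars = [ch for text in as_strings for ch in text][:3]
print(first_chars)
# ['[', "'", 'c']
```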

EDIT:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# remove the surrounding brackets
s = s.str.strip('[]')
print(s)
0      'cheese', 'milk'
1    'yogurt', 'cheese'
2     'cheese', 'cream'
dtype: object

df = s.str.split(',', expand=True)
# remove the quotes and strip surrounding whitespace
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
        0       1
0  cheese    milk
1  yogurt  cheese
2  cheese   cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
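If rows can hold different numbers of items, `split(',', expand=True)` pads the short rows with missing values, and the `applymap` lambda would then fail on them. A sketch of the same string-cleaning idea without the intermediate DataFrame, assuming pandas 0.25 or later (for `Series.explode`) and items that contain no commas or quote characters of their own:

```python
import pandas as pd

# Rows deliberately have different lengths.
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese', 'cream']", "['milk']"])

flat = (
    s.str.strip("[]")                      # drop the surrounding brackets
     .str.replace("'", "", regex=False)    # drop the quote characters
     .str.split(",")                       # one list of pieces per row
     .explode()                            # one row per piece, order kept
     .str.strip()                          # trim spaces left after each comma
     .tolist()
)
print(flat)
# ['cheese', 'milk', 'yogurt', 'cheese', 'cream', 'milk']
```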
Community
jezrael
  • I think there is a typo in `df.values.a.flatten()` it should instead be `df.a.values.flatten()` – shanmuga Mar 01 '16 at 12:06
  • this just prints each individual letter for me: `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])` `l = s.values.flatten()` `print ([item for sublist in l for item in sublist])` – toast Mar 01 '16 at 12:12
  • Well, I can't deny that it works, so thanks for that. I'm a little surprised, though, that the answer is so unwieldy – toast Mar 01 '16 at 14:50

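To get the column values as a Python list you could use `df.columnName.tolist()`, and for flattening you could use `df.columnName.values.flatten()`. Note that `tolist()` does not parse the strings, so each element is still a `str`, and `flatten()` on a 1-D array of lists leaves the lists nested.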
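A minimal check on the question's sample data, using a hypothetical Series `s` in place of `df.columnName`. Parsing with the standard library's `ast.literal_eval` and flattening with `Series.explode` (pandas 0.25+) are swapped-in steps, not part of this answer:

```python
import ast

import pandas as pd

# Hypothetical stand-in for df.columnName from the question.
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

as_list = s.tolist()
print(type(as_list[0]))  # <class 'str'>: the text is not parsed

parsed = s.map(ast.literal_eval)   # each cell becomes a real list
nested = parsed.values.flatten()   # still a 1-D object array holding 3 lists
print(len(nested))  # 3

flat = parsed.explode().tolist()   # one element per item
print(flat)
# ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```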


You can convert the Series into a DataFrame and then call stack:

s.apply(pd.Series).stack().tolist()
Colin
  • this returns a list of strings containing ['milk', 'cheese'] `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])` `s.apply(pd.Series).stack().tolist()` – toast Mar 01 '16 at 12:36
  • From the original description, I thought it was the type of the `Series` was a list of strings: `s2 = pd.Series([['cheese', 'milk'], ['yogurt', 'cheese'], ['cheese', 'cream']])`, in which case `s2.apply(pd.Series).stack().tolist()` should work. If the type of the `Series` is a string representing a list of strings, you could add an eval: `s.apply(lambda x: pd.Series(eval(x))).stack().tolist()` – Colin Mar 02 '16 at 01:27
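The `eval` variant works on the sample data, but `eval` will execute any code that ends up in the column. A sketch of the same `apply(pd.Series).stack()` approach with the standard library's `ast.literal_eval` swapped in, which only accepts Python literals:

```python
import ast

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Parse each string into a list, spread each list across columns,
# then stack the columns back into one Series, row by row.
flat = s.apply(lambda x: pd.Series(ast.literal_eval(x))).stack().tolist()
print(flat)
# ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```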