pandas series containing arrays

Question

I have a pandas dataframe column which looks a little like:

Out[67]:
0      ["cheese", "milk...
1      ["yogurt", "cheese...
2      ["cheese", "cream"...
3      ["milk", "cheese"...

Ultimately I would like this as a flat list, but in attempting to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as a str rather than a list.

How would I go about flattening this so I end up with:

["cheese", "milk", "yogurt", "cheese", "cheese"...]

[EDIT] So the answer given below appears to be:

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print (l.tolist())

Which is great: question answered, answer accepted. But it strikes me as a rather inelegant solution.

Possible duplicate of [python pandas flatten a dataframe to a list](http://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list) — awesoon, Mar 01 '16 at 11:57
No, it is not duplicate, because `type` of column is `string` not `list` — jezrael, Mar 01 '16 at 12:34

score 2 · Accepted Answer · edited May 23 '17 at 12:07

You can flatten the underlying NumPy array with `flatten()` and then flatten the nested lists:

print(df)
                  a
0    [cheese, milk]
1  [yogurt, cheese]
2   [cheese, cream]

print(df.a.values)
[[['cheese', 'milk']]
 [['yogurt', 'cheese']]
 [['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
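As a rough sketch of the same idea in current Python 3 and pandas, the flattening step can also be written with `itertools.chain.from_iterable` or `Series.explode`. This assumes every cell already holds a real Python list (not a string), and that the column is named `a` as in the example above; `flat` and `flat_explode` are names made up for illustration.

```python
from itertools import chain

import pandas as pd

# Assumption: each cell is an actual list object, not its string representation.
df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# chain.from_iterable walks each sublist in turn and yields its items.
flat = list(chain.from_iterable(df["a"]))
print(flat)

# Series.explode (pandas 0.25+) gives one row per list item; tolist() collects them.
flat_explode = df["a"].explode().tolist()
print(flat_explode)
```

Both variants should give the same flat list here; neither helps if the cells are strings, which is the case the EDIT below addresses.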

EDIT:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# remove the surrounding [ and ]
s = s.str.strip('[]')
print(s)
0      'cheese', 'milk'
1    'yogurt', 'cheese'
2     'cheese', 'cream'
dtype: object

df = s.str.split(',', expand=True)
# remove the quotes and strip surrounding whitespace
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
        0       1
0  cheese    milk
1  yogurt  cheese
2  cheese   cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
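A possible shorter variant, not part of the original answer, is to pull the quoted items out with a regular expression instead of stripping and splitting. The sketch below assumes every item is wrapped in single quotes and contains no embedded quote characters; the pattern and the name `flat` are illustrative choices. Because it never builds a rectangular DataFrame, it may also be more forgiving when rows hold different numbers of items.

```python
from itertools import chain

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Assumption: items look like 'text' with single quotes and no nested quotes.
# str.findall returns, for each row, the list of captured groups.
items_per_row = s.str.findall(r"'([^']*)'")

# Concatenate the per-row lists into one flat list.
flat = list(chain.from_iterable(items_per_row))
print(flat)
```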

answered Mar 01 '16 at 11:59

jezrael

I think there is a typo in `df.values.a.flatten()` it should instead be `df.a.values.flatten()` – shanmuga Mar 01 '16 at 12:06
This just prints each individual letter for me: `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])` `l = s.values.flatten()` `print ([item for sublist in l for item in sublist])` – toast Mar 01 '16 at 12:12
Well, I can't deny that it works, so thanks for that. I'm a little surprised, though, that the answer is so unwieldy – toast Mar 01 '16 at 14:50
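The per-letter output reported in the comments follows from each cell being a `str`: iterating over a string yields its characters, so the nested comprehension flattens characters rather than list items. A minimal check, using the same sample Series (the name `first` is just for illustration):

```python
import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

first = s.iloc[0]

# The cell is a plain string that merely looks like a list.
print(type(first))

# Iterating a str yields single characters, brackets and quotes included.
print(list(first)[:4])
```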

score 0 · Answer 2 · answered Mar 01 '16 at 11:59

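To get the column as a Python list you could use `df.columnName.tolist()`, and to flatten the underlying array you could use `df.columnName.values.flatten()`. Note that neither call parses a string such as "['cheese', 'milk']" into a list; the elements stay strings.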

score 0 · Answer 3 · answered Mar 01 '16 at 12:27

You can convert the Series into a DataFrame and then call stack:

s.apply(pd.Series).stack().tolist()

Colin

This returns a list of strings containing ['milk', 'cheese']: `s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])` `s.apply(pd.Series).stack().tolist()` – toast Mar 01 '16 at 12:36
From the original description, I thought the type of the `Series` was a list of strings: `s2 = pd.Series([['cheese', 'milk'], ['yogurt', 'cheese'], ['cheese', 'cream']])`, in which case `s2.apply(pd.Series).stack().tolist()` should work. If the type of the `Series` is a string representing a list of strings, you could add an eval: `s.apply(lambda x: pd.Series(eval(x))).stack().tolist()` – Colin Mar 02 '16 at 01:27
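If the cells really are string representations of Python lists, one option in the spirit of the `eval` suggestion is the standard library's `ast.literal_eval`, which only accepts Python literals and so avoids running arbitrary code. The following is a sketch under that assumption, not something proposed in the thread; `parsed` and `flat` are illustrative names.

```python
import ast
from itertools import chain

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Assumption: every cell is a valid Python list literal.
# literal_eval parses literals only, unlike eval, which executes any expression.
parsed = s.apply(ast.literal_eval)

# Each cell is now a real list, so the sublists can be chained together.
flat = list(chain.from_iterable(parsed))
print(flat)
```

A malformed cell would raise `ValueError` or `SyntaxError`, so real data may need a small wrapper that handles bad rows.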
