Get values from a pandas dataframe, splitting up lists

Question

I have the following dataframe:

  address.state    address.town      dates 
0            MI     Dearborn         None
1            CA  Los Angeles         [2014-01-01, 2015-01-01]

How would I get a list of all values for the column, splitting up if there's ever a list value. For example:

>>> df['address.state']
['MI', 'CA'] # length of 2
>>> df['dates']
[None, '2014-01-01', '2015-01-01'] # length of 3

How would I do this in a generalized way if any of the values in the df columns has a list field?

Currently what I'm doing is:

_values = []
for _val in df.iloc[:,col_index]:
    if not isinstance(_val, list):
        _values.append(_val)
    else:
        _values.extend(_val)

>>> _values
['2014-01-01', '2015-01-01', None]

Is there a better way to do this though, perhaps directly in pandas?

there are lots of possibility in your expected answer, please cover all the possible edge cases in your question. and what is the expected output for all different cases? — Nihal, Dec 29 '18 at 07:07
you want only lists, what if the value is only integer not a list?, Why there is `None` in your output?? — Nihal, Dec 29 '18 at 07:10
@Nihal because it's a non-list value? Basically, I want a list of all values in that column...so any list should be added to that list. — , Dec 29 '18 at 07:11
i think your way is right. pandas don't have direction function for that. you have `none`, so numpy `flatten` will not work. — Nihal, Dec 29 '18 at 07:41

MrE · Answer 1 · 2018-12-30T02:23:56.353

I guess I originally misread the question:

to get unique values in each column:

df['col'].apply(pd.Series).stack().unique()

for example:

df = pd.DataFrame({'address': ['MI' , 'CA'], 'name':['John', 'Jane'], 'list': [['ab', 'cd'], 'e']})

df
Out[4]: 
  address      list  name
0      MI  [ab, cd]  John
1      CA         e  Jane

df['list'].apply(pd.Series).stack().unique()
Out[5]: array(['ab', 'cd', 'e'], dtype=object)

.apply(pd.Series) transforms the lists into single elements in multiple new columns.

stack re-stacks the multiple columns into just one.

unique() gets the unique values of that column.

gosuto · Accepted Answer · 2018-12-30T09:07:13.047

0

Use .tolist() to generate a list of the column. The only thing then is that you'd get a list in a list. To flatten the list, if you don't have multi-character strings in there:

>>> l = ['a',['b','c']]
>>> l
['a', ['b', 'c']]
>>> flat_list = [item for sublist in l for item in sublist]
>>> flat_list
['a', 'b', 'c']

(flat_list solution taken from How to make a flat list out of list of lists?)

caveat: Note however, that any iterable in the input will be flattened, including multi-character strings:

>>> l = ['one item', ['b', 'c']]
>>> [item for sublist in l for item in sublist]
['o', 'n', 'e', ' ', 'i', 't', 'e', 'm', 'b', 'c']

That's likely not what you want. Check out this discussion for a more robust way to flatten only certain types of iterables.

edited Dec 30 '18 at 09:07

answered Dec 29 '18 at 09:42

gosuto

5,422
6
36
57

1

also this solution won't work if you column names are more than 1 letter long, because if it's not a array, python will deconstruct a string into the letters of the string: if `l = ['abcd',['b','c']]` then it returns `['a', 'b', 'c', 'd', 'b', 'c']` – MrE Dec 30 '18 at 02:07
Great point MrE. The answer at https://stackoverflow.com/a/29325518/1701416 will flatten iterables except for strings. – waterproof Dec 30 '18 at 02:29

Get values from a pandas dataframe, splitting up lists

2 Answers2