Pandas split value in rows into multiple rows based on delimiter

Question

I have a pandas DataFrame in the format below:

[apple]
[banana]
[apple, orange]

I would like to convert this so that it contains only the unique values, with one value per row:

apple
banana
orange

score 2 · Accepted Answer · answered Jun 28 '19 at 08:57

First unnest your list to rows, then use drop_duplicates:

import numpy as np
import pandas as pd

# Make example dataframe
df = pd.DataFrame({'Col1':[['apple'], ['banana'], ['apple', 'orange']]})

              Col1
0          [apple]
1         [banana]
2  [apple, orange]

df = explode_list(df, 'Col1').drop_duplicates()

Output

     Col1
0   apple
1  banana
2  orange

Function used from linked answer

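def explode_list(df, col):
    s = df[col]
    # Repeat each row position once per element in that row's list
    i = np.arange(len(s)).repeat(s.str.len())
    # Select the repeated rows and overwrite col with the flattened values
    return df.iloc[i].assign(**{col: np.concatenate(s.values)})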
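
On pandas 0.25 or later, the built-in DataFrame.explode can likely stand in for the helper above. This is a minimal sketch rather than part of the original answer; it assumes the same example frame, and the names `exploded` and `result` are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Col1': [['apple'], ['banana'], ['apple', 'orange']]})

# explode() gives each list element its own row, repeating the original index
exploded = df.explode('Col1')

# Drop repeated values, then renumber the rows from 0
result = exploded.drop_duplicates().reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call is optional; without it the rows keep the index of the list they came from.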

score 2 · Answer 2 · answered Jun 28 '19 at 09:00

You can use itertools.chain.from_iterable() to flatten the list of lists and OrderedDict to drop duplicates while maintaining order:

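from collections import OrderedDict
import itertools

# Flatten the lists, then keep the first occurrence of each value in order.
# list(...) turns the keys view into a plain list; its length must equal len(df).
df['col2'] = list(OrderedDict.fromkeys(itertools.chain.from_iterable(df.col)))
print(df)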

               col    col2
0          [apple]   apple
1         [banana]  banana
2  [apple, orange]  orange
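
When the number of unique values differs from the number of rows, the values cannot be assigned back as a column of the same frame. A possible sketch for that case, assuming a separate result frame is acceptable; the extra `kiwi` row and the names `unique_values` and `out` are made up for illustration:

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'col': [['apple'], ['banana'], ['apple', 'orange'], ['kiwi', 'apple']]})

# A plain dict keeps insertion order (Python 3.7+), so fromkeys()
# de-duplicates while preserving first-seen order
unique_values = list(dict.fromkeys(chain.from_iterable(df['col'])))

# Build a new frame with one unique value per row
out = pd.DataFrame({'col': unique_values})
print(out)
```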

answered by anky

Thank you for that, but I am getting an error `TypeError: 'float' object is not iterable`, even though the column we are iterating over has dtype `object` – scott martin Jun 28 '19 at 09:08
@scottmartin Hmm, it works for me on the sample. Are you using this independently, or are you integrating this line with some other code? You will have to see why it fails. Not sure. – anky Jun 28 '19 at 09:11
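
One plausible cause of the TypeError in the comment above, though the thread does not confirm it, is a missing value in the column: NaN is a float, and a column holding both lists and NaN still has dtype `object`. A sketch under that assumption, where missing entries are simply skipped; the example frame and the names `lists_only` and `unique_values` are hypothetical:

```python
from itertools import chain

import numpy as np
import pandas as pd

# Hypothetical frame: the second row is missing, so its value is a float NaN
df = pd.DataFrame({'col': [['apple'], np.nan, ['apple', 'orange']]})

# dropna() removes the NaN row so only real lists reach chain.from_iterable
lists_only = df['col'].dropna()

# Flatten and de-duplicate, keeping first-seen order
unique_values = list(dict.fromkeys(chain.from_iterable(lists_only)))
print(unique_values)
```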
