How to split dataframes with multiple categories using str.contains in python pandas?

Question

I have a dataframe like this,

id   col1
1    apple, peach
2    apple, banana
3    melon, peach
4    berry, apple, peach
5    melon, banana

This table has 5 categories in col1.

I know how to select each category using str.contains().

df_apple = df[df['col1'].str.contains("apple")]
df_peach = df[df['col1'].str.contains("peach")]
df_melon = df[df['col1'].str.contains("melon")]
df_berry = df[df['col1'].str.contains("berry")]
df_banana = df[df['col1'].str.contains("banana")]

How can I generate all 5 dataframes at once using a pandas function? The outputs should be 5 dataframes named df_apple, df_peach, df_melon, df_berry, and df_banana.

I also want to save them to 5 different CSV files.

piRSquared · Accepted Answer · 2019-10-30T20:52:55.097


I'd explode the column and find the unique ids for each category:

d = df.set_index('id').col1
e = d.str.split(', ').explode()

r = {k: d.loc[v] for k, v in e.index.groupby(e).items()}

r['apple']

id
1           apple, peach
2          apple, banana
4    berry, apple, peach
Name: col1, dtype: object
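The snippet above assumes `df` already exists. A minimal end-to-end sketch, with the question's sample data rebuilt by hand and assuming pandas 0.25 or later (for `Series.explode`), might look like this. The names `groups` and `frames` are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": [
        "apple, peach",
        "apple, banana",
        "melon, peach",
        "berry, apple, peach",
        "melon, banana",
    ],
})

d = df.set_index("id")["col1"]

# One row per (id, category) pair; the id is repeated in the index
e = d.str.split(", ").explode()

# Index.groupby returns a dict mapping each category to the ids that contain it
groups = e.index.groupby(e)

# One single-column dataframe per category, keyed by category name
frames = {fruit: d.loc[ids].to_frame() for fruit, ids in groups.items()}
```

Keeping the results in a dict keyed by category avoids creating variables like `df_apple` dynamically; `frames["apple"]` plays that role.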

Or, to dump each group to its own CSV file:

d = df.set_index('id').col1
e = d.str.split(', ').explode()

for k, v in e.index.groupby(e).items():
    d.loc[v].to_frame().to_csv(f"{k}.csv")

Then

pd.read_csv('apple.csv')

   id                 col1
0   1         apple, peach
1   2        apple, banana
2   4  berry, apple, peach
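As a self-contained sketch of the same round trip, the version below writes the files into a temporary directory and uses a `df_` filename prefix to match the names asked for in the question. The directory and the prefix are assumptions for illustration, not part of the answer above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "col1": [
        "apple, peach",
        "apple, banana",
        "melon, peach",
        "berry, apple, peach",
        "melon, banana",
    ],
})

d = df.set_index("id")["col1"]
e = d.str.split(", ").explode()

# Hypothetical output location; replace with a real directory as needed
out_dir = Path(tempfile.mkdtemp())

for fruit, ids in e.index.groupby(e).items():
    # id stays in the index, so it is written as the first CSV column
    d.loc[ids].to_frame().to_csv(out_dir / f"df_{fruit}.csv")

# Read one file back, restoring id as the index
apple = pd.read_csv(out_dir / "df_apple.csv", index_col="id")
```

Passing `index_col="id"` on read restores the original index instead of adding a fresh 0..n-1 index as in the output shown above.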

For Pandas versions < 0.25

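import numpy as np
import pandas as pd

def explode(s):
    # flatten the lists and repeat each index label once per list element
    return pd.Series(np.concatenate(s.to_numpy()), s.index.repeat(s.str.len()))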

d = df.set_index('id').col1
e = d.str.split(', ').pipe(explode)
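To see what the fallback produces, here is a small runnable sketch on a shortened version of the sample data. It assumes every row of `col1` is a non-missing string, so that each split result is a list; the `pairs` variable is only there to make the result easy to inspect.

```python
import numpy as np
import pandas as pd

def explode(s):
    # s holds one list per row: flatten the lists and repeat each
    # index label as many times as its list has elements
    return pd.Series(np.concatenate(s.to_numpy()), s.index.repeat(s.str.len()))

# Shortened sample data (first three rows of the question's table)
df = pd.DataFrame({
    "id": [1, 2, 3],
    "col1": ["apple, peach", "apple, banana", "melon, peach"],
})

d = df.set_index("id")["col1"]
e = d.str.split(", ").pipe(explode)

# (id, category) pairs, one per exploded row
pairs = list(zip(e.index, e))
```

The result has the same shape as `Series.explode` output for this input, so the `e.index.groupby(e)` step from earlier can be applied to it unchanged.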

And see this post by @MaxU

edited Oct 30 '19 at 20:52

answered Oct 30 '19 at 20:08


It gave me an error: `AttributeError: 'Series' object has no attribute 'explode'`. I'm confused, because the docs say `explode` is a method on Series. Is it because my real dataframe has more columns than the two-column sample? – Jiayu Zhang Oct 30 '19 at 20:33
You are using an older version of pandas. `explode` came out in version 0.25. – piRSquared Oct 30 '19 at 20:37

Benoit de Menthière · Answer 2 · 2019-10-30T19:43:32.090


I recommend storing them in a dict:

dfdict = {fruit:df[df['col1'].str.contains(fruit)] for fruit in ['apple', 'peach', 'melon', 'berry', 'banana']}

for k,v in dfdict.items():
    v.to_csv('df'+k+'.csv')
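One caveat with this approach: `str.contains` matches substrings, so a search for "apple" would also match a row containing "pineapple". A hedged variant is sketched below. It derives the category list from the data instead of hard-coding it, and anchors each pattern on the ", " separator. The sample rows (including "pineapple") are made up to show the difference and are not from the question.

```python
import re

import pandas as pd

# Hypothetical data with a category that contains another as a substring
df = pd.DataFrame({
    "id": [1, 2, 3],
    "col1": ["apple, peach", "pineapple, banana", "melon, apple"],
})

# Collect the distinct categories from the column itself
categories = sorted({item for row in df["col1"].str.split(", ") for item in row})

dfdict = {}
for fruit in categories:
    # Match the category only as a whole item: at the start of the string
    # or after ", ", and followed by ", " or the end of the string
    pattern = rf"(?:^|, ){re.escape(fruit)}(?:, |$)"
    dfdict[fruit] = df[df["col1"].str.contains(pattern, regex=True)]
```

Here `dfdict["apple"]` holds ids 1 and 3 only, whereas a plain `str.contains("apple")` would also return id 2.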

edited Oct 30 '19 at 19:43

answered Oct 30 '19 at 19:20


Thanks for your help, Benoit. So how can I save them into multiple csv files? I know the keys need to be iterated over. – Jiayu Zhang Oct 30 '19 at 19:40
I added it to my answer – Benoit de Menthière Oct 30 '19 at 19:43
