5

I have a dataframe like this:

import pandas as pd

df1 = pd.DataFrame({'col1' : ['cat', 'cat', 'dog', 'green', 'blue']})

and I want a new column that gives the category, like this:

dfoutput = pd.DataFrame({'col1' : ['cat', 'cat', 'dog', 'green', 'blue'],
                         'col2' : ['animal', 'animal', 'animal', 'color', 'color']})

I know I could do it inefficiently using .loc:

df1.loc[df1['col1'] == 'cat','col2'] = 'animal'
df1.loc[df1['col1'] == 'dog','col2'] = 'animal'

How do I combine cat and dog to both be animal? This doesn't work:

df1.loc[df1['col1'] == 'cat' | df1['col1'] == 'dog','col2'] = 'animal'
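
For context, the line above fails because `|` binds more tightly than `==` in Python, so `'cat' | df1['col1']` is evaluated first. A minimal sketch of two forms that should work, assuming the same `df1` as above: parenthesize each comparison, or use `Series.isin` for membership in a list of values.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['cat', 'cat', 'dog', 'green', 'blue']})

# Option 1: wrap each comparison in parentheses before combining with |
mask = (df1['col1'] == 'cat') | (df1['col1'] == 'dog')
df1.loc[mask, 'col2'] = 'animal'

# Option 2: isin tests membership in a list of values in one call
df1.loc[df1['col1'].isin(['green', 'blue']), 'col2'] = 'color'

print(df1)
```

The `isin` form tends to scale better as the number of items per category grows, since the list can be extended without adding more `|` terms.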
Trenton McKinney
Liquidity

3 Answers

6

Build a dict, then use map:

d = {'dog': 'ani', 'cat': 'ani', 'green': 'color', 'blue': 'color'}
df1['col2'] = df1.col1.map(d)
df1
    col1   col2
0    cat    ani
1    cat    ani
2    dog    ani
3  green  color
4   blue  color
BENY
3

Since multiple items may belong to a single category I suggest you start with a dictionary mapping category to items:

cat_item = {'animal': ['cat', 'dog'], 'color': ['green', 'blue']}

You'll probably find this easier to maintain. Then reverse your dictionary using a dictionary comprehension, followed by pd.Series.map:

item_cat = {w: k for k, v in cat_item.items() for w in v}

df1['col2'] = df1['col1'].map(item_cat)

print(df1)

    col1    col2
0    cat  animal
1    cat  animal
2    dog  animal
3  green   color
4   blue   color

You can also use pd.Series.replace, but this is generally less efficient than map.
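
To illustrate how the two differ beyond speed, here is a small sketch using the same `item_cat` mapping. The value `'table'` is a hypothetical addition, included only to show what happens to items that have no category: `map` returns `NaN` for them, while `replace` leaves them unchanged.

```python
import pandas as pd

cat_item = {'animal': ['cat', 'dog'], 'color': ['green', 'blue']}
item_cat = {w: k for k, v in cat_item.items() for w in v}

# 'table' is a hypothetical value with no category in the mapping
s = pd.Series(['cat', 'green', 'table'])

mapped = s.map(item_cat)        # unmapped values become NaN
replaced = s.replace(item_cat)  # unmapped values are left as-is

# Optionally fill the gaps left by map with a fallback label
filled = mapped.fillna('other')
```

Which behavior is preferable depends on whether an unmapped item should be flagged as missing or passed through untouched.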

jpp
0

You could also try np.select, like this:

import numpy as np

options = [df1.col1.str.contains('cat|dog'),
           df1.col1.str.contains('green|blue')]

settings = ['animal', 'color']

df1['setting'] = np.select(options, settings)

In my experience this is quite fast, even on very large DataFrames.
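
A hedged variant of the same idea: `str.contains` matches substrings, so a value such as `'catfish'` (hypothetical here) would also be labelled `animal`. The sketch below assumes exact matches are wanted and uses `isin` instead. It also passes an explicit string `default`, since the numeric default of `np.select` may not combine with string choices on recent NumPy versions; the label `'other'` is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# 'catfish' is a hypothetical extra row that belongs to neither category
df1 = pd.DataFrame({'col1': ['cat', 'cat', 'dog', 'green', 'blue', 'catfish']})

# One boolean condition per category, checked in order
conditions = [df1['col1'].isin(['cat', 'dog']),
              df1['col1'].isin(['green', 'blue'])]
choices = ['animal', 'color']

# Rows matching no condition receive the default label
df1['setting'] = np.select(conditions, choices, default='other')

print(df1)
```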

Mara