1

I have a column in my pandas DataFrame with country names. I want to apply different filters on the column using if-else conditions and have to add a new column on that DataFrame with those conditions.

Current DataFrame:-

Company Country
BV  Denmark
BV  Sweden
DC  Norway
BV  Germany
BV  France
DC  Croatia
BV  Italy
DC  Germany
BV  Austria
BV  Spain

I have tried this but in this, I have to define countries again and again.

bookings_d2.loc[(bookings_d2.Country== 'Denmark') | (bookings_d2.Country== 'Norway'), 'Country'] = bookings_d2.Country

In R I am currently using if else condition like this, I want to implement this same thing in python.

R Code Example 1 : ifelse(bookings_d2$COUNTRY_NAME %in% c('Denmark','Germany','Norway','Sweden','France','Italy','Spain','Germany','Austria','Netherlands','Croatia','Belgium'), as.character(bookings_d2$COUNTRY_NAME),'Others')

R Code Example 2 : ifelse(bookings_d2$country %in% c('Germany'), ifelse(bookings_d2$BOOKING_BRAND %in% c('BV'),'Germany_BV','Germany_DC'),bookings_d2$country)

Expected DataFrame:-

Company Country
BV  Denmark
BV  Sweden
DC  Norway
BV  Germany_BV
BV  France
DC  Croatia
BV  Italy
DC  Germany_DC
BV  Others
BV  Others
Arvind Pant
  • 65
  • 1
  • 1
  • 8
  • I have pasted the snippet, Thanks – Arvind Pant Aug 23 '19 at 04:28
  • please use the HTML snippets (icon next to the picture) option to paste the input & output. That helps us to read the data as dataframe using `pandas.read_clipboard()`. we cannot use pictures for that purpose & most us will not want to waste time creating data frames by typing in data. – moys Aug 23 '19 at 04:49
  • Hope now it will work for you. – Arvind Pant Aug 23 '19 at 05:23

3 Answers3

4

Not sure exactly what you are trying to achieve, but I guess it is something along the lines of:

df=pd.DataFrame({'country':['Sweden','Spain','China','Japan'], 'continent':[None] * 4})

  country continent
0  Sweden      None
1   Spain      None
2   China      None
3   Japan      None


df.loc[(df.country=='Sweden') | ( df.country=='Spain'), 'continent'] = "Europe"
df.loc[(df.country=='China') | ( df.country=='Japan'), 'continent'] = "Asia"

  country continent
0  Sweden    Europe
1   Spain    Europe
2   China      Asia
3   Japan      Asia

You can also use python list comprehension like:

df.continent=["Europe" if (x=="Sweden" or x=="Denmark") else "Other" for x in df.country]
visibleman
  • 3,175
  • 1
  • 14
  • 27
  • 1
    Saw you just updated the question, but the principle idea should be the same. – visibleman Aug 23 '19 at 04:52
  • I want to achieve the same, but I want to know how we can do it with if and else condition. Your answer solves my question. But I wanna know how I can do it with if-else conditional statements. – Arvind Pant Aug 23 '19 at 04:54
  • you can think of the two df.loc statements as if, and elseif, then you can think of the initial [None] value as the else ( you are free to use any other value, doesn't have to be None). – visibleman Aug 23 '19 at 04:57
  • 2
    @ArvindPant - Check dupe - `np.select` – jezrael Aug 23 '19 at 05:00
2

You can use:

For example1: Use Series.isin with numpy.where or loc, but necessary invert mask by ~:

#removed Austria, Spain
L = ['Denmark','Germany','Norway','Sweden','France','Italy',
     'Germany','Netherlands','Croatia','Belgium']

df['Country'] = np.where(df['Country'].isin(L), df['Country'], 'Others')

Alternative:

df.loc[~df['Country'].isin(L), 'Country'] ='Others'

For example2: Use numpy.select or nested np.where:

m1 = df['Country'] == 'Germany'
m2 = df['Company'] == 'BV'
df['Country'] = np.select([m1 & m2, m1 & ~m2],['Germany_BV','Germany_DC'], df['Country'])

Alternative:

df['Country'] = np.where(~m1, df['Country'],
                np.where(m2, 'Germany_BV','Germany_DC'))
print (df)
  Company     Country
0      BV     Denmark
1      BV      Sweden
2      DC      Norway
3      BV  Germany_BV
4      BV      France
5      DC     Croatia
6      BV       Italy
7      DC  Germany_DC
8      BV      Others
9      BV      Others
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

You can do to get it:

country_others=['Poland','Switzerland']


df.loc[df['Country']=='Germany','Country']=df.loc[df['Country']=='Germany'].apply(lambda x: x+df['Company'])['Country']
df.loc[(df['Company']=='DC') &(df['Country'].isin(country_others)),'Country']='Others'
ansev
  • 30,322
  • 5
  • 17
  • 31