I have the following dataframe

Index   education   marital-status  occupation         gender    target
0       bachelors   never-married   adm-clerical       male      0
1       bachelors   spouse          exec-managerial    male      0
2       hs-grad     divorced        handlers-cleaners  male      0
3       11th        spouse          handlers-cleaners  male      0
4       bachelors   spouse          prof-specialty     female    0
5       masters     spouse          exec-managerial    female    0
6       other       other           other-service      female    0
7       hs-grad     spouse          exec-managerial    male      1
8       masters     never-married   prof-specialty     female    1
9       bachelors   spouse          exec-managerial    male      1

Can someone explain to me why the following doesn't work - I feel like it should from what I've read and what I've seen applied.

def new_features(education, gender, target):

  if [((education == 'bachelors') & (gender == 'male') & (target == 1))]:
      result = 'educated_male_convert'
  elif [((education == 'bachelors') & (gender == 'female') & (target == 1))]:
      result = 'educated_female_convert'
  else:
      result = 'educated_not_determined'
  return result

df['new_col'] = df.apply(lambda row: new_features(row['education'], row['gender'], row['target']), axis=1)

It just returns: educated_male_convert

I followed numerous tutorials and read other threads and applied the same code to my own dataset - not sure what I'm missing.

Any help would be appreciated

AdrianC
  • is the function just for example? There is a better way using numpy and pandas without using a loop for such operations – anky Aug 25 '19 at 05:01
  • Well, I'd like to understand why the above doesn't work [and how you could make it work] but I'd also be equally interested in achieving the same result using a better method – AdrianC Aug 25 '19 at 05:02
  • Can you print out row before the last line? – Ninad Gaikwad Aug 25 '19 at 05:04
  • Try to determine the row. I guess there is an iteration-related key error. Just use a basic loop to check the value. – Mohammad Ashraful Islam Aug 25 '19 at 05:04
  • That helped - however, when it runs, it only returns `educated_male_convert`, which it should only do for row 9; everything else should be `educated_not_determined` – AdrianC Aug 25 '19 at 05:10

3 Answers

The problem is that you put the `if` conditions in square brackets. Instead of testing an expression such as `if False: ...`, the code is actually testing `if [False]: ...`. Since any non-empty list is truthy, `[False]` evaluates to `True`, so the first branch is always taken, whatever the row contains.
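
A minimal sketch of the truthiness issue and of the fix, which is simply to drop the square brackets. Because `apply` with `axis=1` hands the function scalar values, plain `and` is used here instead of `&`; that is a stylistic choice, not something the question requires. The small DataFrame is made up for illustration and mirrors rows 0, 8 and 9 of the question's data.

```python
import pandas as pd

# A non-empty list is truthy regardless of what it contains.
assert bool([False]) is True
assert bool([]) is False


def new_features(education, gender, target):
    # No square brackets: the condition itself is tested, not a list wrapping it.
    if education == 'bachelors' and gender == 'male' and target == 1:
        return 'educated_male_convert'
    elif education == 'bachelors' and gender == 'female' and target == 1:
        return 'educated_female_convert'
    return 'educated_not_determined'


# Illustrative data only (rows 0, 8 and 9 of the question's frame).
df = pd.DataFrame({
    'education': ['bachelors', 'masters', 'bachelors'],
    'gender': ['male', 'female', 'male'],
    'target': [0, 1, 1],
})

df['new_col'] = df.apply(
    lambda row: new_features(row['education'], row['gender'], row['target']),
    axis=1,
)
print(df['new_col'].tolist())
```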

GZ0
  • This worked. I know what happened - previously, I had an or (|) element present which I believe requires square brackets, but I didn't remove them when I removed this element. Very helpful. Thanks! – AdrianC Aug 25 '19 at 05:26
  • As a side remark, you could use `np.where` to compute the new feature columns in a vectorized manner, which is much more efficient than using `df.apply`. See [this](https://chrisalbon.com/python/data_wrangling/pandas_create_column_using_conditional/) for a basic example. In your case a nested `np.where` call is needed. [`np.select`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html) is another alternative choice for such tasks. – GZ0 Aug 25 '19 at 05:37
  • yes and [this](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – anky Aug 25 '19 at 05:39
  • I did try this actually, but couldn't get it to work: `np.where((df['education'] == 'bachelors') & (df['gender'] == 'male') & (df['target'] == 1, 'educated_male_convert', (np.where(df['education'] == 'bachelors') & (df['gender'] == 'female') & (df['target'] == 1), 'educated_female_convert', 'educated_not_determined')))`. What did I miss? Got the following error `ValueError: setting an array element with a sequence.` – AdrianC Aug 25 '19 at 05:45
  • The parentheses do not seem to be right. The second `np.where` is applied only to `df['education'] == 'bachelors'`. It would be better to store the outcomes of those conditions as temporary variables rather than putting everything in a big expression like this. – GZ0 Aug 25 '19 at 05:54
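
The suggestion in the comments above (store each condition in a temporary variable, then nest `np.where`) might look like the sketch below. The variable names and the four-row DataFrame are assumptions made for illustration; the second row (a female bachelor with `target == 1`) does not occur in the question's data and is included only to exercise the second branch.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not the full frame from the question.
df = pd.DataFrame({
    'education': ['bachelors', 'bachelors', 'masters', 'bachelors'],
    'gender': ['male', 'female', 'female', 'male'],
    'target': [0, 1, 1, 1],
})

# Build each boolean mask separately so the parentheses stay manageable.
is_bachelor = df['education'] == 'bachelors'
converted = df['target'] == 1
male_convert = is_bachelor & (df['gender'] == 'male') & converted
female_convert = is_bachelor & (df['gender'] == 'female') & converted

# Nested np.where: the inner call supplies the "else" value of the outer one.
df['new_col'] = np.where(
    male_convert,
    'educated_male_convert',
    np.where(female_convert, 'educated_female_convert', 'educated_not_determined'),
)
print(df['new_col'].tolist())
```

With the masks named, each `np.where` call takes exactly three arguments (condition, value if true, value if false), which is the structure the one-line attempt lost track of.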

Here is another way to do this, using a conditional expression inside the lambda:

df['new_col'] = df.apply(lambda row: 'educated_male_convert' if row['education'] == 'bachelors' and row['gender'] == 'male' and row['target'] == 1
                      else ('educated_female_convert' if row['education'] == 'bachelors' and row['gender'] == 'female' and row['target'] == 1 
                      else ('educated_not_determined')), axis=1)
df
J.K

Here is a np.select solution:

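import numpy as np

c1 = df.education == 'bachelors'
c2 = df.gender == 'male'
c3 = df.target.astype(bool)
# Conditions are checked in order; rows matching neither get the default.
# Note that ~c2 treats every non-male row as female.
df['new_col'] = np.select([c1 & c2 & c3, c1 & ~c2 & c3],
                          ['educated_male_convert', 'educated_female_convert'],
                          'educated_not_determined')
print(df)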

       education marital-status         occupation  gender  target  \
Index                                                                
0      bachelors  never-married       adm-clerical    male       0   
1      bachelors         spouse    exec-managerial    male       0   
2        hs-grad       divorced  handlers-cleaners    male       0   
3           11th         spouse  handlers-cleaners    male       0   
4      bachelors         spouse     prof-specialty  female       0   
5        masters         spouse    exec-managerial  female       0   
6          other          other      other-service  female       0   
7        hs-grad         spouse    exec-managerial    male       1   
8        masters  never-married     prof-specialty  female       1   
9      bachelors         spouse    exec-managerial    male       1   

                       new_col  
Index                           
0      educated_not_determined  
1      educated_not_determined  
2      educated_not_determined  
3      educated_not_determined  
4      educated_not_determined  
5      educated_not_determined  
6      educated_not_determined  
7      educated_not_determined  
8      educated_not_determined  
9        educated_male_convert  
anky