I'm trying to populate the 'Selected' column with the value from the 'Name' column when the conditions on 'Name' and 'Age' are both true; otherwise the column should remain an empty string. However, the program doesn't seem to evaluate the if condition the way I expect: it always falls through to the result in the 'else' branch.

import pandas as pd

data = {'Name': ['Tom', 'Joseph', 'Krish', 'John'],
        'Age': ['20', '35', '43', '50'],
        'Selected': ' '}

df = pd.DataFrame(data)

df['Selected'] = df['Selected'].apply(lambda x: df['Name'] if ((df['Name']).any() == 'Tom') &
                                                    (df['Age'].any() < 25) else ' ')

print(df)

Here's the output of the above code:

     Name Age Selected
0     Tom  20         
1  Joseph  35         
2   Krish  43         
3    John  50  

I'm expecting to see Tom in the Selected column on the same row, because Tom meets both conditions: Name is 'Tom' and Age is below 25.

Any help is appreciated, thanks!

user01
  • You might consider seeing what `df['Name'].any()` and `df['Age'].any()` actually produce. I think that might make it more clear what's happening – Henry Ecker Sep 04 '21 at 18:10
  • Lots of good information here -> [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/q/19913659/15497888). You're likely looking for `df['Selected'] = np.where(df['Name'].eq('Tom') & df['Age'].lt(25), df['Name'], ' ')`, but you'd need to convert Age from string to int first: `df['Age'] = df['Age'].astype(int)` – Henry Ecker Sep 04 '21 at 18:11
  • @Henry Ecker, I just checked `df['Name'].any()` and `df['Age'].any()`, and both returned True. I want to check whether any row has the Name Tom and an Age < 25 – user01 Sep 04 '21 at 18:16
  • @Henry Ecker, the link is helpful, thank you! – user01 Sep 04 '21 at 18:24

2 Answers

Using the np.where function:

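import numpy as np
df['Age'] = df['Age'].astype(int)  # Age is stored as strings; convert before comparing
condition = (df.Name == 'Tom') & (df.Age.lt(25))
df['col'] = np.where(condition, df.Name, df.Selected)
print(df)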

    Name    Age     Selected    col
0   Tom     20                  Tom
1   Joseph  35      
2   Krish   43      
3   John    50      
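For reference, here is a self-contained sketch of the same idea. It assumes the data from the question, converts Age with `astype(int)` as suggested in the comments, and writes the result straight into `Selected` rather than into a separate `col` column (that last choice is an assumption about what the asker wants):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Joseph", "Krish", "John"],
    "Age": ["20", "35", "43", "50"],  # ages stored as strings, as in the question
    "Selected": " ",
})

# Convert Age to integers so the numeric comparison is valid.
df["Age"] = df["Age"].astype(int)

# Element-wise condition: one boolean per row.
condition = df["Name"].eq("Tom") & df["Age"].lt(25)

# Take Name where the condition holds, otherwise keep the existing Selected value.
df["Selected"] = np.where(condition, df["Name"], df["Selected"])

print(df)
```

Because the condition is evaluated per row, only the row for Tom gets a name in `Selected`; the other rows keep the placeholder string.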

Using the apply method:

# int() lets the comparison work even when Age is stored as strings
df.apply(lambda row: row.Name if ((row.Name == 'Tom') and
                                  (int(row.Age) < 25)) else row.Selected, axis=1)
ashkangh

If I understood correctly and you want to check the conditions for each row, call apply on the DataFrame with axis=1. The lambda then receives one row at a time, and you can access that row's columns as attributes:

df['Selected'] = df.apply(lambda x: x.Name if (x.Name == 'Tom') & (int(x.Age) < 25) else '', axis=1)

This will return:

     Name Age Selected
0     Tom  20      Tom
1  Joseph  35
2   Krish  43
3    John  50

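Your current solution uses the whole columns in the condition for every row: df['Name'].any() and df['Age'].any() each reduce an entire column to a single value, so the condition evaluates to the same result for all rows.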
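To make this concrete, here is a small sketch of what the original condition evaluates to. It assumes a recent pandas version, where `.any()` on a column of strings yields a boolean (consistent with what the asker reports in the comments); the `bool()` calls and the variable names are only there for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Tom", "Joseph", "Krish", "John"],
    "Age": ["20", "35", "43", "50"],
})

# .any() collapses a whole column into a single truthiness value.
# bool() only normalizes the result to a plain Python bool for this demo.
name_any = bool(df["Name"].any())  # True: at least one non-empty string
age_any = bool(df["Age"].any())    # True: at least one non-empty string

name_check = name_any == "Tom"  # True == "Tom" is False
age_check = age_any < 25        # True < 25 compares 1 < 25, which is True

# The combined condition is the same for every row, so the lambda
# in the question always takes the else branch.
print(name_check and age_check)
```

Since the name check is False no matter which row is being processed, the combined condition can never be true, which matches the empty `Selected` column in the question's output.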

Yashar