use numpy to add a column based on conditions

Question

I am trying to add a column D to the df below based on a condition: if column C is Shanghai, then D is Asia; if column C is SFA, then D is America. Taking these two as an example, my code is as follows:

   A      B   C
0  Joe    23  SFA
1  Amy    40  SFA
2  Jenny  34  SFA
3  Kitty  20  Shanghai
4  David  19  Shanghai
...

code:

import numpy as np

df['D'] = np.where(
    df['C'] == 'SFA', 'America',
    np.where(df['C'] == 'Shanghai', 'Asia', 'Other')
)

But it keeps raising an error: KeyError: 'C'. I have no idea why I always get this error, as I am pretty sure the data frame is a pandas DataFrame and column C has been converted to string. Can anyone provide any insights?

It looks like your code for conditionally creating the column is correct, but the column name has some spaces in it. You can verify that this is the issue by printing out `df.columns.tolist()`. You can fix the issue using `df.columns = df.columns.str.strip()`. — cs95, Nov 02 '20 at 03:08
It gives me the same error after doing these. When I run the first command, it doesn't return column C, but another one that I don't need. — Jennie, Nov 02 '20 at 03:22
What does `print(df.columns.tolist())` return? Please copy-paste it here. — cs95, Nov 02 '20 at 03:24
It returns another column, ['Amount'], which I didn't post in this dataframe. I don't need this column for now. — Jennie, Nov 02 '20 at 03:25
So it returns [A, B, C, Amount]? Can you paste the output here? — cs95, Nov 02 '20 at 03:27
It doesn't return any dataframe, just the column name: ['Amount']. Nothing else. — Jennie, Nov 02 '20 at 03:28
So you are trying to access column names that don't exist, don't you see the issue there? — cs95, Nov 02 '20 at 03:29
If these are in fact levels in the index and not actual column names, you should reset the index first: `df = df.reset_index()` — cs95, Nov 02 '20 at 03:30
I am pretty sure all the columns are in the dataframe. They appear with the exact names I put in the code, and they are not index levels. — Jennie, Nov 02 '20 at 03:44
In that case it should be reflected in the output of `print(df.columns.tolist())`. At this point we are just talking in circles, so without more context I'm afraid your issue is not reproducible for anyone here, sorry. — cs95, Nov 02 '20 at 03:48
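
For readers who hit the same KeyError, the checks suggested in the comments above can be sketched roughly as follows. This is a hypothetical reconstruction, not the asker's actual data: it assumes A, B and C ended up as index levels (with stray spaces around the C label) and that `Amount` is the only real column, which would be consistent with the `['Amount']` output reported above. The `Amount` values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: sample rows from the question, with A, B and ' C '
# stored as index levels and 'Amount' (made-up values) as the only column.
df = pd.DataFrame(
    {
        "A": ["Joe", "Amy", "Jenny", "Kitty", "David"],
        "B": [23, 40, 34, 20, 19],
        " C ": ["SFA", "SFA", "SFA", "Shanghai", "Shanghai"],
        "Amount": [1, 2, 3, 4, 5],
    }
).set_index(["A", "B", " C "])

# Diagnose: which labels are real columns, and which sit in the index?
print(df.columns.tolist())
print(list(df.index.names))

# Move the index levels back into ordinary columns,
# then strip stray whitespace from the column names.
df = df.reset_index()
df.columns = df.columns.str.strip()

# With 'C' now a real column, the nested np.where from the question works.
df["D"] = np.where(
    df["C"] == "SFA", "America",
    np.where(df["C"] == "Shanghai", "Asia", "Other"),
)
print(df)
```

If `C` still does not show up in `df.columns.tolist()` or `df.index.names` after these steps, the column is most likely missing from the frame itself, and the cause would be in how the data was loaded rather than in the `np.where` call.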

0 Answers