I'm trying to create a new column that designates the region of a state based on the state alpha code column. I've reviewed other questions and tried both `.apply` and `np.select`, as shown below. Can someone please help me fix the code and explain what is happening behind the scenes, so I understand how to fix this kind of issue going forward?

    Kansas_City = ['ND', 'SD', 'NE', 'KS', 'MN', 'IA', 'MO']
    Dallas = ['TX', 'OK', 'AR', 'LA', 'TN']
    conditions = [df_merge['state_alpha'] in Kansas_City, df_merge['state_alpha'] in Dallas]
    outputs = ['Kansas City', 'Dallas']
    df_merge['Region'] = np.select(conditions, outputs, 'Other')

The other question I was trying to follow is here - pandas create new column based on values from other columns / apply a function of multiple columns, row-wise

state_alpha   Region
'MN'          Kansas City
'TX'          Dallas
'IA'          Kansas City
'NE'          Kansas City
Shawn Schreier

1 Answer

Hope this helps:

    df_merge['Region'] = df_merge['state_alpha'].apply(lambda x: 
    'Kansas City' if x in Kansas_City
    else 'Dallas' if x in Dallas
    else 'Others')

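`apply` passes your data to the function you give it. Called on a single column (a Series), it runs the function once per value; called on a DataFrame, it can run the function once per column or once per row.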

Please refer to https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
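
To make the column vs. row distinction concrete, here is a minimal sketch. The small `df` frame, its `sales` column, and the `region` helper are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for df_merge
df = pd.DataFrame({'state_alpha': ['MN', 'TX', 'CA'], 'sales': [10, 20, 30]})

Kansas_City = ['ND', 'SD', 'NE', 'KS', 'MN', 'IA', 'MO']
Dallas = ['TX', 'OK', 'AR', 'LA', 'TN']

def region(state):
    # Receives one plain string at a time, so `in` works as expected here
    if state in Kansas_City:
        return 'Kansas City'
    if state in Dallas:
        return 'Dallas'
    return 'Other'

# Series.apply: the function is called once per value in the column
df['Region'] = df['state_alpha'].apply(region)

# DataFrame.apply with axis=1: the function is called once per row,
# and each row arrives as a Series indexed by column name
df['Label'] = df.apply(lambda row: f"{row['state_alpha']}: {row['sales']}", axis=1)

print(df)
```

The row-wise form is only needed when the result depends on more than one column; for a single column, `Series.apply` is simpler.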

LaChatonnn
  • Thanks for your response! I'll give that a shot tomorrow morning. However, I've seen posts indicating that numpy.select is significantly faster than apply for large datasets. Would you have any insight on how to tweak the code to work with numpy.select? – Shawn Schreier Feb 12 '20 at 02:53
  • Honestly, that is new to me too, but if you want to use numpy, here is a solution. In the `conditions` list, change `df_merge['state_alpha'] in Kansas_City` to `df_merge['state_alpha'].isin(Kansas_City)`. The `in` operator has to produce a single True or False, but comparing the whole column against the list's elements produces a Series of True and False values, which has no single truth value. This is why the error says ambiguous. In contrast, `.isin(list)` returns a Series of True and False with one value per row. – LaChatonnn Feb 12 '20 at 03:18
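
Putting the comment's suggestion together, the `np.select` version might look like the sketch below. The sample `df_merge` is a made-up stand-in for the real data (with an extra 'CA' row to exercise the default), and the `try`/`except` is only there to show the original error.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's df_merge
df_merge = pd.DataFrame({'state_alpha': ['MN', 'TX', 'IA', 'NE', 'CA']})

Kansas_City = ['ND', 'SD', 'NE', 'KS', 'MN', 'IA', 'MO']
Dallas = ['TX', 'OK', 'AR', 'LA', 'TN']

# The original attempt: `in` needs a single True/False, but the
# element-wise comparison yields a whole Series, so pandas raises
raised = False
try:
    df_merge['state_alpha'] in Kansas_City
except ValueError:
    raised = True

# .isin returns one boolean per row, which is what np.select expects
conditions = [
    df_merge['state_alpha'].isin(Kansas_City),
    df_merge['state_alpha'].isin(Dallas),
]
outputs = ['Kansas City', 'Dallas']

# Rows matching no condition get the default value
df_merge['Region'] = np.select(conditions, outputs, default='Other')

print(df_merge)
```

`np.select` checks the conditions in order and takes the first match per row, so the order of `conditions` matters if the lists ever overlap.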