I am trying to create a new column based on another column, specifically on whether it contains a certain value.

I have done the following:

df['region'] = np.where(df['location'].str.contains("AK| AZ | CA | CO | HI |ID | MT | NM | NV | OR | UT | WA | WY", na=False), "west",
                     np.where(df['location'].str.contains("PA | NJ | NY | VT | NH | MA | RI | CT | ME", na=False), "northwest",
                     np.where(df['location'].str.contains("AR | AL | DC | DE | FL | GA | KY | LA | MD | MS | NC | OK | SC | VA | WV", na=False), "south",
                     np.where(df['location'].str.contains("IA | IL | IN | KS |MI | MN |MO | ND |NE | OH | SD | WI", na=False), "midwest", "international"))))

I am getting this:

 location        region

Columbia, MO    international
Maplewood, NJ   international

expected:

 location        region

Columbia, MO    midwest
Maplewood, NJ   northwest

I have a column 'location'. I want to check whether it contains one of the state abbreviations and then create a new column holding the matching region.

Thank you!

  • What do you want to assign to your `region` column if it does not contain one of those strings? – Erfan Jun 23 '19 at 12:44
  • Thank you for replying. I would assign it "international". I have the same code for the rest of the states; anything else would be "international". – Alaa Senjab Jun 23 '19 at 12:46
  • Use `df['region'] = np.where(df['location'].str.contains("IA | IL | IN | KS |MI | MN | MO | ND | NE | OH | SD | WI", na=False), 'midwest', 'international')`. – Erfan Jun 23 '19 at 12:50
  • Find more information about creating _conditional columns_ [here](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – Erfan Jun 23 '19 at 12:51
  • thank you so much for the help! I am now not getting the expected results. I hope it gets reopened – Alaa Senjab Jun 23 '19 at 13:12
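
A likely cause of the output in the question is that the spaces inside each pattern are matched literally: `"MO "` and `" NJ "` both require a trailing space, so "Columbia, MO" and "Maplewood, NJ" match nothing and fall through to "international". Below is a minimal sketch of one way around this. It assumes every US location ends with a comma followed by a two-letter state code. The sample rows and the names `regions` and `state_to_region` are made up for illustration, and the region labels are copied from the question as written.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows shown in the question.
df = pd.DataFrame(
    {"location": ["Columbia, MO", "Maplewood, NJ", "Paris, France", None]}
)

# Region labels and state lists taken from the question, without stray spaces.
regions = {
    "west": "AK AZ CA CO HI ID MT NM NV OR UT WA WY",
    "northwest": "PA NJ NY VT NH MA RI CT ME",
    "south": "AR AL DC DE FL GA KY LA MD MS NC OK SC VA WV",
    "midwest": "IA IL IN KS MI MN MO ND NE OH SD WI",
}

# Invert the mapping into a state -> region lookup.
state_to_region = {
    state: region
    for region, states in regions.items()
    for state in states.split()
}

# Assumption: the state code is the two uppercase letters after the last comma.
state = df["location"].str.extract(r",\s*([A-Z]{2})\s*$", expand=False)

# Unmatched or missing locations fall back to "international".
df["region"] = state.map(state_to_region).fillna("international")

print(df)
```

Extracting the state code first and looking it up in a dictionary avoids the nested `np.where` calls and avoids accidental matches of a state code inside a city name. If the original `str.contains` approach is preferred, removing the spaces around each `|` and wrapping the alternation in word boundaries (for example `r"\b(?:IA|IL|IN)\b"`) should behave similarly.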
