I have a dataframe with one column that contains a list of countries. I basically want to transform it to a new column that says "Inside US" if the row contains either United States or Puerto Rico, otherwise "Outside US". How can I do this in pandas?

Expected input:

countries
United States, Japan
China
Brazil, South Africa
Puerto Rico, Spain
United States, Vietnam
Madagascar

Expected output:

countries
Inside US
Outside US
Outside US
Inside US
Inside US
Outside US

My attempt: The following code gives me a True/False Series that I'm struggling to use. I'm also not sure if this is the best way to start.

df['countries'].str.contains('United States|Puerto Rico')
Eisen

2 Answers

With np.where clause:

import numpy as np

# True where the row mentions either name; np.where turns that into labels
df['country_stat'] = np.where(
    df['countries'].str.contains('United States|Puerto Rico'),
    'Inside US', 'Outside US')

                countries country_stat
0    United States, Japan    Inside US
1                   China   Outside US
2    Brazil, South Africa   Outside US
3      Puerto Rico, Spain    Inside US
4  United States, Vietnam    Inside US
5              Madagascar   Outside US
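
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's sample data and applies the same np.where idea. The na=False argument and the map alternative are additions of mine, not part of the answer above: na=False assumes that rows with missing values should count as "Outside US".

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "countries": [
        "United States, Japan",
        "China",
        "Brazil, South Africa",
        "Puerto Rico, Spain",
        "United States, Vietnam",
        "Madagascar",
    ]
})

# Boolean mask: True where the row mentions either name.
# na=False is an assumption: missing values are treated as "Outside US".
mask = df["countries"].str.contains("United States|Puerto Rico", na=False)

# np.where picks the first label where mask is True, the second otherwise
df["country_stat"] = np.where(mask, "Inside US", "Outside US")

# Alternative without NumPy: map each boolean to its label
alt = mask.map({True: "Inside US", False: "Outside US"})
```

Both country_stat and alt should hold the same labels; which form to use is mostly a matter of taste.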
RomanPerekhrest

Here's a simple method: define a function that checks a single value, then use apply to run it on each row of the column.

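def check_inside_us(countries):
    # Each cell can list several countries, so test for a substring
    # rather than comparing the whole cell to each name.
    if any(name in countries for name in ('United States', 'Puerto Rico')):
        return 'Inside US'
    return 'Outside US'

df['countries'] = df['countries'].apply(check_inside_us)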
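
A possible variant of the row-by-row approach, sketched under the assumption that names inside a cell are always separated by commas: split each cell and compare whole names, so that a longer name that merely starts with "United States" is not counted. US_NAMES and label_row are hypothetical names chosen for this example.

```python
import pandas as pd

# Hypothetical constant: the names that count as "Inside US"
US_NAMES = {"United States", "Puerto Rico"}

def label_row(cell):
    # Split "A, B" into exact names and strip surrounding spaces
    names = {part.strip() for part in str(cell).split(",")}
    # A non-empty intersection means at least one exact match
    return "Inside US" if names & US_NAMES else "Outside US"

# Sample data copied from the question
df = pd.DataFrame({
    "countries": [
        "United States, Japan",
        "China",
        "Brazil, South Africa",
        "Puerto Rico, Spain",
        "United States, Vietnam",
        "Madagascar",
    ]
})

df["country_stat"] = df["countries"].apply(label_row)
```

This is slower than a vectorized str.contains on large frames, but it avoids partial-name matches.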