
In the adult_quasiIdentifier data frame, there is a column called nativeCountry. I am trying to modify nativeCountry so that it contains the continent instead of the country name.

[Image: adult_quasiIdentifier dataset]

Here is my code:

North_America = ['United-States', 'Cuba', 'Jamaica', 'Mexico', 'Puerto-Rico', 'Honduras', 'Canada', 'Haiti', 'Dominican-Republic', 'El-Salvador', 'Guatemala', 'Nicaragua']
South_America = ['Columbia', 'Ecuador', 'Peru', 'Trinadad&Tobago']
Asia = ['India', 'Iran', 'Philippines', 'Cambodia', 'Thailand', 'Laos', 'Taiwan', 'China', 'Japan', 'Vietnam', 'Hong']
Europe = ['England', 'Germany', 'Italy', 'Poland', 'Portugal', 'France', 'Yugoslavia', 'Scotland', 'Greece', 'Ireland', 'Hungary', 'Holand-Netherlands']
continent = {'North_America': North_America, 'South_America': South_America, 'Asia': Asia, 'Europe': Europe}
for key, val in continent.items():
    adult_quasiIdentifier.loc[adult_quasiIdentifier.nativeCountry.isin(val), "nativeCountry"] = key

adult_quasiIdentifier

The adult_quasiIdentifier data frame did not get modified, and I also get the following message:

[Image: message]

I don't know what's wrong with my code. Is there any way I can modify the nativeCountry column? Thanks!

tlentali
  • You can use [Series.map()](https://pandas.pydata.org/docs/reference/api/pandas.Series.map.html) to map values from a dictionary into a series – G. Anderson Oct 01 '21 at 22:20

1 Answer


It is easiest if you can create a dictionary that maps each country to its continent. In that case, you can do this:

import pandas as pd

# create dummy data
df = pd.DataFrame({
  'country': ['Cuba', 'Peru', 'India', 'Taiwan', 'England', 'Germany']
})

# country -> continent dictionary
country_continent = {
  'Cuba': 'North_America',
  'Peru': 'South_America',
  'India': 'Asia',
  'Taiwan': 'Asia',
  'England': 'Europe',
  'Germany': 'Europe'
}

# then replace/change
df['country'] = df['country'].map(country_continent)

After this, each country name in the dataframe's country column is replaced by its continent.

And, while we're at it, it might be good to rename the column:

df.rename(columns={'country': 'continent'}, inplace=True)
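
If the countries are already grouped into per-continent lists, as in the question, the country-to-continent dictionary does not have to be typed by hand. The following is a minimal sketch, assuming a small dummy frame in place of the real data set; `continent_to_countries`, `country_to_continent`, and the sample rows are illustrative names and data, not part of the original post.

```python
import pandas as pd

# Continent -> countries, abbreviated from the lists in the question
continent_to_countries = {
    'North_America': ['United-States', 'Cuba', 'Mexico'],
    'South_America': ['Ecuador', 'Peru'],
    'Asia': ['India', 'Taiwan'],
    'Europe': ['England', 'Germany'],
}

# Invert to country -> continent so it can be passed to Series.map()
country_to_continent = {
    country: continent
    for continent, countries in continent_to_countries.items()
    for country in countries
}

# Dummy data (assumption: stands in for adult_quasiIdentifier)
df = pd.DataFrame({'nativeCountry': ['Cuba', 'Peru', 'India', 'Germany', '?']})

# map() returns NaN for values missing from the dictionary;
# fillna() keeps the original value in those rows
df['continent'] = (
    df['nativeCountry'].map(country_to_continent).fillna(df['nativeCountry'])
)
```

Writing the result to a new column keeps the original country names available for checking which values were not matched.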
garagnoth
  • Hey! Thanks for your suggestion! But the column nativeCountry in the data set has the type "Object". I tried to use map(), but since the keys in the dictionary are strings, it was not able to match them up. – Jiayi Li Oct 02 '21 at 04:10
  • How about casting the type as string? `df['nativeCountry'] = df['nativeCountry'].astype('string')` – garagnoth Oct 02 '21 at 04:20
  • Btw, pandas treats string series as object. Further info here: https://stackoverflow.com/questions/21018654/strings-in-a-dataframe-but-dtype-is-object – garagnoth Oct 02 '21 at 04:24
  • No, it didn't work. I also tried df['nativeCountry'] = np.where((df.nativeCountry == 'United-States'),'North_America',df.nativeCountry) and many other similar queries. It is still not working. – Jiayi Li Oct 02 '21 at 04:25
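
When `map()`, `isin()`, and `==` all fail to match values that look correct, one thing worth checking is stray whitespace in the column. This is only a guess at the cause here, since the thread does not show the raw values. The sketch below uses made-up data with a leading space to show how to inspect the values and strip them before mapping; the frame and dictionary are illustrative.

```python
import pandas as pd

# Dummy data (assumption): values carry a leading space, as can happen
# when a CSV file uses ", " as its separator
df = pd.DataFrame({'nativeCountry': [' United-States', ' India', ' England']})

country_continent = {
    'United-States': 'North_America',
    'India': 'Asia',
    'England': 'Europe',
}

# Inspect the raw values; printing them as a list shows the quotes,
# which makes hidden spaces visible
print(df['nativeCountry'].unique().tolist())

# Without stripping, no key matches and every row becomes NaN
unmatched = df['nativeCountry'].map(country_continent)

# Strip surrounding whitespace first, then map
cleaned = df['nativeCountry'].str.strip()
df['continent'] = cleaned.map(country_continent)
```

If the printed values show no extra characters, the mismatch has some other cause and this step will not help.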