I have a pandas series whose unique values are something like:

['toyota', 'toyouta', 'vokswagen', 'volkswagen', 'vw', 'volvo']

Now I want to fix some of these values, for example: toyouta -> toyota

(Note that not all values are misspelled; volvo and toyota, for example, are already correct.)

I've tried making a dictionary where each key is the word to be corrected and each value is the correct word, and then mapping that onto my series.

This is how my code looks:

corrections = {'maxda': 'mazda', 'porcshce': 'porsche', 'toyouta': 'toyota', 'vokswagen': 'vw', 'volkswagen': 'vw'}
df.brands = df.brands.map(corrections)

print(df.brands.unique())
>>> [nan, 'mazda', 'porsche', 'toyota', 'vw']

As you can see, the problem is that every value not present in the dictionary is converted to NaN. One solution is to map all the correct values to themselves, but I was hoping there is a better way to go about this.

Sociopath
Aakash Dusane

1 Answer

Use map and fill the resulting NaN values with the original values:

df.brands = df.brands.map(corrections).fillna(df.brands)

Or pass a function that falls back to the original value when there is no dictionary entry:

df.brands = df.brands.map(lambda x: corrections.get(x, x))

Or use replace, which leaves values without a dictionary entry unchanged:

df.brands = df.brands.replace(corrections)
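
To compare the three options side by side, here is a minimal sketch. The sample DataFrame and the shortened corrections dictionary are assumptions made for illustration, since the question does not show the actual data; the keys are the values to fix and the values are the replacements.

```python
import pandas as pd

# Hypothetical sample data; the question does not show the real DataFrame.
df = pd.DataFrame(
    {"brands": ["toyota", "toyouta", "vokswagen", "volkswagen", "vw", "volvo"]}
)

# Keys are the values to fix, values are the replacements.
corrections = {"toyouta": "toyota", "vokswagen": "vw", "volkswagen": "vw"}

# 1. map() yields NaN for values without an entry; fillna() restores the originals.
via_fillna = df["brands"].map(corrections).fillna(df["brands"])

# 2. dict.get(x, x) returns the value itself when it has no entry.
via_get = df["brands"].map(lambda x: corrections.get(x, x))

# 3. replace() only changes values that appear as keys in the dictionary.
via_replace = df["brands"].replace(corrections)

print(via_replace.tolist())
# ['toyota', 'toyota', 'vw', 'vw', 'vw', 'volvo']
```

On this sample all three give the same result. One difference to keep in mind: if the dictionary deliberately maps a value to NaN, the fillna variant would put the original value back, while the other two would keep the NaN.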
jezrael