I have a pandas series whose unique values are something like:
['toyota', 'toyouta', 'vokswagen', 'volkswagen', 'vw', 'volvo']
Now I want to fix some of these values, for example: toyouta -> toyota
(Note that not all values contain mistakes; volvo and toyota, for example, are already correct.)
I've tried making a dictionary where each key is the misspelled word and each value is the correct spelling, and then mapping that onto my series.
This is what my code looks like:
corrections = {'maxda': 'mazda', 'porcshce': 'porsche', 'toyouta': 'toyota', 'vokswagen': 'vw', 'volkswagen': 'vw'}
df.brands = df.brands.map(corrections)
print(df.brands.unique())
>>> [nan, 'mazda', 'porsche', 'toyota', 'vw']
As you can see, the problem is that every value not present in the dictionary is converted to NaN. One solution is to map all the correct values to themselves, but I was hoping there would be a better way to go about this.
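For reference, here is a minimal sketch that reproduces the NaN behaviour and shows two possible ways around it without mapping every correct value to itself. The `brands` series and the shortened `corrections` dictionary are made-up sample data standing in for `df.brands`, and the dictionary is assumed to be keyed by the misspelled word. Treat it as one way to approach this, not the definitive one.

```python
import pandas as pd

# Hypothetical sample data standing in for df.brands
brands = pd.Series(['toyota', 'toyouta', 'vokswagen', 'volkswagen', 'vw', 'volvo'])

# Assumed layout: keys are the values to replace, values are the replacements
corrections = {'toyouta': 'toyota', 'vokswagen': 'vw', 'volkswagen': 'vw'}

# map() yields NaN for every value that is not a key in the dictionary
mapped = brands.map(corrections)

# Option 1: fall back to the original value wherever map() produced NaN
filled = brands.map(corrections).fillna(brands)

# Option 2: replace() only changes values that appear as keys and
# leaves everything else untouched
replaced = brands.replace(corrections)

print(mapped.tolist())
print(filled.tolist())
print(replaced.tolist())
```

Both options should leave values such as 'volvo' as they are. `replace` is probably the shorter of the two when the dictionary only lists the misspellings.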