I have a fairly large dataset (150M+ rows) and I want to replace some values in a column with the corresponding values from a dictionary.
What I have looks like this:
col1
John
John Marvin
Lucas
Name:Lucas
Mary
Mary Surname
And I am trying to make it look like this:
col1
John
John
Lucas
Lucas
Mary
Mary
The number of distinct values in col1 is not that large, so I thought of creating a dictionary that maps each odd value to its correct one.
d = {'John Marvin': 'John', 'Name:Lucas': 'Lucas', 'Mary Surname': 'Mary'}
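To make the idea concrete, here is a minimal sketch of applying such a dictionary. It assumes the data is in a pandas DataFrame called df, which is an assumption since the question does not name a library, and it uses the small sample above rather than the real data. Whether this is fast enough at 150M rows would need to be measured.

```python
import pandas as pd

# Sample data from the question; the name `df` is an assumption.
df = pd.DataFrame(
    {"col1": ["John", "John Marvin", "Lucas", "Name:Lucas", "Mary", "Mary Surname"]}
)

# Odd value -> correct value
d = {"John Marvin": "John", "Name:Lucas": "Lucas", "Mary Surname": "Mary"}

# map() looks up each value in d; values that are not keys of d become NaN,
# so fillna() puts the original value back for those rows.
df["col1"] = df["col1"].map(d).fillna(df["col1"])

print(df["col1"].tolist())
```

Values that are already correct (John, Lucas, Mary) are not keys in d, so they pass through unchanged.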
Given the length of my dataset, I am looking for a fast way to do this. Does anyone have an idea of what a good approach would be?
Thanks!