Undoing replacement with a dictionary in pandas dataframe

Question

I have a pandas dataframe like so:

import pandas as pd

x = pd.DataFrame({'col1':['one','two','three','four'],'col2':[5,6,7,8],'col3':[9,10,11,12]})

For my purposes (training an ML model), I need to replace the text with numbers, so I use DataFrame.replace() with a dictionary:

mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
x.replace({'col1': mydict}, inplace=True)

After that, I train the model and have it return a proposed candidate. Having seen only the numbers, the model returns the candidate with numbers in that first column, something like this:

col1	col2	col3
1	5	9

whereas I'd like to get something like this:

col1	col2	col3
one	5	9

I've seen this question where they create an inverted dictionary to solve the problem, and this one about getting the values of a python dictionary. But I'd like to avoid having to create another dictionary, seeing as the values of the dictionary are as unique as the keys.

I get the feeling there should be some easy way of looking up the values as if they were the keys and doing the replacement like that, but I'm not sure.

mozway · Accepted Answer · 2023-06-26T08:45:47.123

2

If your dictionary is a bijection and no original value in col1 is itself one of the dictionary's values, then the only way is to reverse the dictionary:

x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)

Output:

    col1  col2  col3
0    one     5     9
1    two     6    10
2  three     7    11
3   four     8    12

If either of these conditions does not hold, the replacement cannot be undone unambiguously.
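
The two conditions can be checked before relying on the reversed dictionary. A minimal sketch, assuming plain Python helpers; the names `invert_mapping` and `colliding_values` are hypothetical and are not part of pandas:

```python
def invert_mapping(mapping):
    """Return {value: key}, refusing mappings whose values are not unique."""
    inverse = {}
    for key, value in mapping.items():
        if value in inverse:
            raise ValueError(
                f"value {value!r} is shared by {inverse[value]!r} and {key!r}"
            )
        inverse[value] = key
    return inverse


def colliding_values(column_values, mapping):
    """Return the original column entries that already equal a mapped value."""
    targets = set(mapping.values())
    return [v for v in column_values if v in targets]


# Hypothetical usage with the frame from the question:
#   assert not colliding_values(x['col1'], mydict)   # run BEFORE the forward replace
#   x.replace({'col1': mydict}, inplace=True)
#   ...
#   x.replace({'col1': invert_mapping(mydict)}, inplace=True)
```

The collision check only makes sense on the column as it was before the forward replacement; afterwards every entry is a mapped value.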

Example:

x = pd.DataFrame({'col1':['one','two','three','four', 4]})
#     col1
# 0    one
# 1    two
# 2  three
# 3   four
# 4      4

mydict = {'one': 1, 'two': 1, 'three': 3, 'four': 4}
x.replace({'col1': mydict}, inplace=True)
x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)

Output:

    col1
0    two # incorrectly mapped to "two" due to non-unique values
1    two
2  three
3   four
4   four # incorrectly mapped to "four" due to collision with the mapped value

edited Jun 26 '23 at 08:45

answered Jun 26 '23 at 08:40

mozway


My dictionary should be a bijection and have no colliding initial values (I'm mapping chemical elements to their molar mass, so it's text->number), so I shall reverse the dictionary :( I wanted a better way... – David Siret Marqués Jun 26 '23 at 09:46
This is not a bad way, you can always build a custom python object based on a dictionary that automatically adds the reverse key upon insertion. But that's not "better" IMO ;) – mozway Jun 26 '23 at 09:48
Yeah, I liked the oneliner for the replace, I'll copy it – David Siret Marqués Jun 26 '23 at 09:49
Following-up on my suggestion in comment, you might want to try the bidirectional dictionary [implemented here](https://stackoverflow.com/questions/3318625/how-to-implement-an-efficient-bidirectional-hash-table): `mydict = bidict({'one': 1, 'two': 2, 'three': 3, 'four': 4}) ; x.replace({'col1': mydict}, inplace=True) ; x.replace({'col1': mydict.inverse}, inplace=True)`. Another approach would be to keep the original column with the elements and to add a new one with the molar mass ;) – mozway Jun 26 '23 at 12:01
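
The idea of a dictionary that maintains its own reverse on insertion could look roughly like this. It is a minimal sketch, not the implementation from the linked post: `TwoWayDict` is a hypothetical name, and only item assignment and deletion keep the mirror in sync.

```python
class TwoWayDict(dict):
    """dict that keeps a value -> key mirror in `self.inverse` (hypothetical helper).

    Only __setitem__ and __delitem__ are overridden; update(), pop() and
    friends would bypass the mirror in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__()
        self.inverse = {}
        # Route every initial pair through __setitem__ so the mirror is built.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        if value in self.inverse and self.inverse[value] != key:
            raise ValueError(
                f"value {value!r} is already mapped from {self.inverse[value]!r}"
            )
        if key in self:
            # The key is being reassigned: drop its old mirror entry.
            del self.inverse[self[key]]
        super().__setitem__(key, value)
        self.inverse[value] = key

    def __delitem__(self, key):
        del self.inverse[self[key]]
        super().__delitem__(key)


mydict = TwoWayDict({'one': 1, 'two': 2, 'three': 3, 'four': 4})
# mydict.inverse is a plain dict mapping each number back to its word.
```

Raising on a duplicate value enforces the bijection condition from the answer at insertion time rather than when the reverse lookup is needed.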
It's OK, it doesn't work anyway because the ML model does not return the exact same values (damn floating-point arithmetic), so in the end I'll have to look for another way. A very useful post you sent there, I'll have a deep look at it when I've got time. Thanks! – David Siret Marqués Jun 26 '23 at 12:17
Indeed, mapping floats is always a pain. Maybe consider changing the unit (e.g. using Daltons or something small enough to be able to use integers). Good luck. – mozway Jun 26 '23 at 12:19
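
The integer-unit suggestion might be sketched as follows. The three elements, the `SCALE` factor and the `decode` helper are illustrative assumptions, not something from the thread, and rounding only recovers a prediction that lands within half a step of a known value.

```python
# Standard atomic weights in daltons, abridged to three decimals.
molar_mass = {'H': 1.008, 'C': 12.011, 'O': 15.999}

SCALE = 1000  # assumed resolution: one integer step = 0.001 Da

# Forward mapping to integers, and its reverse for decoding.
to_int = {element: round(mass * SCALE) for element, mass in molar_mass.items()}
to_element = {code: element for element, code in to_int.items()}


def decode(prediction):
    """Map a float model output (in integer units) back to an element, or None."""
    return to_element.get(round(prediction))
```

With this setup a model output such as 12010.9998 rounds to 12011 and decodes to 'C', while a value far from every known code decodes to None instead of being silently mismatched.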
