
I have a pandas dataframe like so:

import pandas as pd
x = pd.DataFrame({'col1':['one','two','three','four'],'col2':[5,6,7,8],'col3':[9,10,11,12]})

For my purposes (training an ML model), I need to replace the text with numbers, so I use DataFrame.replace() with a dictionary to do that:

mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}
x.replace({'col1':mydict}, inplace= True)

After that, I train the model and have it return a proposed candidate. Having seen only the numbers, the model returns the candidate with numbers in that first column, something like this:

col1 col2 col3
1 5 9

whereas I'd like to get something like this:

col1 col2 col3
one 5 9

I've seen this question, where they create an inverted dictionary to solve the problem, and this one about getting the values of a Python dictionary. But I'd like to avoid creating another dictionary, seeing as the values of the dictionary are as unique as the keys.

I get the feeling there should be some easy way of looking up the values as if they were the keys and doing the replacement like that, but I'm not sure.

1 Answer


If your dictionary is a bijection (every value is unique) and no original value in col1 is also one of the dictionary's values, then the only way is to reverse the dictionary:

x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)

Output:

    col1  col2  col3
0    one     5     9
1    two     6    10
2  three     7    11
3   four     8    12

If both of these conditions are not met, you cannot reverse the replacement unambiguously.
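Both conditions can be checked before inverting. Below is a minimal sketch; `invert_if_safe` is a hypothetical helper name, not a pandas function, and the example uses `Series.map` rather than `replace` (note that `map` turns values missing from the dictionary into NaN, whereas `replace` leaves them untouched).

```python
import pandas as pd

def invert_if_safe(mapping, original_values):
    """Return the inverted mapping, or raise if inverting would be ambiguous.

    Hypothetical helper for illustration; not part of pandas.
    """
    values = list(mapping.values())
    # Condition 1: the mapping must be one-to-one (no repeated values).
    if len(set(values)) != len(values):
        raise ValueError("mapping is not a bijection: some values repeat")
    # Condition 2: no original entry may already equal one of the mapped values.
    clash = set(original_values) & set(values)
    if clash:
        raise ValueError(f"column already contains mapped values: {clash}")
    return {v: k for k, v in mapping.items()}

x = pd.DataFrame({'col1': ['one', 'two', 'three', 'four'], 'col2': [5, 6, 7, 8]})
mydict = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

inverse = invert_if_safe(mydict, x['col1'])  # check against the original column
x['col1'] = x['col1'].map(mydict)             # text -> number
x['col1'] = x['col1'].map(inverse)            # number -> text
```

With the ambiguous dictionary from the example below (`'one'` and `'two'` both mapped to 1), the helper raises a `ValueError` instead of silently returning a lossy inverse.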

Example:

x = pd.DataFrame({'col1':['one','two','three','four', 4]})
#     col1
# 0    one
# 1    two
# 2  three
# 3   four
# 4      4

mydict = {'one': 1, 'two': 1, 'three': 3, 'four': 4}
x.replace({'col1':mydict}, inplace= True)
x.replace({'col1': {v:k for k,v in mydict.items()}}, inplace=True)

Output:

    col1
0    two # incorrectly mapped to "two" due to non-unique values
1    two
2  three
3   four
4   four # incorrectly mapped to "four" due to collision with the mapped value
mozway
  • My dictionary should be a bijection, and the column has no initial values that clash with it (I'm mapping chemical elements to their molar mass, so it's text->number). So I shall reverse the dictionary :( I wanted a better way... – David Siret Marqués Jun 26 '23 at 09:46
  • This is not a bad way. You can always build a custom Python object based on a dictionary that automatically adds the reverse key upon insertion, but that's not "better" IMO ;) – mozway Jun 26 '23 at 09:48
  • Yeah, I liked the one-liner for the replace, I'll copy it – David Siret Marqués Jun 26 '23 at 09:49
  • Following up on my suggestion in the comment above, you might want to try the bidirectional dictionary [implemented here](https://stackoverflow.com/questions/3318625/how-to-implement-an-efficient-bidirectional-hash-table): `mydict = bidict({'one': 1, 'two': 2, 'three': 3, 'four': 4}) ; x.replace({'col1': mydict}, inplace=True) ; x.replace({'col1': mydict.inverse}, inplace=True)`. Another approach would be to keep the original column with the elements and to add a new one with the molar mass ;) – mozway Jun 26 '23 at 12:01
  • It's OK, it doesn't work anyway because the ML model does not return exactly the same values (damn floating-point arithmetic), so in the end I'll have to look for another way. A very useful post you sent there, I'll have a deep look at it when I've got time. Thanks! – David Siret Marqués Jun 26 '23 at 12:17
  • Indeed, mapping floats is always a pain. Maybe consider changing the unit (e.g. using Daltons or something small enough to be able to use integers). Good luck. – mozway Jun 26 '23 at 12:19
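
For the floating-point problem raised in the last comments, one possible workaround is to map each returned number to the closest known value instead of requiring an exact match. This is only a rough sketch and assumes NumPy is available. The three-element mapping, the tolerance `tol`, and the helper name `nearest_key` are illustrative assumptions and do not come from the original post.

```python
import numpy as np

# Illustrative mapping (element -> approximate molar mass in g/mol).
molar_mass = {'H': 1.008, 'C': 12.011, 'O': 15.999}

keys = list(molar_mass.keys())
masses = np.array(list(molar_mass.values()))

def nearest_key(value, tol=0.05):
    """Return the key whose mass is closest to `value`, or None if none is within `tol`."""
    idx = int(np.argmin(np.abs(masses - value)))
    if abs(masses[idx] - value) > tol:
        return None  # too far from every known value to be trusted
    return keys[idx]

# Values as a model might return them, slightly off from the exact masses.
predicted = [1.0080000001, 12.0109998, 15.9990002]
recovered = [nearest_key(v) for v in predicted]
```

The tolerance has to be smaller than half the smallest gap between two mapped values, otherwise a prediction could be assigned to the wrong key.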