-1

I have imported an excel to a pandas dataframe, which I'm trying to translate and then export back to an excel.

For example purpose say this is my data set:

d = {"cool":"chill", "guy":"dude","cool guy":"bro"}```
data = [['cool guy'], ['cool'], ['guy']]
df = pd.DataFrame(data, columns = ['WORDS'])


print(df)
#    WORDS   
# 0  cool guy   
# 1  cool  
# 2  guy    

So the easiest solution would be to use pandas built in function replace. However if you use:

df['WORDS'] = df['WORDS'].replace(d, regex=True) 

The result is:

print(df)
#    WORDS   
# 0  chill dude   
# 1  chill  
# 2  dude 

(cool guy doesn't get translated correctly)

This could be solved by sorting the dictionary by the longest word first. I tried to use this function:

import re
def replace_words(col, dictionary):
    # sort keys by length, in reverse order
    for item in sorted(dictionary.keys(), key = len, reverse = True):
        col = re.sub(item, dictionary[item], col)
    return col

But..

df['WORDS'] = replace_words(df['WORDS'], d) 

Results in a type error: TypeError: expected string or bytes-like object

Trying to convert the row to a string did not help either

...*
col = re.sub(item, dictionary[item], [str(row) for row in col])

Does anyone have any solution or different approach I could try?

Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563

3 Answers3

1

Let us try replace

df.WORDS.replace(d)
Out[307]: 
0      bro
1    chill
2     dude
Name: WORDS, dtype: object
BENY
  • 317,841
  • 20
  • 164
  • 234
0

You can use Series.map and pass the dictionary. It will replace the values in the column from the dictionary if it is found in the key of the dictionary being passed.

>>> df['WORDS'].map(d)
0      bro
1    chill
2     dude
Name: WORDS, dtype: object
ThePyGuy
  • 17,779
  • 5
  • 18
  • 45
0
df['WORDS'] = df['WORDS'].apply(lambda x: d[x])

This will do the work.