I have imported an excel to a pandas dataframe, which I'm trying to translate and then export back to an excel.
For example purpose say this is my data set:
d = {"cool":"chill", "guy":"dude","cool guy":"bro"}```
data = [['cool guy'], ['cool'], ['guy']]
df = pd.DataFrame(data, columns = ['WORDS'])
print(df)
# WORDS
# 0 cool guy
# 1 cool
# 2 guy
So the easiest solution would be to use pandas built in function replace. However if you use:
df['WORDS'] = df['WORDS'].replace(d, regex=True)
The result is:
print(df)
# WORDS
# 0 chill dude
# 1 chill
# 2 dude
(cool guy doesn't get translated correctly)
This could be solved by sorting the dictionary by the longest word first. I tried to use this function:
import re
def replace_words(col, dictionary):
# sort keys by length, in reverse order
for item in sorted(dictionary.keys(), key = len, reverse = True):
col = re.sub(item, dictionary[item], col)
return col
But..
df['WORDS'] = replace_words(df['WORDS'], d)
Results in a type error:
TypeError: expected string or bytes-like object
Trying to convert the row to a string did not help either
...*
col = re.sub(item, dictionary[item], [str(row) for row in col])
Does anyone have any solution or different approach I could try?