How to avoid replacing more than needed with str.replace?

Question

I have a dataframe that shows some text in one column and the language of the text in another column. Instead of having 'en', 'ru', etc. in the language column, I have tried to turn these abbreviations into full words:

df['text'] = df['text'] \
                  .str.replace('en', 'English') \
                  .str.replace('ru', 'Russian') \
                  .str.replace('fr', 'French') \
                  .str.replace('tr', 'Turkish') \
                  .str.replace('es', 'Spanish')

# The number of languages goes on..

The issue, however, is that it also finds 'en', for example, inside other words (such as French), which produces garbled output when I inspect the dataframe:

English                   959874
Russian                   419963
FrEnglishch                93797
Turkish                    87225
Spanish                    74120
PortuguSpanisHebrew        31627

# And so on..

How can I stop it from matching 'en', for instance, inside every word, so that it matches only when 'en' is the entire value of a cell?

Simon Hawe · Answer 1 · 2022-01-23T06:32:59.260

3

You might consider using map instead of str.replace. It should be more efficient in your case because it only does dictionary lookups, and since it looks up each whole cell value, it never touches substrings. You define a dictionary as a lookup table and pass it to map. For your example, that dictionary would map the short form to the long form. In code, that reads like

my_map = {"en": "English", "ru": "Russian", ...}
df['text'] = df.text.map(my_map)
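One thing to watch with map: any value that is not a key in the dictionary becomes NaN. Below is a minimal sketch of one way to keep unmapped codes unchanged. The sample data, the `lang_names` name and the `"xx"` code are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe.
df = pd.DataFrame({"text": ["en", "ru", "fr", "xx"]})

# Lookup table from language code to full name (only a few entries shown).
lang_names = {"en": "English", "ru": "Russian", "fr": "French"}

# map() looks up each whole cell value; codes missing from the dict become NaN.
mapped = df["text"].map(lang_names)

# Fall back to the original code wherever no mapping exists.
df["text"] = mapped.fillna(df["text"])

print(df["text"].tolist())  # ['English', 'Russian', 'French', 'xx']
```

If every code is guaranteed to be in the dictionary, the fillna step is unnecessary.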

edited Jan 23 '22 at 06:32

answered Jan 14 '22 at 11:25

Simon Hawe

Great, thanks a lot! – Nicolai MC Jan 14 '22 at 12:06

score 2 · Answer 2 · answered Jan 14 '22 at 11:26

You can use regex replace:

df['text'] = df['text'].str.replace(r'^en$', 'English', regex=True)

In regex, ^ matches the start of the string and $ matches the end of the string. Each cell value is matched on its own.

So you are basically saying: replace with English only where the value starts with en and ends right there.
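To see what the anchors change, here is a small sketch using Python's standard `re` module on plain strings instead of a dataframe column. The `values` list is made up for illustration.

```python
import re

values = ["en", "French", "es"]

# Unanchored pattern: matches "en" anywhere, so "French" gets mangled.
unanchored = [re.sub(r"en", "English", v) for v in values]

# Anchored pattern: matches only when the whole string is exactly "en".
anchored = [re.sub(r"^en$", "English", v) for v in values]

print(unanchored)  # ['English', 'FrEnglishch', 'es']
print(anchored)    # ['English', 'French', 'es']
```

The unanchored result reproduces the FrEnglishch problem from the question. The anchored one leaves French alone.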

answered Jan 14 '22 at 11:26

neisor

That did the job. Thanks! – Nicolai MC Jan 14 '22 at 12:06
