Replace a list of elements with regex

Question

I have a text file listing adverbs and their replacements, like this:

adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3

I want each adverb in my text to be replaced with its replacement.

Example:

'Hello adverbe1 this is a test' should become 'Hello replacement1 this is a test'

I am running out of ideas. My code so far:

adverbes = open("list_adverbes_replacement.txt", encoding="utf-8")
list_adverbes = []
list_replacement = []
for ad in adverbes.readlines():
    if ad != '' and ad.split('|')[0].strip(' ')[-3:] == 'ent':
        list_adverbes.append(ad.split('|')[0].strip(' '))
        list_replacement.append(ad.split('|')[1])
pattern = r"(\s+\b(?:{}))\b".format("|".join(list_adverbes))
data = re.sub(pattern, r"\1", data)

I couldn't find a way to replace each adverb with its matching replacement.

list_adverbes_replacement.txt is the file shown at the beginning. I am looking for a regex solution; I just don't know what I am missing.

This is not a regular expression problem. Just split the sentence into words using `split()`, then check each word against your list, and put them back together with `join()`. — Tim Roberts, Jun 02 '21 at 20:20
Looks like this question: https://stackoverflow.com/questions/15175142/how-can-i-do-multiple-substitutions-using-regex — svrist, Jun 02 '21 at 20:25
@ThomasWeller The problem with `replace` is that it doesn't honor word boundaries. It would replace "have" in "I shaved today". — Tim Roberts, Jun 02 '21 at 20:26
@svrist excuse me but i didn't understand the code right there — , Jun 02 '21 at 20:40

oskros · Accepted Answer · 2021-06-02T20:52:03.590

1

A simple and concise approach: build a dictionary that maps each adverb to its replacement.

Then use re.sub to match every word, look it up in the dictionary, and fall back to the word itself when it is not a key:

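import re

# Map each adverb to its replacement, one "adverb |replacement" pair per line
d = {}
with open('list_adverbes_replacement.txt', 'r', encoding='utf-8') as fo:
    for line in fo:
        if '|' not in line:
            continue  # skip blank or malformed lines
        splt = line.split('|', 1)
        d[splt[0].strip()] = splt[1].strip()

s = 'Hello adverbe1 this is a test, adverbe2'
# Swap each word for its replacement; words missing from d are left unchanged
s = re.sub(r'\w+', lambda m: d.get(m.group(), m.group()), s)
print(s)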
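
To try the lookup without the file, here is a minimal sketch of the same idea. The `replacements` dictionary and the `replace_words` helper are illustrative names, not part of the answer above; the dictionary stands in for what the file-reading loop would produce.

```python
import re

# Hypothetical in-memory mapping standing in for the parsed file.
replacements = {"adverbe1": "replacement1", "adverbe2": "replacement2"}

def replace_words(text, mapping):
    """Replace every whole word found in `mapping`; leave other words unchanged."""
    # \w+ matches one complete word at a time, so the lookup never sees a fragment.
    return re.sub(r"\w+", lambda m: mapping.get(m.group(), m.group()), text)

result = replace_words("Hello adverbe1 this is a test, adverbe2", replacements)
print(result)  # Hello replacement1 this is a test, replacement2
```

Because each match is a complete word, a longer word such as adverbe10 is looked up as a whole and stays as it is when it is not a key.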

edited Jun 02 '21 at 20:52

answered Jun 02 '21 at 20:40

@user16085212 Note this code will have to check each word in the input, and if the input is a long text, it might take some time to go through it. – Wiktor Stribiżew Jun 02 '21 at 21:00

ThePyGuy · Answer 2 · 2021-06-02T20:38:29.690

0

Given adverbs like this:

adverbs = '''adverbe1 |replacement1
adverbe2 |replacement2
adverbe3 |replacement3'''

Create a dictionary from it, where each key is an adverb and each value is its replacement text:

adverbsDict = {key.strip(): value.strip() for key, value in (line.split('|') for line in adverbs.split('\n'))}

Now iterate over the keys and call replace on the text, passing each key with its corresponding value:

text = 'Hello adverbe1 this is a test'
for key in adverbsDict:
    text = text.replace(key, adverbsDict[key])

OUTPUT:

'Hello replacement1 this is a test'
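
One caveat with `str.replace` is that it matches substrings, not whole words. The sketch below uses a made-up pair (`me` to `you`) to show the difference, and one possible way to restrict each key to whole words with `\b` and `re.escape`. It assumes every key starts and ends with a word character, since `\b` only marks a boundary next to one.

```python
import re

text = "me at home"
mapping = {"me": "you"}  # hypothetical example pair, not from the question

# Plain str.replace also rewrites the "me" inside "home".
naive = text
for key, value in mapping.items():
    naive = naive.replace(key, value)

# Anchoring each escaped key with \b restricts matches to whole words.
# A function is used as the replacement so backslashes in values stay literal.
bounded = text
for key, value in mapping.items():
    bounded = re.sub(rf"\b{re.escape(key)}\b", lambda m, v=value: v, bounded)

print(naive)    # you at hoyou
print(bounded)  # you at home
```

Both loops apply the keys one after another, so a replacement that happens to equal a later key would be replaced again; a single-pass substitution avoids that.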

edited Jun 02 '21 at 20:38

answered Jun 02 '21 at 20:32


I updated the answer @user16085212 , `replace` was missing `key` parameter – ThePyGuy Jun 02 '21 at 20:38
This solution will end up replacing `me` in `home`. – Wiktor Stribiżew Jun 02 '21 at 20:39
@WiktorStribiżew, yes that is undeniably True – ThePyGuy Jun 02 '21 at 20:45

Wiktor Stribiżew · Answer 3 · 2021-06-02T20:56:54.097

0

You can build the dictionary of adverbs and replacements with:

dct = {}
with open(r'__t.txt', 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

The dct will look like {'adverbe1': 'replacement1', 'adverbe2': 'replacement2', 'adverbe3': 'replacement3'}.

Then install triegex with pip install triegex (or use the solution from "Speed up millions of regex replacements in Python 3") to streamline building the regex dynamically, and use:

import triegex, re

dct = {}
with open(PATH_TO_FILE_WITH_SEARCH_AND_REPLACEMENTS, 'r') as f:
    for line in f:
        items = line.strip().split('|')
        dct[items[0].strip()] = items[1].strip()

test = 'Hello adverbe1 this is a test'
pattern = re.compile(fr'\b{triegex.Triegex(*dct.keys()).to_regex()}')
print( pattern.sub(lambda x: dct[x.group()], test) )
# => Hello replacement1 this is a test

The pattern for this demo dictionary is \b(?:adverbe(?:1\b|2\b|3\b)|~^(?#match nothing)), and it matches adverbe1, adverbe2, adverbe3 as whole words.

The lambda x: dct[x.group()], the replacement argument to re.sub, gets the corresponding replacement value.
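
If installing a package is not an option, a plain alternation built with the standard library is a simpler stand-in for the trie. This is a swapped-in technique, not what triegex does, and it will likely be slower for very large key lists. The sketch assumes the keys start and end with word characters, so `\b` can mark whole words; the `dct` below is a hard-coded copy of the demo dictionary.

```python
import re

# Hypothetical mapping standing in for the dictionary built from the file.
dct = {
    "adverbe1": "replacement1",
    "adverbe2": "replacement2",
    "adverbe3": "replacement3",
}

# Longest keys first, so a longer key is tried before a shorter key that is its prefix.
keys = sorted(dct, key=len, reverse=True)

# re.escape keeps any regex metacharacters in the keys literal.
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")

test = "Hello adverbe1 this is a test, not adverbe12"
result = pattern.sub(lambda m: dct[m.group()], test)
print(result)  # Hello replacement1 this is a test, not adverbe12
```

Since the pattern can only match dictionary keys, the direct lookup `dct[m.group()]` is safe here, and the whole text is processed in one pass.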

edited Jun 02 '21 at 20:56

answered Jun 02 '21 at 20:35


It gives me an error on the line: for line in f – Jun 02 '21 at 20:48
I solved that, but I don't know why the code isn't working! It doesn't replace anything, and yet the dict is full and contains everything. – Jun 02 '21 at 20:54
@user16085212 Use the real file path instead of `r'__t.txt'`, and it does replace, since I just copied the output from my Python console. – Wiktor Stribiżew Jun 02 '21 at 20:54
Yes, Wiktor, I did replace it with the correct name, but it didn't work. – Jun 02 '21 at 20:57
@user16085212 This just means you did not actually use my code since it is working fine. Just re-tried. – Wiktor Stribiżew Jun 02 '21 at 21:02
I did use your code, @Wiktor. It runs without errors but doesn't replace the adverbs as it should. – Jun 02 '21 at 21:18
