string operation/regex - find and replace

Question

I am trying to replace each word that follows a . in the text file below:

line1
line2
field: [orders.cancelled,orders.delivered,orders.reached
orders.pickup,orders.time]
some line
some line

I have a dictionary:

   d = {'cancelled':'cancelled_at', 'deliver':'xxx'}

I am running the following code. However, it also replaces partial matches, i.e.

the new file contains the following words:

field: [orders.cancelled_at, orders.xxxed ..........

Here, in the word delivered, the program still replaces the first 7 characters (deliver) and leaves 'ed' at the end. I am not sure why.

with open('list.txt', 'r') as g:
    text = g.read()
    for k in d:
        before = f'.{k}'
        after = f'.{d[k]}'
        #print(before)
        #print(after)
        text = text.replace(before, after)
        #print(text)

with open('new_list.txt', 'w') as w:
    w.write(text)

I also tried this version and I get the same results:

import re

with open('list.txt', 'r') as f:
    text = f.read()
    for k in d:
        before = f'.{k}(?!=\w)'
        print(before)
        after = f'.{d[k]}'
        print(after)
        text = re.sub(before, after, text)

with open('new_list.txt', 'w') as w:
    w.write(text)

you are replacing `deliver` from the word `delivered` with xxx. The result is `xxxed`. Add `"delivered": "xxx"` to your dictionary. — spyralab, Sep 03 '20 at 11:43
1) Use word boundaries to match whole words, 2) Escape `.` outside a character class to match a literal `.` — Wiktor Stribiżew, Sep 03 '20 at 11:46
@spyralab I only want to remove the word after the dot if the key has a value 'deliver' and not 'delivered'. I expect the program to not change anything if the exact match is not found, in this case then it should give the new line as orders.cancelled_at, orders._delivered — trillion, Sep 03 '20 at 11:50
f'\b{k}\b' - this should work ? @WiktorStribiżew sorry I am not that familiar with regex and would appreciate if you can explain more — trillion, Sep 03 '20 at 11:54
@WiktorStribiżew yes that's correct. Can you please explain what you did there as I am very new to the regex — trillion, Sep 03 '20 at 12:04
@WiktorStribiżew I think already did. Also, do you think \b is really necessary here? since the match would only take place if I have the key in my dictionary so I didn't quite understand why did we have to use the \b here — trillion, Sep 07 '20 at 12:02
`\b` word boundary is necessary to only match if we have a whole word in the string, so `short\b` will match in `short.` and not in `shorts`. — Wiktor Stribiżew, Sep 07 '20 at 12:06
@WiktorStribiżew hey if I am reading a lot of lines (using for loop) and going through different words then what will be the replacement for the f.read() function in your code. I am not sure how will the third parameter of the print statements will look alike in this case. Addionally what is the purpose of using group() — trillion, Sep 08 '20 at 13:55
See https://ideone.com/lKatYy. `x.group()` is the match value as string. — Wiktor Stribiżew, Sep 08 '20 at 14:13
@WiktorStribiżew I am trying to apply multiple patterns and was hoping if there is a way that we can loop through different patterns instead of extending the code by writing each pattern in there.compile. Also somehow the program is not writing the results in the new file https://ideone.com/pJGnfm. Can you please help me with these issues ? — trillion, Sep 13 '20 at 02:01
@WiktorStribiżew another issue that i am facing is using this pattern = fr"(?<=:)[\s]?((?!\bcompany\b).)*$(?:{'|'.join(d)})\b" #print(pattern) changes = re.sub(pattern, lambda x: d[x.group()], line) -- here when I print the pattern I do see the keys that are after the ":" but don't contain the word "company". However when I print the result i.e the variable changes the key is not getting replaced. The pattern also works fine in the text editor but is not working in python. — trillion, Sep 13 '20 at 10:33
@HamzaShehzad Regarding your first comment, use [Python demo](https://ideone.com/6j6LsP). Regarding the second, `(?<=:)[\s]?((?!\bcompany\b).)*$(?:cancelled|deliver)\b` is a strange pattern, see [this regex demo](https://regex101.com/r/AerEbi/2). What are you trying to match with it? Do you mean you need `:\s*(?:cancelled|deliver)\b(?!.*\bcompany\b)` ([demo](https://regex101.com/r/AerEbi/3))? — Wiktor Stribiżew, Sep 13 '20 at 10:55
@WiktorStribiżew the first problem is solved now. For the second comment https://regex101.com/r/9hZtxy/1 From here you can see that I am trying to match all the words that are after the colon ":" but the word is not equal to "company". I don't want to replace a word after a colon if it is "company". This however is not working in python but it is working in the text editor — trillion, Sep 14 '20 at 17:50
@HamzaShehzad I'd use `:\s*\b(?!company\b)(\w+)` and then the code would refer to group 1, `d[x.group(1)]`. — Wiktor Stribiżew, Sep 14 '20 at 19:43
@WiktorStribiżew I can also add (?<=:) in the beginning, so that it takes the word after the colon ? also how would I change my code above so that in this specific case it will pick the group(1) and for the others, it would be group 0 ---> ideone.com/pJGnfm. Also, can you find any reason as to why my pattern above was not capturing the word in python, is it also because of the group issue ? — trillion, Sep 16 '20 at 07:27
If you mean the `fr"(?<=:)[\s]?((?!\bcompany\b).)*$(?:{'|'.join(d)})\b"` pattern, it is just malformed as `$` inside the pattern requires the immediate end of string and make it fail any string. If you need conditional replacement logic, see [an example](https://ideone.com/spdxrx). — Wiktor Stribiżew, Sep 16 '20 at 08:30
@WiktorStribiżew I am using your pattern with for the 'company' pattern. However, can you please explain why in this case I would have to refer it to group 1 and not just the group ( ). Also my current code replaces all the words for the pattern 1 and pattern 2 but it doesn't work with pattern 3. I am not quite sure what you did with here: changes = re.sub(pattern, lambda x: f'{x.group(1)}{d[x.group(2)]}' if x.group(1) else d[x.group()], line) before we had just group ( ) .... I am not quite sure what these groups ( group no's and f ' ' do ? https://ideone.com/9Dxg4M — trillion, Sep 16 '20 at 18:37
@WiktorStribiżew did you had a chance to take a look at the issue — trillion, Sep 20 '20 at 14:39
No idea what the issue is. `f'...'` is an f-string, with string interpolation. Group 1 is the first set of unescaped pair of parentheses in the pattern, the `(model\s*:\s*)` is Group 1 and `({'|'.join(d)})` forms Group 2. `x.group(1)` and `x.group(2)` access these values. — Wiktor Stribiżew, Sep 21 '20 at 12:34
@WiktorStribiżew hey, I have two issues, refer to the issues here: https://ideone.com/BOI9e8 You can see that 1) the program runs into key error which I am not sure why. 2) The second issue is that the third pattern is not working (pickup_drop is not getting replaced by pickup_drop_at). Can you please take a look ? You will find my text file and the dictionary in the link — trillion, Sep 27 '20 at 14:42
The third alternative contains `(\w+)` that requires one or more word chars before the words you listed in a group, but you have no word chars there in the sample, you need to remove `(\w+)`, I believe. The key error is due to the fact your match starts with a colon and whitespaces, you should capture the word only part. Try [this Python code](https://ideone.com/02eaFb). — Wiktor Stribiżew, Sep 27 '20 at 20:19
@WiktorStribiżew I used these codes (see here https://ideone.com/eUoCUn) both of them don't make the changes based on the last pattern. so it's the same issue as before — trillion, Oct 05 '20 at 17:51
@WiktorStribiżew hey did you had a chance to take a look at it ? — trillion, Oct 12 '20 at 19:49
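
The group numbering and the KeyError discussed in the comments above can be illustrated with a small sketch. It is a variation on the `:\s*\b(?!company\b)(\w+)` pattern suggested in the comments, not the code behind any of the linked demos: the sample `line`, the extra capturing group around `:\s*`, and the `d.get(...)` fallback are assumptions added for illustration.

```python
import re

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}  # dictionary from the question
line = "status: cancelled, owner: company"           # hypothetical sample line

# Group 1 captures the colon and whitespace, group 2 captures the word itself.
# The negative lookahead skips the word "company".
rx = re.compile(r"(:\s*)\b(?!company\b)(\w+)")

def repl(m):
    # m.group() is the whole match, e.g. ": cancelled", which is not a key of d,
    # so d[m.group()] would raise KeyError. Look up only the captured word and
    # leave it unchanged when it is not in the dictionary (assumed fallback).
    return m.group(1) + d.get(m.group(2), m.group(2))

changed = rx.sub(repl, line)
print(changed)  # status: cancelled_at, owner: company
```

Because the prefix is put back from group 1, the colon and the spacing survive the replacement, and only the word in group 2 is looked up.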

score 1 · Accepted Answer · answered Sep 03 '20 at 13:24

You can use

import re

d = {'cancelled':'cancelled_at', 'deliver':'xxx'}
rx = re.compile(fr"(?<=\.)(?:{'|'.join(d)})\b")

with open('list.txt', 'r') as f:
    print( re.sub(rx, lambda x: d[x.group()], f.read()) )

See the Python demo

The regex generated by the code looks like

(?<=\.)(?:cancelled|deliver)\b

See the regex demo. Details:

(?<=\.) - a positive lookbehind that matches a location immediately preceded with a literal .
(?:cancelled|deliver) - two alternatives: cancelled or deliver
\b - a word boundary, so the alternatives match only as whole words (deliver does not match inside delivered).
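
The three parts of the pattern can be checked in isolation. This is a minimal sketch; the `sample` string is made up from the field names in the question, and the names `pattern` and `matches` are only for illustration.

```python
import re

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}

# Build the pattern the same way as in the answer and show what it expands to.
pattern = fr"(?<=\.)(?:{'|'.join(d)})\b"
print(pattern)  # (?<=\.)(?:cancelled|deliver)\b

# "delivered" fails the \b check (an "e" follows "deliver"),
# and the last "deliver" is not preceded by a dot.
sample = "orders.cancelled,orders.delivered,deliver.orders"
matches = re.findall(pattern, sample)
print(matches)  # ['cancelled']
```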

The lambda x: d[x.group()] replacement looks up the matched word as a key in the dictionary and replaces it with the corresponding value.
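
To try the replacement without touching any files, the same idea can be run on the sample text from the question held in a string. This is a sketch only: the helper name `replace_after_dot` is hypothetical, and escaping the keys with `re.escape` is an extra precaution that is not part of the answer's code.

```python
import re

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}

def replace_after_dot(text, mapping):
    # Escape the keys in case they contain regex metacharacters
    # (a precaution added here, not present in the answer).
    alternatives = '|'.join(re.escape(k) for k in mapping)
    rx = re.compile(fr"(?<=\.)(?:{alternatives})\b")
    # x.group() is the matched key; replace it with the mapped value.
    return rx.sub(lambda x: mapping[x.group()], text)

text = ("field: [orders.cancelled,orders.delivered,orders.reached\n"
        "orders.pickup,orders.time]")
result = replace_after_dot(text, d)
print(result)
```

With this input only `cancelled` is replaced; `delivered` stays as it is because `deliver` is not followed by a word boundary there.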

Wiktor Stribiżew

Hey, can you explain when we use 'f' and when 'fr'? I can see that you used fr in re.compile. I would really appreciate it if you could explain the difference between those – trillion Sep 04 '20 at 15:23
@HamzaShehzad `r` is the raw string literal prefix: in a raw string literal, the backslash is not used to form string escape sequences (please read the BONUS top sections in the [Regular expression works on regex101.com, but not on prod](https://stackoverflow.com/questions/39636124) thread). `f` is an [`f-string`](https://stackoverflow.com/questions/57150426/what-is-printf) prefix that allows *variable interpolation* (or variable expansion), i.e. you can use `{varname}` inside the string literal instead of manually concatenating strings with variables (or using `str.format`) – Wiktor Stribiżew Sep 04 '20 at 16:24
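
The difference between the prefixes described in the comment above can be seen in a short sketch. The variable names are made up for illustration.

```python
word = "deliver"

plain = "\bshort"       # no prefix: "\b" becomes a single backspace character
raw = r"\bshort"        # r: the backslash and the "b" stay as two characters
f_only = f"{word}s"     # f: {word} is replaced with the variable's value
f_raw = fr"\b{word}\b"  # fr: interpolation plus literal backslashes, handy for regex

print(len(plain), len(raw))  # 6 7
print(f_only)                # delivers
print(f_raw)                 # \bdeliver\b
```

This is why the pattern in the answer uses `fr"..."`: the dictionary keys are interpolated, while `\.` and `\b` reach the regex engine unchanged.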