I am trying to replace each word that follows a `.` in the text file below:

line1
line2
field: [orders.cancelled,orders.delivered,orders.reached
orders.pickup,orders.time]
some line
some line

I have a dictionary:

   d = {'cancelled':'cancelled_at', 'deliver':'xxx'}

I am running the following code, but it also replaces partial matches. The new file contains:

field: [orders.cancelled_at, orders.xxxed ..........

Here the program still replaces the first 7 letters (deliver) of the word delivered and leaves 'ed' at the end. I am not sure why.

with open('list.txt', 'r') as g:
    text = g.read()
    for k in d:
        before = f'.{k}'
        after = f'.{d[k]}'
        #print(before)
        #print(after)
        text = text.replace(before, after)
        #print(text)

with open('new_list.txt', 'w') as w:
    w.write(text)

I also tried this one and I get the same results

import re

with open('list.txt', 'r') as f:
    text = f.read()
    for k in d:
        before = f'.{k}(?!=\w)'
        print(before)
        after = f'.{d[k]}'
        print(after)
        text = re.sub(before, after, text)

with open('new_list.txt', 'w') as w:
    w.write(text)
Wiktor Stribiżew
trillion
  • you are replacing `deliver` from the word `delivered` with xxx. The result is `xxxed`. Add `"delivered": "xxx"` to your dictionary. – spyralab Sep 03 '20 at 11:43
  • 1) Use word boundaries to match whole words, 2) Escape `.` outside a character class to match a literal `.` – Wiktor Stribiżew Sep 03 '20 at 11:46
  • @spyralab I only want to remove the word after the dot if the key has a value 'deliver' and not 'delivered'. I expect the program to not change anything if the exact match is not found, in this case then it should give the new line as orders.cancelled_at, orders._delivered – trillion Sep 03 '20 at 11:50
  • f'\b{k}\b' - this should work ? @WiktorStribiżew sorry I am not that familiar with regex and would appreciate if you can explain more – trillion Sep 03 '20 at 11:54
  • I think you want https://ideone.com/EgWueE, don't you? – Wiktor Stribiżew Sep 03 '20 at 11:58
  • @WiktorStribiżew yes that's correct. Can you please explain what you did there as I am very new to the regex – trillion Sep 03 '20 at 12:04
  • @WiktorStribiżew I think already did. Also, do you think \b is really necessary here? since the match would only take place if I have the key in my dictionary so I didn't quite understand why did we have to use the \b here – trillion Sep 07 '20 at 12:02
  • `\b` word boundary is necessary to only match if we have a whole word in the string, so `short\b` will match in `short.` and not in `shorts`. – Wiktor Stribiżew Sep 07 '20 at 12:06
  • @WiktorStribiżew hey, if I am reading a lot of lines (using a for loop) and going through different words, what would replace the f.read() call in your code? I am not sure what the third parameter of the print statement would look like in this case. Additionally, what is the purpose of using group()? – trillion Sep 08 '20 at 13:55
  • See https://ideone.com/lKatYy. `x.group()` is the match value as string. – Wiktor Stribiżew Sep 08 '20 at 14:13
  • @WiktorStribiżew I am trying to apply multiple patterns and was hoping there is a way to loop through different patterns instead of extending the code by writing each pattern in the re.compile. Also, somehow the program is not writing the results to the new file https://ideone.com/pJGnfm. Can you please help me with these issues? – trillion Sep 13 '20 at 02:01
  • @WiktorStribiżew another issue that i am facing is using this pattern = fr"(?<=:)[\s]?((?!\bcompany\b).)*$(?:{'|'.join(d)})\b" #print(pattern) changes = re.sub(pattern, lambda x: d[x.group()], line) -- here when I print the pattern I do see the keys that are after the ":" but don't contain the word "company". However when I print the result i.e the variable changes the key is not getting replaced. The pattern also works fine in the text editor but is not working in python. – trillion Sep 13 '20 at 10:33
  • @HamzaShehzad Regarding your first comment, use [Python demo](https://ideone.com/6j6LsP). Regarding the second, `(?<=:)[\s]?((?!\bcompany\b).)*$(?:cancelled|deliver)\b` is a strange pattern, see [this regex demo](https://regex101.com/r/AerEbi/2). What are you trying to match with it? Do you mean you need `:\s*(?:cancelled|deliver)\b(?!.*\bcompany\b)` ([demo](https://regex101.com/r/AerEbi/3))? – Wiktor Stribiżew Sep 13 '20 at 10:55
  • @WiktorStribiżew the first problem is solved now. For the second comment https://regex101.com/r/9hZtxy/1 From here you can see that I am trying to match all the words that are after the colon ":" but the word is not equal to "company". I don't want to replace a word after a colon if it is "company". This however is not working in python but it is working in the text editor – trillion Sep 14 '20 at 17:50
  • @HamzaShehzad I'd use `:\s*\b(?!company\b)(\w+)` and then the code would refer to group 1, `d[x.group(1)]`. – Wiktor Stribiżew Sep 14 '20 at 19:43
  • @WiktorStribiżew I can also add (?<=:) in the beginning, so that it takes the word after the colon ? also how would I change my code above so that in this specific case it will pick the group(1) and for the others, it would be group 0 ---> ideone.com/pJGnfm. Also, can you find any reason as to why my pattern above was not capturing the word in python, is it also because of the group issue ? – trillion Sep 16 '20 at 07:27
  • If you mean the `fr"(?<=:)[\s]?((?!\bcompany\b).)*$(?:{'|'.join(d)})\b"` pattern, it is just malformed as `$` inside the pattern requires the immediate end of string and make it fail any string. If you need conditional replacement logic, see [an example](https://ideone.com/spdxrx). – Wiktor Stribiżew Sep 16 '20 at 08:30
  • @WiktorStribiżew I am using your pattern with for the 'company' pattern. However, can you please explain why in this case I would have to refer it to group 1 and not just the group ( ). Also my current code replaces all the words for the pattern 1 and pattern 2 but it doesn't work with pattern 3. I am not quite sure what you did with here: changes = re.sub(pattern, lambda x: f'{x.group(1)}{d[x.group(2)]}' if x.group(1) else d[x.group()], line) before we had just group ( ) .... I am not quite sure what these groups ( group no's and f ' ' do ? https://ideone.com/9Dxg4M – trillion Sep 16 '20 at 18:37
  • @WiktorStribiżew did you had a chance to take a look at the issue – trillion Sep 20 '20 at 14:39
  • No idea what the issue is. `f'...'` is an f-string, with string interpolation. Group 1 is the first set of unescaped pair of parentheses in the pattern, the `(model\s*:\s*)` is Group 1 and `({'|'.join(d)})` forms Group 2. `x.group(1)` and `x.group(2)` access these values. – Wiktor Stribiżew Sep 21 '20 at 12:34
  • @WiktorStribiżew hey, I have two issues, refer to the issues here: https://ideone.com/BOI9e8 You can see that 1) the program runs into key error which I am not sure why. 2) The second issue is that the third pattern is not working (pickup_drop is not getting replaced by pickup_drop_at). Can you please take a look ? You will find my text file and the dictionary in the link – trillion Sep 27 '20 at 14:42
  • The third alternative contains `(\w+)` that requires one or more word chars before the words you listed in a group, but you have no word chars there in the sample, you need to remove `(\w+)`, I believe. The key error is due to the fact your match starts with a colon and whitespaces, you should capture the word only part. Try [this Python code](https://ideone.com/02eaFb). – Wiktor Stribiżew Sep 27 '20 at 20:19
  • @WiktorStribiżew I used these codes (see here https://ideone.com/eUoCUn) both of them don't make the changes based on the last pattern. so it's the same issue as before – trillion Oct 05 '20 at 17:51
  • @WiktorStribiżew hey did you had a chance to take a look at it ? – trillion Oct 12 '20 at 19:49
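
Two points from the comment thread above, the effect of `\b` and the use of group 1, can be illustrated with a small sketch. The pattern `:\s*\b(?!company\b)(\w+)` is the one suggested in the comments; the helper names `swap` and `replace_after_colon`, the sample lines, and the choice to leave unknown words unchanged via `d.get` are assumptions made for illustration only.

```python
import re

# \b matches only between a word character and a non-word character
# (or at the start/end of the string).
assert re.search(r"short\b", "short.") is not None
assert re.search(r"short\b", "shorts") is None

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}

# Pattern from the comments: group 1 is the word after the colon,
# and the lookahead rejects the word "company".
pattern = re.compile(r":\s*\b(?!company\b)(\w+)")

def swap(m):
    # m.group() is the whole match (colon, spaces, word);
    # m.group(1) is only the captured word, which is what the dict is keyed on.
    word = m.group(1)
    prefix = m.group()[: m.start(1) - m.start()]
    # Assumption: words that are not dictionary keys are left unchanged.
    return prefix + d.get(word, word)

def replace_after_colon(line):
    return pattern.sub(swap, line)
```

Looking up `d[m.group()]` here would raise a `KeyError`, because the whole match starts with the colon and whitespace; that is why the lookup uses `m.group(1)`.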

1 Answer

You can use

import re

d = {'cancelled':'cancelled_at', 'deliver':'xxx'}
rx = re.compile(fr"(?<=\.)(?:{'|'.join(d)})\b")

with open('list.txt', 'r') as f:
    print( re.sub(rx, lambda x: d[x.group()], f.read()) )

See the Python demo

The regex generated by the code looks like

(?<=\.)(?:cancelled|deliver)\b

See the regex demo. Details:

  • (?<=\.) - a positive lookbehind that matches a location immediately preceded with a literal .
  • (?:cancelled|deliver) - two alternatives: cancelled or deliver
  • \b - a word boundary, so the alternatives match only as whole words (deliver does not match inside delivered).

The lambda x: d[x.group()] replacement uses the matched word as a dictionary key and replaces it with the value stored under that key.
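
As a self-contained check of the approach above, here is a minimal sketch that runs the same regex on the sample text from the question as a string instead of reading `list.txt`. The `re.escape` call and the `replace_fields` helper are additions for illustration (an extra precaution in case a key contains regex metacharacters), not part of the original answer.

```python
import re

d = {'cancelled': 'cancelled_at', 'deliver': 'xxx'}

# Build (?<=\.)(?:cancelled|deliver)\b from the dictionary keys.
# re.escape is an added precaution, not part of the original answer.
rx = re.compile(r"(?<=\.)(?:" + "|".join(map(re.escape, d)) + r")\b")

def replace_fields(text):
    # Hypothetical helper: swap each matched key for its dictionary value.
    return rx.sub(lambda m: d[m.group()], text)

sample = (
    "field: [orders.cancelled,orders.delivered,orders.reached\n"
    "orders.pickup,orders.time]"
)
result = replace_fields(sample)
print(result)
```

With this input, `orders.cancelled` becomes `orders.cancelled_at`, while `orders.delivered` is left alone because `deliver` is followed by another word character and `\b` fails there. To write the result to a file instead of printing it, the string can be passed to `w.write(...)` as in the question's code.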

Wiktor Stribiżew
  • Hey, can you explain when we use 'f' and 'fr'? I can see that you have used fr in the re.compile. Would really appreciate it if you can explain the difference between those – trillion Sep 04 '20 at 15:23
  • @HamzaShehzad `r` is the raw string literal prefix; it defines a string literal in which the backslash does not form string escape sequences (please read the BONUS top sections in the [Regular expression works on regex101.com, but not on prod](https://stackoverflow.com/questions/39636124) thread). `f` is an [`f-string`](https://stackoverflow.com/questions/57150426/what-is-printf) prefix that allows *variable interpolation* (or variable expansion), i.e. you can write `{varname}` inside the string literal instead of concatenating strings with variables manually (or using `str.format`) – Wiktor Stribiżew Sep 04 '20 at 16:24
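
The difference between the prefixes described in the comment above can be seen in a short sketch. The variable names are illustrative only.

```python
word = "deliver"

# Without r, "\b" is a single backspace character; with r, it stays
# as two characters (backslash + b), which is what the regex engine needs.
plain = "\bword"
raw = r"\bword"

# f interpolates {word}; fr does the same while keeping backslashes literal.
f_only = f"{word}\\b"    # backslash must be doubled without r
fr_both = fr"{word}\b"   # no doubling needed

# Inside f/fr strings, literal braces (e.g. a regex quantifier) are doubled.
quantifier = fr"\d{{2}}"
```

Here `f_only` and `fr_both` are the same string, so `fr` is mainly a convenience that avoids doubling every backslash when a regex is built from variables.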