I'm using RegEx in Python to search through a text file for occurrences of names in a roster, and then append a "!" character to the start of the string. For example:

roster = ["name1," "name2," "name3"]

Original String = "name1 went home."

Output String - "!name1 went home."

I found this thread on how to append to the end of the string, which I used successfully for that purpose. I've tinkered with RegEx to append at the start of the string, but with no success. My attempt is below - any recommendations?

with open("File.txt", 'r+') as f:
   s = f.read()
   new_s = re.sub(r'^(.*{}.*)^'.format(re.escape("name1")), lambda g: g.group(0) + "!", s, flags=re.MULTILINE)
   f.seek(0)
   f.write(new_s)
Charles Duffy
  • Did you mean `$` for end of line? or `\Z` for end of string? – The fourth bird May 31 '21 at 19:54
  • What exactly is the problem with the attempt you have shown? Have you tried to use `'!' + ...` instead of `... + '!'`? – mkrieger1 May 31 '21 at 19:55
  • It was $ for end of line in the original code - I switched it to ^ to try and get beginning of line. – Daniel Hutchinson May 31 '21 at 19:55
  • In using the original RegEx, the character is successfully appended to the end of the string. In my altered Regex, nothing appears to be appended anywhere, either at the beginning or end. – Daniel Hutchinson May 31 '21 at 19:57
  • Note you may read line by line and it will make the code simpler. Also, you need no lambda here, `\g<0>` is a backreference to the whole match. Or, you can use `\1` since you wrapped the whole pattern with a capturing group. Main thing is that `^` with `re.M` matches right after an LF char, and `.` does not match LF chars by default. But using `re.S` / `re.DOTALL` here is not a good idea. – Wiktor Stribiżew May 31 '21 at 20:00
  • The word is "prepend". "Append" is specific to the end, "prepend" is for putting something at the beginning. – Charles Duffy May 31 '21 at 20:04
  • Anyhow, why do you have the `.*`s in your regex in the first place? – Charles Duffy May 31 '21 at 20:07
  • Still, is there really a space between `name` and number in the input, and no space in the search phrases? Do the search phrases really contain commas at the end? If these are typos, try https://ideone.com/AyYaJh – Wiktor Stribiżew May 31 '21 at 20:09

1 Answer

Take out the .*s -- matching too much data makes your logic more complicated than it needs to be. Once they are gone, the regex doesn't need to be anchored, and you don't need re.MULTILINE either: that flag only changes where ^ and $ match, and the pattern no longer uses them. The \b word boundaries keep a roster name from matching inside a longer word.

import re

roster = ["name1", "name2", "name3"]
roster_re = re.compile(r'\b(' + '|'.join(re.escape(s) for s in roster) + r')\b')

with open("File.txt", 'r+') as f:
    new_content = roster_re.sub(lambda s: ('!'+s.group(0)), f.read())
    # Note that this is not a safe way to rewrite a file in place; may corrupt data
    f.seek(0)
    f.truncate()
    f.write(new_content)
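
To check the pattern without touching a file, the same substitution can be tried on an in-memory string. This is only a sketch: the helper name `mark_names` and the sample text are made up for illustration, and it uses the `\g<0>` whole-match backreference mentioned in the comments in place of the lambda.

```python
import re

roster = ["name1", "name2", "name3"]

# Same idea as the pattern above: escaped names joined with "|",
# wrapped in word boundaries so "name1" does not match inside "name10".
roster_re = re.compile(r'\b(?:' + '|'.join(re.escape(s) for s in roster) + r')\b')


def mark_names(text):
    """Return text with '!' placed in front of every whole-word roster name.

    mark_names is a hypothetical helper, not part of the original answer.
    """
    # \g<0> in the replacement template stands for the entire match.
    return roster_re.sub(r'!\g<0>', text)


sample = "name1 went home.\nThen name2 called name10.\n"
print(mark_names(sample))
```

Here `name1` and `name2` get the prefix, while `name10` is left alone because of the trailing `\b`.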

See "How to safely write to a file?" for discussion of the changes you'd need to make to avoid corrupting your data file if the script fails mid-operation (the system suffers an inopportune reboot, the file server it's writing to fails, etc.).
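
One common approach to the problem described above is to write the new content to a temporary file in the same directory and then rename it over the original. The following is a minimal sketch under that assumption, not necessarily what the linked discussion recommends; `rewrite_file_atomically` and its `transform` parameter are hypothetical names introduced here.

```python
import os
import tempfile


def rewrite_file_atomically(path, transform):
    """Replace the file at path with transform(old_content).

    Hypothetical helper: the new content goes to a temporary file first,
    so the original stays intact if anything fails before the final rename.
    """
    with open(path, 'r') as f:
        new_content = transform(f.read())

    # Create the temporary file next to the target so the rename
    # stays on the same filesystem.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the data to disk
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        # Remove the leftover temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With the compiled pattern from the answer, a call might look like `rewrite_file_atomically("File.txt", lambda text: roster_re.sub(r'!\g<0>', text))`. Note that `tempfile.mkstemp` creates the file with owner-only permissions, so the replaced file may not keep the original's mode; copying the mode over is left out of this sketch.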

Charles Duffy
  • Many thanks for the assistance! New to RegEx, and I still have a lot to learn! – Daniel Hutchinson May 31 '21 at 20:21
  • Note that regular expressions themselves _only_ do matching -- "how do I use regex to change something?" is a question that implies that you're not _just_ using regexes; in this case, you're using extra functionality `re.sub` offers that goes beyond the strict bounds of what a regular expression (in the formal mathematical/academic context) is understood to be able to do. – Charles Duffy May 31 '21 at 20:24
  • Also, if you're new to regular expressions, I _strongly_ recommend reading the paper https://swtch.com/~rsc/regexp/regexp1.html talking about how more modern regex implementations are often _worse_ than ones from decades ago. (That paper started a sea change, so things have started to get better again, but it's worth reading even so, to help understand some of the context about _why_ regular expression libraries are changing and some functionality that was added by the Perl community is being phased back out, at least in part). – Charles Duffy May 31 '21 at 20:26