I have a big file of 3000 lines. I need to find the first occurrence of $SETGLOBAL and change the word that follows it. To do this I use the following regex:

with open("textfile.txt","r") as F:
      FF=F.read()
FF=re.sub("\$SETGLOBAL\s(.*)", FF ,"CCCC",1)
F2 = open("textfile.txt","w").write(FF)

The problem is that, in order to change the text in my huge file, I also need the regex to capture everything before and after this occurrence, so that I can write the new text file with the changed word in it.

How would I do this?

My problem is that I need the entire file in my variable FF so I can write it to a new file.

Imagine I have, for example, the following file:

123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT

What I need is a new file as follows:

123456
$SETGLOBAL CCCC
BBBBBB
$SETGLOBAL TTTT

But my solution overwrites everything and I am left with only

$SETGLOBAL CCCC

in my new file

David
  • Do you mean you just need `FF=re.sub(r"\$SETGLOBAL\b", text ,aa, 1)`? Replace something only once? – Wiktor Stribiżew Jul 11 '18 at 09:08
  • There is really no point consuming all the text after a word unless you are doing that in Notepad++ that has its own quirks. Please clarify what you mean by *the problem is that then the group that I need changes if the text before the first occurence changes so my code wouldn't work anymore.* Some example would help. – Wiktor Stribiżew Jul 11 '18 at 09:11
  • yes I only need to change it once, so the 1 in the end does exactly that. I edit my question to include it. My problem is the following : I need a new file with all of the text of the previous file and the changed word. When I do it with re.sub the entire file gets overwritten and all that is left is the substituted text. – David Jul 11 '18 at 09:24
  • I edited the question to show it is not a duplicate, and to answer your question. – David Jul 11 '18 at 09:34
  • Try `re.sub(r"(\$SETGLOBAL\s+)\w+", r"\1{}".format(text), aa, 1)` – Wiktor Stribiżew Jul 11 '18 at 09:44
  • That also deletes everything else in the file and I am left with : $SETGLOBAL $SETGLOBAL CCCC – David Jul 11 '18 at 09:47
  • See https://ideone.com/bLk2dH, you are not using what I suggest. – Wiktor Stribiżew Jul 11 '18 at 09:49
  • My bad, what you are saying works perfectly! thank you so much. Where can I read a bit more about your solution to understand how it works? – David Jul 11 '18 at 09:52
  • I will post an answer – Wiktor Stribiżew Jul 11 '18 at 09:53

1 Answer

You may capture the left-hand context in a capturing group and match just the word that follows it, then replace the match with a backreference to the group value plus the new word:

import re
aa='''123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT'''
text="CCCC"
print(re.sub(r"(\$SETGLOBAL\s+)\w+", r"\1{}".format(text), aa, 1))
# or
# print(re.sub(r"(\$SETGLOBAL\s+)\S+", r"\1{}".format(text), aa, 1))

See the Python demo

Here,

  • (\$SETGLOBAL\s+)\w+ - matches $SETGLOBAL followed by 1+ whitespace chars and captures them into Group 1 (later referenced with \1 in the replacement pattern), then matches 1 or more word chars with \w+. \w matches letters, digits and _ (plus other Unicode word chars in Python 3, or in Python 2.x when the re.U flag is used). NOTE: \S+ matches 1 or more non-whitespace chars.
  • \1 - is the backreference to the value stored in Group 1 buffer
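One caveat with building the replacement as r"\1{}".format(text): if the new word begins with a digit, the digits run into the backreference and \1 is no longer read as "Group 1 followed by literal text". A minimal sketch of two ways around that, assuming a hypothetical replacement value "1234" (not from the question):

```python
import re

aa = """123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT"""
text = "1234"  # hypothetical replacement that starts with a digit

pattern = r"(\$SETGLOBAL\s+)\w+"

# \g<1> is the unambiguous spelling of \1, so the digits that follow stay literal
with_g = re.sub(pattern, r"\g<1>" + text, aa, count=1)

# A function replacement bypasses the replacement-template syntax entirely
with_func = re.sub(pattern, lambda m: m.group(1) + text, aa, count=1)

print(with_g)
```

The function form is also the safer choice when the new word might contain backslashes, since nothing in it is interpreted as an escape.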

The final argument, 1, is the count argument: it tells re.sub to replace only once, at the first match.
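Applied to the file from the question, the whole round trip might look like the sketch below. re.sub returns the entire input string with only the matched span changed, so nothing before or after the match needs to be captured. The argument order is re.sub(pattern, repl, string, count); in the question's snippet the file contents and the replacement appear to be swapped. The helper name replace_first_setglobal and the temporary file are illustrative assumptions, not part of the original code.

```python
import os
import re
import tempfile


def replace_first_setglobal(path, new_word):
    """Replace the word after the first $SETGLOBAL in the file at `path`."""
    with open(path, "r") as f:
        content = f.read()
    # Only the first match is rewritten; the rest of the text is returned unchanged
    updated = re.sub(
        r"(\$SETGLOBAL\s+)\S+",
        lambda m: m.group(1) + new_word,
        content,
        count=1,
    )
    with open(path, "w") as f:
        f.write(updated)
    return updated


# Demo on a throwaway copy of the sample file from the question
sample = "123456\n$SETGLOBAL AAAA\nBBBBBB\n$SETGLOBAL TTTT\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "textfile.txt")
    with open(path, "w") as f:
        f.write(sample)
    replace_first_setglobal(path, "CCCC")
    with open(path) as f:
        result = f.read()

print(result)
```

Reading the file in one go is fine at 3000 lines; for much larger inputs a line-by-line pass that stops substituting after the first hit would avoid holding everything in memory.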

Wiktor Stribiżew