Regex replace Big file

Question

So I have a big file of 3000 lines. I need to find the first occurrence of $SETGLOBAL and change the word that follows it. To do this I use the following code:

import re

with open("textfile.txt", "r") as F:
    FF = F.read()
FF = re.sub(r"\$SETGLOBAL\s(.*)", FF, "CCCC", 1)
with open("textfile.txt", "w") as F2:
    F2.write(FF)

The problem is that, to change the text in my huge file, I also need the regex to capture everything before and after this occurrence, so that I can write the new text file with the changed word in it.

How would I do this?

My problem is that I need the entire file in my variable FF, so that I can write it to a new file.

For example, imagine I have the following file:

123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT

What I need is a new file like this:

123456
$SETGLOBAL CCCC
BBBBBB
$SETGLOBAL TTTT

But my solution overwrites everything and I am left with only

$SETGLOBAL CCCC

in my new file

Do you mean you just need `FF=re.sub(r"\$SETGLOBAL\b", text ,aa, 1)`? Replace something only once? — Wiktor Stribiżew, Jul 11 '18 at 09:08
There is really no point consuming all the text after a word unless you are doing that in Notepad++ that has its own quirks. Please clarify what you mean by *the problem is that then the group that I need changes if the text before the first occurence changes so my code wouldn't work anymore.* Some example would help. — Wiktor Stribiżew, Jul 11 '18 at 09:11
Yes, I only need to change it once, so the 1 at the end does exactly that. I edited my question to include it. My problem is the following: I need a new file with all of the text of the previous file and the changed word. When I do it with re.sub, the entire file gets overwritten and all that is left is the substituted text. — David, Jul 11 '18 at 09:24
I edited the question to show it is not a duplicate, and to answer your question. — David, Jul 11 '18 at 09:34
Try `re.sub(r"(\$SETGLOBAL\s+)\w+", r"\1{}".format(text), aa, 1)` — Wiktor Stribiżew, Jul 11 '18 at 09:44
That also deletes everything else in the file, and I am left with: $SETGLOBAL $SETGLOBAL CCCC — David, Jul 11 '18 at 09:47
See https://ideone.com/bLk2dH, you are not using what I suggest. — Wiktor Stribiżew, Jul 11 '18 at 09:49
My bad, what you are saying works perfectly! thank you so much. Where can I read a bit more about your solution to understand how it works? — David, Jul 11 '18 at 09:52

Wiktor Stribiżew · Accepted Answer · 2018-07-11T10:00:32.980

You can capture the left-hand context in a capturing group, match only the word that follows it, and replace the match with a backreference to the group value followed by the new word:

import re
aa='''123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT'''
text="CCCC"
print(re.sub(r"(\$SETGLOBAL\s+)\w+", r"\1{}".format(text), aa, 1))
# or
# print(re.sub(r"(\$SETGLOBAL\s+)\S+", r"\1{}".format(text), aa, 1))

See the Python demo
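One caveat with building the replacement as r"\1{}".format(text): the result is parsed as a replacement template, so if the new word starts with a digit (for example, \1 followed by 9 is read as a reference to group 19, which does not exist) or contains a backslash, the substitution can fail or produce something unexpected. Below is a minimal sketch, reusing the sample string from the answer, of two safer variants: the unambiguous \g<1> group syntax, and a replacement function whose return value is inserted literally. The helper names replace_first and replace_first_literal are hypothetical, chosen only for illustration.

```python
import re

aa = '''123456
$SETGLOBAL AAAA
BBBBBB
$SETGLOBAL TTTT'''

pattern = r"(\$SETGLOBAL\s+)\w+"

def replace_first(text, new_word):
    # \g<1> cannot merge with a following digit the way \1 can,
    # but backslashes in new_word are still processed as template escapes.
    return re.sub(pattern, r"\g<1>" + new_word, text, count=1)

def replace_first_literal(text, new_word):
    # A replacement function's return value is inserted as-is,
    # so digits and backslashes in new_word are never interpreted.
    return re.sub(pattern, lambda m: m.group(1) + new_word, text, count=1)

print(replace_first(aa, "9999"))
print(replace_first_literal(aa, r"C:\temp"))
```

The function form is the more robust choice whenever the new word comes from user input or a file.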

Here,

(\$SETGLOBAL\s+)\w+ - matches and captures $SETGLOBAL and one or more whitespace chars into Group 1 (later referenced with \1 from the replacement pattern), then just matches one or more word chars with \w+ (\w matches any letter, digit or _, plus other Unicode word chars in Python 3, or in Python 2.x when the re.U flag is used). Note: \S+ instead matches one or more non-whitespace chars.
\1 - the backreference to the value stored in the Group 1 buffer
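To see the practical difference between the \w+ and \S+ variants, here is a small sketch on a made-up line whose value contains a hyphen (AAAA-1 is hypothetical, not from the question): \w+ stops at the hyphen, so only AAAA is replaced, while \S+ replaces the whole whitespace-delimited token.

```python
import re

line = "$SETGLOBAL AAAA-1 rest"  # hypothetical value containing a non-word char

# \w+ stops at the first character that is not a letter, digit or underscore
word_only = re.sub(r"(\$SETGLOBAL\s+)\w+", r"\1CCCC", line, count=1)

# \S+ consumes everything up to the next whitespace character
token = re.sub(r"(\$SETGLOBAL\s+)\S+", r"\1CCCC", line, count=1)

print(word_only)  # $SETGLOBAL CCCC-1 rest
print(token)      # $SETGLOBAL CCCC rest
```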

The final 1 is the count argument: it tells re.sub to replace only the first match.
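Applied to the file from the question, the flow is: read the file into one string, substitute only the first match, and write the full string back. re.sub takes (pattern, repl, string, count) and returns the entire input string with just the matched span replaced, so nothing before or after the match needs to be captured. The snippet in the question passes the file contents as repl and "CCCC" as the string to search, so its result cannot contain the rest of the file. The sketch below assumes a hypothetical helper named replace_first_setglobal and UTF-8 text. It uses subn, which also returns the number of substitutions made, so a file without $SETGLOBAL is left untouched, and it passes count by keyword for readability.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as the answer: keep "$SETGLOBAL " via group 1, replace the word after it.
PATTERN = re.compile(r"(\$SETGLOBAL\s+)\w+")

def replace_first_setglobal(path, new_word):
    """Hypothetical helper: rewrite the word after the first $SETGLOBAL in a file.

    Returns the number of replacements made (0 or 1).
    """
    p = Path(path)
    content = p.read_text(encoding="utf-8")
    # subn returns (new_string, number_of_substitutions); count=1 stops after the first match
    new_content, n = PATTERN.subn(lambda m: m.group(1) + new_word, content, count=1)
    if n:
        p.write_text(new_content, encoding="utf-8")
    return n

# Usage on a throwaway copy of the sample file from the question
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "textfile.txt"
    sample.write_text("123456\n$SETGLOBAL AAAA\nBBBBBB\n$SETGLOBAL TTTT\n", encoding="utf-8")
    replaced = replace_first_setglobal(sample, "CCCC")
    result = sample.read_text(encoding="utf-8")

print(replaced)
print(result)
```

Reading the whole file into memory is fine at 3000 lines; only for files far larger than available memory would a line-by-line approach be worth the extra complexity.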

