re.sub doesn't delete the pattern in a txt file

Question

with open("C:\code\code\Music Bot\lyrics.txt", "w+") as f:
    f.write(re.sub("\[.*\]", "", f.read()))
f.close()

The code opens a .txt file of song lyrics and should delete any section that has words inside square brackets, e.g. [verse]. I've tried putting these lines in different places in the code, but nothing changes. I've checked with other people, and the pattern also seems to be fine. Any ideas?

What is the content of `lyrics.txt`? Also, `f.close()` is not necessary; the context manager already does that. — InSync, Aug 22 '23 at 16:48
opening a file in "w+" mode deletes everything in the file, so there's not going to be anything to read. — Wooble, Aug 22 '23 at 16:48
You need to use a raw string, otherwise `\[` becomes just `[`. — Barmar, Aug 22 '23 at 16:49
@Barmar `\[` is not a valid escape sequence and will be resolved as two literal characters: `len('\[') # 2`. — InSync, Aug 22 '23 at 16:50
@InSync Thanks. Confused it with other languages where non-escape sequences simply become the following character. — Barmar, Aug 22 '23 at 16:52
@InSync, ...which is bad practice to rely on regardless. (If I were BDFL, invalid escape sequences would trigger errors -- that way you could add _more_ escape sequences in the future without breaking preexisting code; as it is, right now any change to the set of escape sequences is a compatibility break). — Charles Duffy, Aug 22 '23 at 16:52
Best to use a raw string anyway; in current Python using an unrecognized escape sequence generates a SyntaxWarning and in some future version it will be an error (although I believe that can't happen until Python 4, and I'm not sure I believe that will ever exist...) — Wooble, Aug 22 '23 at 16:53
@CharlesDuffy I never said it was good. Barmar closed this question based on an incorrect conclusion which I pointed out and that was it. Nevertheless, it's still a ridiculous quirk. — InSync, Aug 22 '23 at 16:55
Ultimately, is your question about a regex pattern that will identify text inside of square brackets or is your question about how to read and write to the same file? — JonSG, Aug 22 '23 at 16:58

score 3 · Accepted Answer · answered Aug 22 '23 at 16:58

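You're writing the modified version at the end of the file, not the beginning. After `f.read()` the file position is at the end, so `f.write()` appends. In a mode that keeps the existing contents (such as `r+`), the file would end up containing the original text followed by the text with the bracketed parts removed. You need to seek back to the beginning before writing, and truncate the file after writing.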
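A small sketch of the append effect, assuming a throwaway temp file with made-up lyrics (the file name and contents are hypothetical). The file is opened in `r+` and written without `seek(0)`:

```python
import os
import re
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lyrics.txt")
    with open(path, "w") as f:
        f.write("[Verse]\nhello\n")  # hypothetical sample lyrics

    with open(path, "r+") as f:
        contents = f.read()  # the file position is now at the end
        f.write(re.sub(r"\[.*?\]", "", contents))  # no seek(0), so this appends

    with open(path) as f:
        result = f.read()

print(repr(result))  # '[Verse]\nhello\n\nhello\n'
```

The original text is still there, with the cleaned copy tacked on after it.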

You should open the file in `r+` mode so you can read it before writing. `w+` empties the file as soon as it is opened, so it is only useful for writing first and reading back afterwards.
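To see why the question's code has nothing to work with, this sketch (again on a hypothetical temp file) compares what `f.read()` returns right after opening in `w+` versus `r+`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lyrics.txt")
    with open(path, "w") as f:
        f.write("[Verse]\nhello\n")

    # "w+" truncates the file to zero length as soon as it is opened...
    with open(path, "w+") as f:
        seen_by_w_plus = f.read()  # ...so there is nothing left to read

    # Restore the sample text, then try "r+", which keeps the contents.
    with open(path, "w") as f:
        f.write("[Verse]\nhello\n")
    with open(path, "r+") as f:
        seen_by_r_plus = f.read()

print(repr(seen_by_w_plus))  # ''
print(repr(seen_by_r_plus))  # '[Verse]\nhello\n'
```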

You should also use a non-greedy quantifier. With the greedy `.*`, the pattern removes everything from the first `[` to the last `]` on a line (`.` does not match a newline by default).
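The difference is easy to see on a single made-up line containing two bracketed tags:

```python
import re

line = "[Verse 1] Hello [whispered] world"  # hypothetical lyric line

greedy = re.sub(r"\[.*\]", "", line)   # .* runs to the LAST ] on the line
lazy = re.sub(r"\[.*?\]", "", line)    # .*? stops at the FIRST ] after each [

print(repr(greedy))  # ' world'
print(repr(lazy))    # ' Hello  world'
```

The greedy version also swallows the word "Hello" between the two tags. The non-greedy version removes only the tags, leaving a double space behind.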

import re

with open(r"C:\code\code\Music Bot\lyrics.txt", "r+") as f:
    contents = f.read()
    contents = re.sub(r"\[.*?\]", "", contents)
    f.seek(0)          # go back to the start before overwriting
    f.write(contents)
    f.truncate()       # drop the leftover old text past the new end

Use raw strings for pathnames and regular expressions, so the backslashes will be treated literally.
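A quick check of what the `r` prefix changes. The `C:\new` path is a hypothetical example chosen because `\n` is a real escape sequence:

```python
# Raw strings keep every backslash exactly as typed.
pattern_raw = r"\[.*?\]"
pattern_escaped = "\\[.*?\\]"  # the same string, with backslashes doubled by hand
assert pattern_raw == pattern_escaped

# Hypothetical path: in a normal literal, "\n" becomes a newline character.
plain = "C:\new"
raw = r"C:\new"
print(len(plain), len(raw))  # 5 6
```

The question's path happens to contain only unrecognized escapes (`\c`, `\M`, `\l`), so it works by accident. A folder name starting with `n` or `t` would silently break it.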

Barmar

I'd just open the file in "r" mode and then open it again in "w" mode separately; saves messing about with needing to remember the .seek(0) and .truncate(). But I totally missed the greedy regexp in the answer I'd written half of so just leaving this as a comment. – Wooble Aug 22 '23 at 17:00
I'd guess the OP might want to add some checks for spaces before and after the brackets and replace them with a single space; otherwise they could end up with two spaces in a row after this sub. – JonSG Aug 22 '23 at 17:01
I'd probably do it that way, too. – Barmar Aug 22 '23 at 17:01
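A sketch of the two-open approach Wooble describes, with an optional whitespace cleanup along the lines JonSG suggests. The function name `strip_bracketed`, the UTF-8 encoding, and the exact cleanup rules are assumptions for illustration:

```python
import os
import re
import tempfile


def strip_bracketed(path):
    """Remove [..] tags from a text file using two separate opens (hypothetical helper)."""
    # Pass 1: read everything while the file is still intact.
    with open(path, "r", encoding="utf-8") as f:  # UTF-8 is an assumption
        contents = f.read()

    contents = re.sub(r"\[.*?\]", "", contents)  # non-greedy tag removal
    # Assumed cleanup: collapse runs of spaces/tabs into one space, then
    # trim spaces/tabs at the start and end of each line.
    contents = re.sub(r"[ \t]{2,}", " ", contents)
    contents = re.sub(r"^[ \t]+|[ \t]+$", "", contents, flags=re.MULTILINE)

    # Pass 2: "w" truncates the file, which is safe now that the text is in memory.
    with open(path, "w", encoding="utf-8") as f:
        f.write(contents)
    return contents


# Usage on a throwaway file with hypothetical lyrics.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lyrics.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("[Verse 1]\nHello [whispered] world\n")
    result = strip_bracketed(path)
    with open(path, encoding="utf-8") as f:
        on_disk = f.read()

print(repr(result))  # '\nHello world\n'
```

A tag that sat on its own line still leaves an empty line behind. Whether to drop those as well depends on how the lyrics are meant to look.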
