I am trying to replace all lines of a certain format in a file with blanks, i.e. replace a line of number/number/number (like a date) and number:number (like a time) with "".
You can't use str.replace to match a pattern or format, only a literal string.
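To make the difference concrete, here is a minimal sketch. The sample line is made up for illustration; it is not from your file.

```python
import re

# Hypothetical sample line, just for illustration.
line = "Due 1/2/2014 at 10:30"

# str.replace treats its first argument as literal text, so passing a
# regular expression to it finds nothing and the line comes back unchanged.
literal_result = line.replace(r"\d+/\d+/\d+", "")

# re.sub interprets the same string as a pattern and removes the date.
pattern_result = re.sub(r"\d+/\d+/\d+", "", line)

print(repr(literal_result))
print(repr(pattern_result))
```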
To match a pattern, you need some kind of parser. For patterns like this, the regular expression engine built into the standard library as re is more than powerful enough… but you will need to learn how to write regular expressions for your patterns. The reference docs and Regular Expression HOWTO are great if you already know the basics; if not, you should search for a tutorial elsewhere.
Anyway, here's how you'd do this (fixing a few other things along the way, most of them explained by Lego Stormtroopr):
import re
with open("old_text.txt") as old_file, open("new_text.txt", "w") as new_file:
    for line in old_file:
        cleaned_line = re.sub(r'\d+/\d+/\d+', '', line)
        cleaned_line = re.sub(r'\d+:\d+', '', cleaned_line)
        new_file.write(cleaned_line)
Also, note that I used cleaned_line in the second sub; just using line again, as in your original code, means we lose the results of the first substitution.
Without knowing the exact definition of your problem, I can't promise that this does exactly what you want. Do you want to blank all lines that contain the pattern number/number/number, blank out all lines that are nothing but that pattern, or blank out just that pattern and leave the rest of the line alone? All of those things are doable, and pretty easy, with re, but they're all done a little differently.
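Here is a rough sketch of how those three interpretations might look for the date pattern. The function names are my own, made up for illustration, and I'm assuming a "blank" line means a bare newline.

```python
import re

DATE = r"\d+/\d+/\d+"

def strip_pattern(line):
    """Remove just the date and leave the rest of the line alone."""
    return re.sub(DATE, "", line)

def blank_if_contains(line):
    """Replace the whole line with a bare newline if a date appears anywhere in it."""
    return "\n" if re.search(DATE, line) else line

def blank_if_only(line):
    """Replace the line with a bare newline only if it is nothing but a date
    (surrounding whitespace, including the trailing newline, is ignored)."""
    return "\n" if re.fullmatch(r"\s*" + DATE + r"\s*", line) else line

sample = "on 12/25/2013 ok\n"
print(repr(strip_pattern(sample)))
print(repr(blank_if_contains(sample)))
print(repr(blank_if_only(sample)))
```

Any of these can be dropped into the loop in place of the re.sub calls; the time pattern works the same way.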
If you want to get a little trickier, you can use a single re.sub expression to replace all of the matching lines with blank lines at once, instead of iterating over them one at a time. That means a slightly more complicated regexp in exchange for slightly simpler Python code. It will probably perform better for mid-sized files, but worse for huge files, since the whole file has to be read into memory first (which also puts an upper limit on the file size). If you can't figure out how to write the appropriate expression yourself, and there's no performance bottleneck to fix, I'd stick with explicit looping.
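For reference, one possible shape for the single-expression version is sketched below. It assumes you want to blank every line that contains a date or a time anywhere in it; blank_matching_lines is a name I made up, and a different interpretation of the problem would need a different pattern.

```python
import re

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# Since . never matches a newline by default, each match stays within one line.
LINE_WITH_DATE_OR_TIME = re.compile(r"^.*(?:\d+/\d+/\d+|\d+:\d+).*$", re.MULTILINE)

def blank_matching_lines(text):
    """Empty out every line of text that contains a date or a time.

    The newline itself is not part of the match, so the line count is preserved.
    """
    return LINE_WITH_DATE_OR_TIME.sub("", text)

print(repr(blank_matching_lines("a\nx 1/2/3 y\nb\n")))
```

With files, that would mean calling it on old_file.read() and writing the result to new_file in one go, rather than looping over the lines.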