There are a few considerations to address here:
- If your file is large, it isn't a good idea to load it all into memory.
- If some exception occurs during processing (maybe even a KeyboardInterrupt), it is often desirable to leave your original file untouched (so we'll try to make your operation atomic, in the ACID sense).
- If multiple concurrent processes try to modify your file, you would like some guarantee that, at least, yours is safe (also ACID).
- You may (or may not) want a backup for your file.
There are a number of possibilities (see e.g. this SO question). In my experience, however, I got mixed results with fileinput: it makes it easy to modify one or several files in place, optionally creating a backup for each, but unfortunately it writes eagerly to each file (possibly leaving it incomplete if an exception occurs). I put an example at the end for reference.
What I've found to be the simplest and safest approach is to use a temporary file (in the same directory as the file you are processing, named uniquely but recognizably), do your operation from src to tmp, then mv tmp src, which, at least for practical purposes, is atomic on most POSIX filesystems.
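import os
import tempfile

def acceptall(line):
    return True

def filefilter(filename, filterfunc=acceptall, backup=None):
    if backup:
        backup = f'{filename}{backup}'  # leave None if no backup wanted
    # build the temp name from the basename, so that paths such as 'dir/file.txt' work
    dirname, basename = os.path.split(filename)
    # note: mktemp() is deprecated; mkstemp() avoids the race on the temp name
    tmpname = tempfile.mktemp(prefix=f'.{basename}-', dir=dirname or '.')
    # stream line by line: the whole file is never held in memory
    with open(tmpname, 'w') as tmp, open(filename, 'r') as src:
        for line in src:
            if filterfunc(line):
                tmp.write(line)
    # the original is only touched once the temp file is complete
    if backup:
        os.rename(filename, backup)
    os.rename(tmpname, filename)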
Example for your case:
filefilter('wappoint.txt.txt', lambda line: email not in line)
Using a regex to exclude multiple email addresses (case-insensitive and only fully matching), and generating a .bak backup file:
import re

matcher = re.compile(r'.*\b(bob|fred|jeff)@foo\.com\b', re.IGNORECASE)
filefilter(filename, lambda line: not matcher.match(line), backup='.bak')
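To see what "only fully matching" means in practice, here is a small check of the same pattern against a few made-up lines (the sample addresses are invented for illustration only). The word boundaries reject addresses that merely contain one of the names or extend the domain.

```python
import re

# Same pattern as above: case-insensitive, with word boundaries on both sides.
matcher = re.compile(r'.*\b(bob|fred|jeff)@foo\.com\b', re.IGNORECASE)

# Hypothetical sample lines, not taken from any real file.
samples = [
    'Bob@Foo.com signed up\n',    # matches despite the different case
    'contact: fred@foo.com\n',    # matches in the middle of a line
    'jimbob@foo.com\n',           # no match: no word boundary before "bob"
    'bob@foo.company\n',          # no match: no word boundary after "com"
]

# These are the lines that filefilter would keep with this matcher.
kept = [line for line in samples if not matcher.match(line)]
print(kept)
```

Only the last two sample lines should survive the filter.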
We can also simulate what happens if an exception is raised in the middle (e.g. on the first matching line):
def flaky(line):
    if email in line:
        1 / 0  # simulate a failure in the middle of processing
    return True

filefilter(filename, flaky)
That will raise ZeroDivisionError upon the first matching line. But notice how your file is not modified at all in that case (and no backup is made). As a side effect, the temporary file remains (this is consistent with other utils, e.g. rsync, that leave .filename-<random> incomplete temp files at the destination when interrupted).
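If you would rather not leave the temporary file behind, one possible variant is sketched below. It is not the function above but a hypothetical alternative (the name filefilter_cleanup and its keep parameter are made up here): it swaps in tempfile.mkstemp and os.replace, and removes the temp file when anything goes wrong before the final rename.

```python
import os
import tempfile


def filefilter_cleanup(filename, keep=lambda line: True):
    """Hypothetical variant: filter in place, deleting the temp file on failure."""
    dirname, basename = os.path.split(filename)
    # mkstemp creates the file itself and returns an open descriptor plus its path.
    fd, tmpname = tempfile.mkstemp(prefix=f'.{basename}-', dir=dirname or '.', text=True)
    try:
        with os.fdopen(fd, 'w') as tmp, open(filename, 'r') as src:
            for line in src:
                if keep(line):
                    tmp.write(line)
        # Same directory, hence same filesystem: the swap is a single rename.
        os.replace(tmpname, filename)
    except BaseException:
        # BaseException also covers KeyboardInterrupt.
        try:
            os.unlink(tmpname)
        except FileNotFoundError:
            pass  # the rename had already happened
        raise
```

One trade-off to be aware of: mkstemp creates the file readable and writable only by its owner, so the result may not have the same permissions as the original unless you copy them over (for instance with shutil.copymode) before the rename.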
As promised, here is also an example using fileinput, but with the caveats explained earlier:
import fileinput

with fileinput.input(filename, inplace=True, backup='.bak') as f:
    for line in f:
        if email not in line:
            print(line, end='')  # this prints back to filename
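To make the caveat concrete, here is a small self-contained experiment on a throwaway file (the file name and its contents are invented for the demo). It raises an exception halfway through and then looks at what is left on disk.

```python
import fileinput
import os
import tempfile

# Throwaway file in a fresh temp directory, so nothing real is at risk.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')
with open(path, 'w') as f:
    f.write('keep 1\nboom\nkeep 2\n')

try:
    with fileinput.input(path, inplace=True, backup='.bak') as f:
        for line in f:
            if line.startswith('boom'):
                raise RuntimeError('simulated failure')
            print(line, end='')  # redirected into demo.txt by fileinput
except RuntimeError:
    pass

with open(path) as f:
    truncated = f.read()   # what is left in the "edited" file
with open(path + '.bak') as f:
    original = f.read()    # the backup still holds the full content
```

On CPython, demo.txt should end up with only the lines written before the failure, while demo.txt.bak keeps the original content. Without the backup argument, fileinput deletes its internal backup when it closes, so the lost lines would not be recoverable.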