In some sense doing this in Python is mildly crazy when it's so much easier to do succinctly in shell script. But here is a go at refactoring your code.
You generally should avoid subprocess.Popen()
if you can; your code would be easier and more idiomatic with subprocess.run()
. But in this case, when find
can potentially return a lot of matches, we might want to process the files as they are reported, rather than wait for the subprocess to finish and then collect its output. Using code from this Stack Overflow answer, and adapting in accordance with Actual meaning of 'shell=True' in subprocess to avoid the shell=True
, try something like
#!/usr/bin/env python3
from subprocess import Popen, PIPE
import gzip
from tempfile import NamedTemporaryFile
import shutil
import os
with Popen(
['find' '/var/log', '--name=syslog*', '-mtime', '+' + specific_delete_range],
stdout=PIPE, bufsize=1, text=True) as p:
for filename in p.stdout:
filename = filename.rstrip('\n')
temp = NamedTemporaryFile(delete=False)
with gzip.open(filename, 'rb') as f, gzip.open(temp, 'wb') as z:
for line in f:
if my_str_as_bytes not in line:
z.write(line)
os.unlink(filename)
shutil.copy(temp, filename)
os.unlink(temp)
With text=True
we don't have to decode
the output from Popen
. The lines from gzip
are still binary bytes; we could decode them, of course, but instead encoding the search string into bytes, as you have done, is more efficient.
The beef here is using a temporary file for the filtered result, and then moving it back on top over the original file once we are done writing it.
NamedTemporaryFile
has some sad quirks on Windows, but lucky for you, you are not on Windows.