
I wrote a program that searches for the oldest logs. I then want to check whether those logs contain entries from a particular date, for example "Jul 30 22:40", and delete those entries. I did not find anything like this here or elsewhere. Could you help me?

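import gzip
import subprocess

# specific_delete_range (age in days) and my_str_as_bytes (the date to
# look for, as bytes) are defined elsewhere in the program.
firstresult = []
rightlines = []

var = subprocess.Popen('find /var/log/syslog* -mtime +%i' % specific_delete_range, stderr=subprocess.PIPE, stdout=subprocess.PIPE, shell=True)
out, err = var.communicate()
out = out.decode('ascii')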

for line in out.split():
    firstresult.append(line)

for element in firstresult:
    with gzip.open(element, 'rb') as f:
        for line in f:
            if my_str_as_bytes in line:
                rightlines.append(line)

So the lines in the list "rightlines" should be deleted from the files.

tripleee
Naomi
  • I'm not sure I understand. If you delete lines while other programs are accessing the file, those programs may get confused. In general we try to avoid such problems: you may want to edit a copy of the file, and then move the copy over the original file. – Giacomo Catenazzi Aug 06 '20 at 08:03

2 Answers


It is not possible to 'delete lines' in the middle of a file. Even if this were possible for a regular file, it would not be possible for a compressed file, because a compressed file is composed of 'blocks', and it is very likely that the blocks will not be aligned on line boundaries.

As an alternative, consider extracting the content you want to keep into a new file, and then renaming the new file to replace the old one.

The following bash script looks for the pattern "P" in the gzipped log files, and replaces each matching file with a new file that does not contain the lines matching "P".

Note: The script does not handle uncompressed files (much like the OP's script, which opens every file with gzip). The pattern /var/log/syslog* was changed to select only compressed files (/var/log/syslog*.gz). This may need adjusting to match the actual suffix used for compressed files.

days=30   # Change to whatever file age
P="Jul 30 22:40"    # Pattern to remove
for file in $(zfgrep -l "$P" $(find /var/log/syslog*.gz -mtime +"$days")) ; do
    # Extract content, re-compress and overwrite old files
    zfgrep -v "$P" "$file" | gzip > "$file.new" && mv "$file.new" "$file"
done
dash-o

In some sense doing this in Python is mildly crazy when it's so much easier to do succinctly in shell script. But here is a go at refactoring your code.

You generally should avoid subprocess.Popen() if you can; your code would be easier and more idiomatic with subprocess.run(). But in this case, when find can potentially return a lot of matches, we might want to process the files as they are reported, rather than wait for the subprocess to finish and then collect its output. Using code from this Stack Overflow answer, and adapting in accordance with Actual meaning of 'shell=True' in subprocess to avoid the shell=True, try something like

#!/usr/bin/env python3
from subprocess import Popen, PIPE
import gzip
from tempfile import NamedTemporaryFile
import shutil

# specific_delete_range (an int, in days) and my_str_as_bytes (bytes)
# are assumed to be defined as in the question.

with Popen(
        # Only select gzipped files, since we read them with gzip.open()
        ['find', '/var/log', '-name', 'syslog*.gz',
         '-mtime', '+%i' % specific_delete_range],
        stdout=PIPE, bufsize=1, text=True) as p:
    for filename in p.stdout:
        filename = filename.rstrip('\n')
        # Write the lines we want to keep to a temporary gzip file
        with NamedTemporaryFile(delete=False) as temp, \
                gzip.open(filename, 'rb') as f, \
                gzip.open(temp, 'wb') as z:
            for line in f:
                if my_str_as_bytes not in line:
                    z.write(line)
        # Replace the original with the filtered copy
        shutil.move(temp.name, filename)

With text=True we don't have to decode the output from Popen. The lines from gzip are still binary bytes; we could decode them, of course, but instead encoding the search string into bytes, as you have done, is more efficient.
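
To illustrate the point about decoding, here is a minimal sketch of the simpler subprocess.run() variant, which is reasonable when the list of matches is short enough to collect in one go. The helper name list_lines is made up for this example, and capture_output and text need Python 3.7 or later.

```python
import subprocess


def list_lines(argv):
    """Run argv without a shell and return its stdout as a list of str lines.

    Hypothetical helper for illustration; not part of the original code.
    """
    result = subprocess.run(
        argv,                 # a list, so no shell=True is needed
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode the output to str for us
        check=True)           # raise CalledProcessError on failure
    return result.stdout.splitlines()
```

Called as list_lines(['find', '/var/log', '-name', 'syslog*.gz', '-mtime', '+30']), this should return the matching paths as ordinary strings, with no separate out.decode() step.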

The beef here is using a temporary file for the filtered result, and then moving it back on top of the original file once we are done writing it.
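
A slightly more defensive variation on the temporary-file idea, as a sketch only: the helper name remove_matching_lines is hypothetical, and it assumes the file is gzip-compressed and that you may write to its directory. It creates the temporary file next to the original so that os.replace() can swap it in with a single rename, which is atomic on POSIX systems when both names are on the same filesystem.

```python
import gzip
import os
import shutil
from tempfile import NamedTemporaryFile


def remove_matching_lines(path, needle):
    """Rewrite the gzip file at path without the lines containing needle.

    needle must be bytes. Returns the number of lines dropped.
    Hypothetical helper for illustration; not part of the original code.
    """
    dropped = 0
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory as the original,
    # so the final rename stays on one filesystem.
    with NamedTemporaryFile(dir=directory, delete=False) as temp:
        try:
            with gzip.open(path, 'rb') as src, gzip.open(temp, 'wb') as dst:
                for line in src:
                    if needle in line:
                        dropped += 1
                    else:
                        dst.write(line)
        except BaseException:
            # Do not leave a half-written temporary file behind
            temp.close()
            os.unlink(temp.name)
            raise
    shutil.copymode(path, temp.name)  # keep the original permission bits
    os.replace(temp.name, path)       # swap the filtered file into place
    return dropped
```

For the question's example this would be called as remove_matching_lines(filename, "Jul 30 22:40".encode("ascii")) for each file name reported by find. The original file is only replaced after the filtered copy has been written completely, so a failure midway leaves the original untouched.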

NamedTemporaryFile has some sad quirks on Windows, but lucky for you, you are not on Windows.

tripleee