
I have a gzipped (.gz) log file, logfile.20221227.gz, and I am writing a Python script to process it. A test run on a file with 100 lines worked fine, but when I ran the same script on the actual log file, which is almost 5GB, the script was killed. I was able to process log files up to 2GB. Unfortunately the only larger log files I have are 5GB+ and 7GB+, and the script fails on both of them. My code is below.

import gzip

count = 0
toomany = 0
max_hits = 5000
# Counters and the list of collected lines (initialised before the loop).
total_fatal = total_error = total_warn = 0
log_lines = []
logfile = '/foo/bar/logfile.20221228.gz'
with gzip.open(logfile, 'rt', encoding='utf-8') as page:
    for line in page:
        count += 1
        print("\nFor loop count is: ", count)
        string = line.split(' ', 5)
        if len(string) < 5:
            continue
        level = string[3]
        shortline = line[0:499]
        if level == 'FATAL':
            log_lines.append(shortline)
            total_fatal += 1
        elif level == 'ERROR':
            log_lines.append(shortline)
            total_error += 1
        elif level == 'WARN':
            log_lines.append(shortline)
            total_warn += 1
        if not toomany and (total_fatal + total_error + total_warn) > max_hits:
            toomany = 1
# send_report is my own function (not shown); it just sends a mail.
if len(log_lines) > 0:
    send_report(total_fatal, total_error, total_warn, toomany, log_lines, max_hits)

Output:

For loop count is:  1
.
.
For loop count is:  192227123
Killed

What does "Killed" mean here? The single keyword does not give me much to investigate. Also, is there a limit on file size, and is there a way to bypass it?

Thank you.

tturbo
data-bite
  • Is this the full code? What is the goal of that? – tturbo Dec 28 '22 at 07:18
  • @tturbo Thanks for the quick response. This is not the full code, but the remaining code looks irrelevant. I am just using if statements inside the loop to check whether the log line is INFO/ERROR/WARNING, and then I email a report with the counts and at most 500 lines. – data-bite Dec 28 '22 at 07:22
  • What OS are you running this on? What are you doing with *level* and *shortline*? Do you have any processes that might send SIGKILL to the Python process under certain circumstances? It might be an idea to show the **actual** code rather than a possibly irrelevant fragment. – DarkKnight Dec 28 '22 at 07:24
  • @Fred It is running on CentOS 7 with Python 2.7.5. I have also added the missing code. I have not included the send_report function here, which just sends a mail. – data-bite Dec 28 '22 at 07:32
  • You may want to share all the code; what looks irrelevant to you may be the problem. From the code shown, I can't see a problem. Have you tried running the code exactly as it is above, and did you see the same problem? – tturbo Dec 28 '22 at 07:33
  • Also, is there nothing on the error stream? Is "Killed" the only output? No stack trace or anything else? – tturbo Dec 28 '22 at 07:34
  • @tturbo I have shared the code except for the send_report function. I think that function is irrelevant because the script breaks after the count shown in the output and never reaches send_report. Yes, apart from the printed count lines, the only output is "Killed", nothing else. – data-bite Dec 28 '22 at 07:38
  • @tturbo Python will report "killed" on some (if not all) systems if SIGKILL is sent to the Python interpreter – DarkKnight Dec 28 '22 at 07:38
  • @data-bite Why are you using an unsupported version of Python? – DarkKnight Dec 28 '22 at 07:41
  • Your output count is 192227123. Each line appended to the log_lines list could be up to 499 characters. Therefore you could need up to ~90GB RAM. Do you have that much memory? – DarkKnight Dec 28 '22 at 08:04
  • What's the problem? Read 100 lines at a time and send them. – Сергей Кох Dec 28 '22 at 08:13
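
The ~90GB figure in the comments can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes the worst case, where every line read so far is kept and every kept line is the full 499 characters at one byte each, and it ignores Python's per-object overhead, so the real footprint could differ quite a bit.

```python
lines_seen = 192227123   # last loop count printed before "Killed"
max_chars = 499          # line[0:499] keeps at most 499 characters

# Worst case: every line is stored at full length, one byte per character.
worst_case_bytes = lines_seen * max_chars
worst_case_gib = worst_case_bytes / float(1024 ** 3)

print("worst case: %d bytes (about %.1f GiB)" % (worst_case_bytes, worst_case_gib))
```

Only FATAL/ERROR/WARN lines are actually appended, so how close the script gets to this bound depends on how noisy the log is.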

1 Answer


From the updated code above, this may be a memory problem because log_lines gets too big.

Try writing shortline to a temporary file rather than calling log_lines.append, then at the end send the file (or its content) via email.
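
A minimal sketch of that idea, assuming Python 3 (the question's `gzip.open(..., 'rt', encoding=...)` call needs it) and the same field layout as in the question. The function name `spool_hits` and the `LEVELS` tuple are made up for illustration; they are not part of the original script.

```python
import gzip
import tempfile

LEVELS = ('FATAL', 'ERROR', 'WARN')  # hypothetical name, mirrors the question's checks


def spool_hits(logfile):
    """Stream logfile and write matching lines to a temp file.

    Returns (path_of_temp_file, counts_per_level). Memory use stays flat
    because no line is kept after it has been written out.
    """
    counts = dict.fromkeys(LEVELS, 0)
    # delete=False so the file survives closing and can be attached later.
    tmp = tempfile.NamedTemporaryFile(
        mode='w', suffix='.log', delete=False, encoding='utf-8')
    with tmp, gzip.open(logfile, 'rt', encoding='utf-8') as page:
        for line in page:
            fields = line.split(' ', 5)
            if len(fields) < 5:
                continue
            level = fields[3]
            if level in counts:
                counts[level] += 1
                # Truncate to 499 characters and keep one record per line.
                tmp.write(line[:499].rstrip('\n') + '\n')
    return tmp.name, counts
```

The caller is responsible for removing the temp file (for example with `os.remove`) once the report has been sent.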

But first check how big the file is, because it may get too big to be sent via email. You can then try to compress it. You may also want to write the temp file as gz directly:

import gzip

# Open the output in text mode ('wt') because shortline is a str, not bytes.
with gzip.open('./log_lines.txt.gz', 'wt', encoding='utf-8') as log_lines:
    with gzip.open(logfile, 'rt', encoding='utf-8') as page:
        # ...
        log_lines.write(shortline)
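
For the size check mentioned above, something along these lines might do. The 10 MiB limit and the helper name `fits_in_email` are assumptions for illustration only; the real limit depends on your mail server.

```python
import os

# Hypothetical limit; check what your mail server actually accepts.
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024


def fits_in_email(path, limit=MAX_ATTACHMENT_BYTES):
    """Return True if the file at path is no larger than limit bytes."""
    return os.path.getsize(path) <= limit
```

If the check fails, one option is to send only the counts plus the first few hundred lines instead of the whole file.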
tturbo