
I'm working with a Linux service that writes a JSON-formatted log in /var/log. The log file grows almost constantly. The service does not have any database connector or wrapper to send the log directly to a database, so I'll have to write my own service for parsing and uploading.

What would be the best way to continuously parse the file and upload the new lines to the database?

Edit: I don't want to use anything related to the ELK stack.

Thanks!

Pau Muñoz
  • What is a sample of what the log file looks like? Is it a stream of valid JSON values (e.g. `{...}{...}...`), or is it a single, incomplete value (`[{...}, {...},...`)? – chepner May 11 '18 at 17:51
  • I guess the service writes a single JSON object per line; anything else wouldn't make much sense. You just need to write a program that reads the file line by line, parses a JSON object out of every line, and does whatever you want with it – hek2mgl May 11 '18 at 17:52
  • PS: Let me add that I personally feel that using JSON for log messages is wrong because it is quite inefficient, but I know that some folks are doing that nowadays. – hek2mgl May 11 '18 at 17:55
  • Btw, you might want to have a look at [fluentd](https://www.fluentd.org/). The project aims to provide what you have in mind. – hek2mgl May 11 '18 at 18:00

1 Answer


To read the file like the Unix `tail -f` command, I wrote a little script:

logtodb.py

import json
import os
import time


def tail(stream_file):
    """ Yield new lines as they are appended to the file, like `tail -f`.
    Adapted from https://stackoverflow.com/questions/44895527/reading-infinite-stream-tail """
    stream_file.seek(0, os.SEEK_END)  # Start at the end of the file

    while not stream_file.closed:
        line = stream_file.readline()

        if not line:
            # No new data yet: wait a bit instead of busy-looping
            time.sleep(0.1)
            continue

        yield line


def log_to_db(log_path, db):
    """ Read the log (one JSON object per line) and insert the data into the db """
    with open(log_path, "r") as log_file:
        for line in tail(log_file):
            try:
                log_data = json.loads(line)
            except ValueError:
                # Bad JSON, maybe a partial or corrupted line...
                continue  # Read next line

            # Do what you want with the data:
            # db.execute("INSERT INTO ...", log_data["level"], ...)
            print(log_data["message"])

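For the `db` part, a minimal sketch using the standard-library `sqlite3` module could look like this (the `logs` table and its columns are just assumptions for the example, adapt them to your real schema):

import sqlite3

# Hypothetical schema: adapt the table and columns to your real database
conn = sqlite3.connect("logs.db")
conn.execute("CREATE TABLE IF NOT EXISTS logs (level TEXT, message TEXT)")


def insert_log(conn, log_data):
    """ Insert one parsed log line into the (assumed) `logs` table """
    conn.execute(
        "INSERT INTO logs (level, message) VALUES (?, ?)",
        (log_data["level"], log_data["message"]),
    )
    conn.commit()

You would then pass `conn` as the `db` argument of `log_to_db` and call `insert_log(db, log_data)` instead of the `print`.
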
And a test file:

test_logtodb.py

import random
import json
import time
import threading
import logtodb


def generate_test_json_log(log_path):
    with open(log_path, "w") as log_file:
        while True:
            log_data = {
                "level": "ERROR" if random.random() > 0.5 else "WARNING",
                "message": "The program exit with the code '{0}'".format(str(int(random.random() * 200)))
            }

            log_file.write("{0}\n".format(
                json.dumps(log_data, ensure_ascii=False)))
            log_file.flush()
            time.sleep(0.5)  # Sleep 500 ms


if __name__ == "__main__":
    log_path = "my-log.json"
    # Daemon thread so the generator does not keep the process alive on Ctrl-C
    generator = threading.Thread(
        target=generate_test_json_log, args=(log_path,), daemon=True)
    generator.start()

    logtodb.log_to_db(log_path, db=None)

I assume the log file looks like this:

{"level": "ERROR", "message": "The program exit with the code '181'"}
{"level": "WARNING", "message": "The program exit with the code '51'"}
{"level": "ERROR", "message": "The program exit with the code '69'"}

I can help you update my script if that's not the right format.

Martin
  • I didn't carefully review the code in the proposed solution but the basic idea is correct. Whether this should be enhanced to record the last parsed offset and seek to that on restart is something only the person asking the question can answer. – Kurtis Rader May 12 '18 at 04:06
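
A minimal sketch of the offset idea from the last comment, assuming a small state file (the name `my-log.offset` is made up for the example): save `log_file.tell()` after each processed line and seek back to that position on restart:

import json
import os
import time


def log_to_db_resumable(log_path, offset_path, db):
    """ Like log_to_db, but remembers where it stopped so a restart
    does not re-insert or skip lines (offset_path is an assumed state file) """
    offset = 0
    if os.path.exists(offset_path):
        with open(offset_path) as state:
            offset = int(state.read() or 0)

    with open(log_path, "r") as log_file:
        log_file.seek(offset)

        while True:
            line = log_file.readline()
            if not line:
                time.sleep(0.1)  # No new data yet
                continue

            try:
                log_data = json.loads(line)
            except ValueError:
                pass  # Bad or partial line: skip it, but still advance the offset
            else:
                # db.execute("INSERT INTO ...", log_data["level"], ...)
                print(log_data["message"])

            # Persist the position of the last fully processed line
            with open(offset_path, "w") as state:
                state.write(str(log_file.tell()))

Usage would be e.g. `log_to_db_resumable("my-log.json", "my-log.offset", db=None)`. If the script crashes between the insert and the offset write, at most one line may be processed twice on restart.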