
I have a log file in the following format:

Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1 
Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1

I want to convert this text data to JSON using Python. So far, I have the following script:

# Python program to convert text
# file to JSON

import json

# the file to be converted
filename = 'Logs.txt'

# resultant dictionary
dict1 = {}

# fields in the sample file
fields =['timestamp', 'Server', 'Service', 'Message']

with open(filename) as fh:
    # count variable for employee id creation
    l = 1

    for line in fh:
        # reading line by line from the text file
        description = list( line.strip().split(None, 4))

        # for output see below
        print(description)

        # for automatic creation of id for each employee
        sno ='emp'+str(l)

        # loop variable
        i = 0
        # intermediate dictionary
        dict2 = {}
        while i<len(fields):

                # creating dictionary for each employee
                dict2[fields[i]]= description[i]
                i = i + 1

        # appending the record of each employee to
        # the main dictionary
        dict1[sno]= dict2
        l = l + 1

# creating json file
out_file = open("test5.json", "w")
json.dump(dict1, out_file, indent = 4)
out_file.close()

which gives the following output:

{
 "emp1": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" },
 "emp2": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }
}

But I need output like this:

{
"timestamp":"Nov 28 06:26:26", 
"Server":"server-01", 
"Service":"dhcpd",
"Message":"DHCPOFFER on 10.45.45.31 to cc:d3:e2:7a:b9:6b via 10.45.0.1"
}

I don't know why the fields don't contain the whole data. Can anyone help me with this?

Pranav Hosangadi
  • What is the output which you are getting? – Jay Dec 01 '22 at 11:15
  • Here is my output: "emp1": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }, "emp2": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }, – Abhiyantraka Dec 01 '22 at 14:02
  • Could you please put a log sample as text and not as image, so we can copy paste? – Bijay Regmi Dec 01 '22 at 17:02
  • Here is my log file: Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1 Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1 – Abhiyantraka Dec 02 '22 at 06:06
  • @Abhiyantraka as you can see, comments don't format code very well, so you should [edit] your question to add large blocks of code, as I just did. – Pranav Hosangadi Dec 13 '22 at 21:41

1 Answer

The problem with your code is that you called .split(None, 4), which performs at most 4 splits on the input string and so returns at most 5 elements. Since the date itself contains spaces, three of those elements are used up by the timestamp, and the result is (e.g. for the first line of your input):

['Nov',         # timestamp
 '28',          # Server
 '06:26:45',    # Service
 'server-01',   # Message
 'dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']

You even printed this, so I'm surprised you didn't notice something is wrong.

Now, the first element of the list is assigned to the key 'timestamp', the second element to the key 'Server', and so on. This is how you get a dict that looks like:

{ "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }

Instead, you want to split a maximum of five times, which yields six elements. The first three elements of the resulting list make up the timestamp.

# Don't need that extra list(), since .split() already returns a list
description = line.strip().split(None, 5) 

# Join the first three elements,
joined_timestamp = " ".join(description[:3])

# and replace them in the list
# Setting a slice of a list: See https://stackoverflow.com/q/10623302/843953
description[:3] = [joined_timestamp]

Then, your description looks like this:

['Nov 28 06:26:45',
 'server-01',
 'dhcpd:',
 'DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']

and the elements of fields now correspond to the values in description.

Note that you could replace that entire while i < len(fields)... loop with simply dict2 = dict(zip(fields, description))
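
As a minimal sketch of that replacement, with description hard-coded to the list shown above rather than read from the log file:

```python
fields = ['timestamp', 'Server', 'Service', 'Message']

# Hard-coded here for illustration; in the script this comes from the split
description = ['Nov 28 06:26:45',
               'server-01',
               'dhcpd:',
               'DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']

# zip() pairs each field name with the value at the same position,
# and dict() turns those pairs into key/value entries
dict2 = dict(zip(fields, description))
print(dict2)
```

Keep in mind that zip() stops at the shorter of its two inputs, so any extra elements in description are silently dropped, just as they were by the while loop.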

P.S.: You might want to clean up other elements of description, such as description[2] = description[2].rstrip(":") to remove the trailing colon in 'dhcpd:'
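
Putting the pieces together, one possible version could look like the sketch below. The function name parse_line and the choice to collect the records in a list (instead of the 'emp1', 'emp2' keys) are my own assumptions, not something from your script, and the sample lines are inlined so the snippet runs without Logs.txt. It also assumes every line has a timestamp, server, service and message.

```python
import json

FIELDS = ['timestamp', 'Server', 'Service', 'Message']

def parse_line(line):
    """Turn one log line into a dict keyed by FIELDS (hypothetical helper)."""
    parts = line.strip().split(None, 5)
    if len(parts) < 6:
        raise ValueError(f"Unexpected log line: {line!r}")
    # Merge month, day and time back into a single timestamp element
    parts[:3] = [" ".join(parts[:3])]
    # Drop the trailing colon from the service name, e.g. 'dhcpd:'
    parts[2] = parts[2].rstrip(":")
    return dict(zip(FIELDS, parts))

# Sample lines from the question; replace with `open('Logs.txt')` for a real file
sample_lines = [
    "Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1 ",
    "Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1",
]

# Skip blank lines, parse everything else
records = [parse_line(line) for line in sample_lines if line.strip()]
print(json.dumps(records, indent=4))
```

To write a file instead of printing, you could use json.dump(records, out_file, indent=4) inside a with open("test5.json", "w") as out_file: block, which also closes the file for you.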

Pranav Hosangadi