**This is my Python code. I'm trying to convert NGINX logs.

I'm reading logs from the access.log file and using regular expressions to convert them into JSON format. I also need to upload these logs to Elasticsearch, so please guide me on that as well. I'm new to both.**

import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()


regex = '([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

for line in lines:

    r = re.match(regex,line)

    if len(r) >= 6:
        result[i] = {'IP address': r[0], 'Time Stamp': r[1], 'HTTP status': r[2], 'Return status':
                     r[3], 'Browser Info': r[4]}
        i += 1
print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

I'm facing the following error:

Traceback (most recent call last):
   File "/home/zain/Downloads/stack.py", line 17, in <module>
    if len(r) >= 6:
TypeError: object of type 'NoneType' has no len()

This is the log format:

127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /icons/openlogo-75.png HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

The expected output is:

IP Address: 127.0.0.1 Time Stamp: 23/May/2022:22:44:14  HTTP Status: "GET / HTTP/1.1" Return Status: 200 3437  Browser Info: "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
  • @barmar kindly guide me on this – john May 26 '22 at 21:22
  • It doesn't look like `r` has been assigned a value by the regex. What do you get if you print(r)? – Captain Caveman May 26 '22 at 21:28
  • @CaptainCaveman It shows nothing, and I get the same error as mentioned in the question – john May 26 '22 at 21:32
  • Yes, that is why `if len(r) >= 6` raises an error. You can't check the len() of something that doesn't have a value. So, which part of each line in the log are you trying to extract with the regex? – Captain Caveman May 26 '22 at 21:33
  • @CaptainCaveman I'm trying to extract everything except the blank spaces between the fields, and I also need to label them – john May 26 '22 at 21:33
  • You're getting this error because `re.match(regex,line)` hasn't returned a match (`type(r) == NoneType`). But the same line will get you a similar error even when you do get a match, since r will then be a `match object` (see [docs](https://docs.python.org/3/library/re.html#match-objects)), and those have no `len()` either. – ouroboros1 May 26 '22 at 21:33
  • @ouroboros1 What should I do in this case? How do I get the desired output? Excuse my dumbness, I'm new to coding – john May 26 '22 at 21:34
  • You need to fix your regex. Which part of each line in the log are you trying to extract with the regex? – Captain Caveman May 26 '22 at 21:37
  • I need 1) IP Address 2) Time Stamp 3) HTTP Request 4) 200 3437 5) Browser info – john May 26 '22 at 21:39
  • What you listed in your numbered list is everything in the line. haha. If the regex you posted works, you should be good to go. – Captain Caveman May 26 '22 at 21:47
  • Could you concretely describe what your expected output is for the given log file? – BrokenBenchmark May 26 '22 at 22:24
  • Welcome to Stack Overflow! Please don't vandalize your posts. By posting on the Stack Exchange network, you've granted a non-revocable right, under the [CC BY-SA 4.0 license](https://creativecommons.org/licenses/by-sa/4.0/), for Stack Exchange to distribute that content (i.e. regardless of your future choices). By Stack Exchange policy, the non-vandalized version of the post is the one which is distributed, and thus, any vandalism will be reverted. If you want to know more about deleting a post please see: [How does deleting work?](/help/what-to-do-instead-of-deleting-question). – cigien May 28 '22 at 01:17
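
To make the comments above concrete, here is a minimal sketch of why `len(r)` fails, assuming a log line in the format shown in the question. `re.match` returns `None` when the pattern does not fit, and the question's pattern expects a literal ` - ` after the status code, where the log actually has a byte count. The adjusted pattern `FIXED` and the helper `parse_line` are illustrative names only, not something from the original post.

```python
import re

# One line in the format shown in the question.
SAMPLE = ('127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" '
          '200 3437 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) '
          'Gecko/20100101 Firefox/91.0"')

# Pattern from the question: after the status code it expects " - ",
# but the log has a byte count there, so nothing matches.
ORIGINAL = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

# Hypothetical adjustment: capture the byte count with a second (\d+).
FIXED = r'([\d.]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)"'


def parse_line(pattern, line):
    """Return the captured groups as a tuple, or None if the line does not match."""
    match = re.match(pattern, line)
    if match is None:        # guard before using the result
        return None
    return match.groups()    # a plain tuple, so len() and indexing work


print(parse_line(ORIGINAL, SAMPLE))      # None
print(len(parse_line(FIXED, SAMPLE)))    # 7
```

Note that on a match object `r[0]` is the whole matched text, and the captured groups start at `r[1]`, so the indices in the question's dictionary would be off by one even after the pattern is fixed.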

1 Answer

I took my cue from this code. I believe the following should do it:

import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()

# Raw string, so the backslashes reach the regex engine unchanged.
regex = r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<dateandtime>.*)\] "(?P<httpstatus>(GET|POST) .+ HTTP/1\.1)" (?P<returnstatus>\d{3} \d+) (".*")(?P<browserinfo>.*)"'

for line in lines:

    r = re.match(regex, line)

    # re.match returns None when the line does not fit the pattern.
    if r is not None:
        result[i] = {'IP address': r.group('ipaddress'), 'Time Stamp': r.group('dateandtime'),
                     'HTTP status': r.group('httpstatus'), 'Return status':
                     r.group('returnstatus'), 'Browser Info': r.group('browserinfo')}
        i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

Result (print(json.dumps(result, sort_keys=False, indent=4))):

{
    "0": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "1": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /icons/openlogo-75.png HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "2": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /favicon.ico HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
}
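
Since the goal is to index these entries in a search engine, it may help to turn the combined strings into typed fields first. The following is only a sketch under assumptions: the helper `normalize` and the output field names (`ip`, `timestamp`, `request`, `status`, `bytes`, `user_agent`) are hypothetical, and the time format string assumes the timestamp layout shown in the result above, with English month names.

```python
from datetime import datetime


def normalize(entry):
    """Return a copy of one parsed entry with typed fields (hypothetical names)."""
    status, size = entry['Return status'].split()
    # Layout as in the result above, e.g. 23/May/2022:22:44:14 -0400.
    # %b assumes English month abbreviations (the default C locale).
    stamp = datetime.strptime(entry['Time Stamp'], '%d/%b/%Y:%H:%M:%S %z')
    return {
        'ip': entry['IP address'],
        'timestamp': stamp.isoformat(),
        'request': entry['HTTP status'],
        'status': int(status),
        'bytes': int(size),
        'user_agent': entry['Browser Info'],
    }


entry = {
    'IP address': '127.0.0.1',
    'Time Stamp': '23/May/2022:22:44:14 -0400',
    'HTTP status': 'GET /icons/openlogo-75.png HTTP/1.1',
    'Return status': '404 125',
    'Browser Info': 'Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0',
}

print(normalize(entry)['timestamp'])   # 2022-05-23T22:44:14-04:00
```

An ISO 8601 timestamp and integer status codes are generally easier to sort and filter on than the raw strings, but whether this step is needed depends on how the data will be queried.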
  • Thank you, it worked. Do you know how I can upload it to Elasticsearch? – john May 26 '22 at 23:51
  • I'm not familiar with Elasticsearch, but you can probably find the answer here on SO already, e.g. [this post](https://stackoverflow.com/questions/15936616/import-index-a-json-file-into-elasticsearch). – ouroboros1 May 27 '22 at 00:00
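
As a rough follow-up to the upload question, one option is the Elasticsearch `_bulk` endpoint, which accepts newline-delimited JSON: an action line followed by the document itself, with a trailing newline at the end of the body. The sketch below uses only the standard library. The index name `nginx-logs`, the URL `http://localhost:9200/_bulk`, and the helper names `build_bulk_body` and `send_bulk` are assumptions for illustration; a cluster with security enabled would also need TLS and credentials, which are not shown.

```python
import json
import urllib.request


def build_bulk_body(docs, index_name):
    """Build a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({'index': {'_index': index_name}}))  # action line
        lines.append(json.dumps(doc))                                # document line
    return '\n'.join(lines) + '\n'   # the body has to end with a newline


def send_bulk(body, url='http://localhost:9200/_bulk'):
    """POST the body. The URL is an assumption (local node without security)."""
    request = urllib.request.Request(
        url,
        data=body.encode('utf-8'),
        headers={'Content-Type': 'application/x-ndjson'},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Same shape as the dictionary built by the answer's script.
result = {
    0: {'IP address': '127.0.0.1', 'Return status': '200 3437'},
    1: {'IP address': '127.0.0.1', 'Return status': '404 125'},
}

body = build_bulk_body(result.values(), 'nginx-logs')
print(body)
# send_bulk(body)  # uncomment once an Elasticsearch node is reachable
```

The official Python client (`elasticsearch`, with `helpers.bulk`) wraps the same endpoint and is probably the more convenient choice for anything beyond a quick test.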