How can I change the output of this function into a dictionary?

Question

I have written a function that uses a regex to extract specific fields from a .txt file. The code works, but I would like it to produce dictionaries as output, and the result should contain 979 entries:

import re

def logs():
    with open("C:/Users/ASUS/Desktop/logdata.txt", "r") as file:
        logdata = file.read()

    pattern = r'''
    (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})    # host name
    \s+\S+\s+
    (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s+\[   # user_name
    (?P<time>([^[]+))\]\s+"                     # time
    (?P<request>[^"]+)"                         # request
    '''

    for item in re.finditer(pattern, logdata, re.VERBOSE):
        print(item.groupdict())

This function is supposed to turn a text like this:

146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

into this, capturing host, user_name, etc.:

{"host":"146.204.224.152", 
 "user_name":"feest6811", 
 "time":"21/Jun/2019:15:45:24 -0700",
 "request":"POST /incentivize HTTP/1.1"}

How can I do this?

@mkrieger1 thank you for your answer. I just used `my_list = re.findall(pattern, logdata, re.VERBOSE)` instead of the for loop and it outputs a list. — Anoushiravan R, Jan 09 '22 at 22:34
Your code says `print(item.groupdict())`, not `my_list = ...`. What does it output? — mkrieger1, Jan 09 '22 at 22:35
I edited the code just now. Check it now, please. It returns a list. — Anoushiravan R, Jan 09 '22 at 22:38
Why did you change the code to return a list, when you don't want a list? Why did you not keep the code that printed a dictionary if you want a dictionary? — mkrieger1, Jan 09 '22 at 22:39
Oh, you are right. I made a mistake; I will undo the modifications. — Anoushiravan R, Jan 09 '22 at 22:41
It would help if the function actually *returned* anything, rather than just writing something to standard output. — chepner, Jan 09 '22 at 22:47
@chepner It can be modified, but the output should be a dictionary in the format I am trying to capture with the regex. — Anoushiravan R, Jan 09 '22 at 23:04
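For context on the comments above: when a pattern contains groups, `re.findall` returns one tuple of group values per match (every group, including unnamed ones like `(\w+|-)`), not a dict, which is why swapping the loop for `re.findall` produced a list. A minimal comparison, using a simplified two-field pattern chosen only for illustration (it is not the pattern from the question):

```python
import re

# Simplified pattern (illustration only): host followed by user name.
pattern = r"(?P<host>\d+(?:\.\d+){3}) - (?P<user_name>[\w-]+)"
text = (
    "146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700]\n"
    "146.204.224.153 - - [21/Jun/2019:15:45:25 -0700]\n"
)

# re.findall: a list of tuples, one value per group, in group order.
as_tuples = re.findall(pattern, text)

# re.finditer + groupdict(): a list of dicts keyed by group name.
as_dicts = [m.groupdict() for m in re.finditer(pattern, text)]

print(as_tuples)
print(as_dicts)
```

Only the second form keeps the group names, so it is the one that yields the dictionary format the question asks for.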

dawg · Accepted Answer · 2022-01-10T00:14:30.177

Just return the result of `groupdict()` directly:

import re 

def rtr_dict(txt):
  pattern = r'''
  (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})   # host name
  \s+\S+\s+
  (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s+\[   # user_name
  (?P<time>([^[]+))\]\s+"   # time
  (?P<request>[^"]+)"   # request
  '''
  
  if m:=re.match(pattern, txt, flags=re.VERBOSE):
    return m.groupdict()

tgt='146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'


>>> rtr_dict(tgt)
{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
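One detail worth knowing about `rtr_dict`: when `re.match` finds nothing, the `if` body is skipped and the function implicitly returns `None`, so callers should check for that. A small sketch (not from the original answer); the shortened pattern and the `parse_line` name are hypothetical stand-ins, and the `:=` walrus operator requires Python 3.8+:

```python
import re

# Hypothetical shortened pattern: host, user name and request only.
PATTERN = re.compile(
    r'''
    (?P<host>\d+(?:\.\d+){3})    # host, e.g. 146.204.224.152
    \s+-\s+
    (?P<user_name>[\w-]+)        # user name, or "-" if absent
    \s+\[[^\]]*\]\s+             # skip the bracketed timestamp
    "(?P<request>[^"]*)"         # request line inside quotes
    ''',
    re.VERBOSE,
)

def parse_line(line):
    """Return a dict of named groups, or None if the line does not match."""
    if m := PATTERN.match(line):  # the walrus operator needs Python 3.8+
        return m.groupdict()
    return None  # explicit, instead of falling off the end

good = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
bad = 'not a log line'

for line in (good, bad):
    result = parse_line(line)
    if result is None:
        print("no match:", line)
    else:
        print(result["host"], result["request"])
```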

Follow-up question from the comments: "Could you please tell me how to make it work for more than one line, the way I used a for loop?"

Given:

tgt='''146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
146.204.224.153 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4623
146.204.224.154 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4624'''

If you have more than one match, use `re.finditer` and return a list of dicts:

def rtr_dict(txt):
  pattern = r'''
  (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})   # host name
  \s+\S+\s+
  (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s+\[   # user_name
  (?P<time>([^[]+))\]\s+"   # time
  (?P<request>[^"]+)"   # request
  '''
  
  return [m.groupdict() for m in re.finditer(pattern, txt, flags=re.VERBOSE)]

>>> rtr_dict(tgt)
[{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}, {'host': '146.204.224.153', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}, {'host': '146.204.224.154', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}]
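Since the comments ask for notes on the list comprehension: `[m.groupdict() for m in re.finditer(...)]` is shorthand for creating an empty list, looping over the matches, and appending one dict per match. A sketch showing that both forms give the same result; the two-field pattern is a simplified stand-in for the full one:

```python
import re

# Simplified stand-in pattern: host followed by user name.
PATTERN = r"(?P<host>\d+(?:\.\d+){3}) - (?P<user_name>[\w-]+)"
text = (
    "146.204.224.152 - feest6811\n"
    "146.204.224.153 - feest6811\n"
    "146.204.224.154 - feest6811\n"
)

# Long form: an explicit loop that appends one dict per match.
with_loop = []
for m in re.finditer(PATTERN, text):
    with_loop.append(m.groupdict())

# Short form: the list comprehension used in the answer.
with_comprehension = [m.groupdict() for m in re.finditer(PATTERN, text)]

print(with_loop == with_comprehension)  # True
print(len(with_comprehension))          # 3
```

`len()` of the returned list is the number of matched log entries, which is how the expected count of 979 for the real file could be checked.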

Or write a generator function that yields one dict per match:

def rtr_dict(txt):
  pattern = r'''
  (?P<host>\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})   # host name
  \s+\S+\s+
  (?P<user_name>(?<=-\s)(\w+|-)(?=\s))\s+\[   # user_name
  (?P<time>([^[]+))\]\s+"   # time
  (?P<request>[^"]+)"   # request
  '''
  
  for m in re.finditer(pattern, txt, flags=re.VERBOSE):
    yield m.groupdict()

>>> list(rtr_dict(tgt))
# same list of dicts...
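The generator version does no work until it is iterated and then produces dicts one at a time, so you can stop early or count matches without keeping all of them in memory. A sketch of consuming it; `iter_records` is a hypothetical name and the pattern is shortened for illustration:

```python
import re
from itertools import islice

# Shortened illustrative pattern: host followed by user name.
PATTERN = r"(?P<host>\d+(?:\.\d+){3}) - (?P<user_name>[\w-]+)"

def iter_records(txt):
    """Yield one dict per match, lazily."""
    for m in re.finditer(PATTERN, txt):
        yield m.groupdict()

# Ten synthetic log lines, hosts 146.204.224.150 to 146.204.224.159.
text = "\n".join(
    f"146.204.224.{n} - user{n} [21/Jun/2019:15:45:24 -0700]" for n in range(150, 160)
)

gen = iter_records(text)
first = next(gen)                 # runs the generator only up to the first yield
next_two = list(islice(gen, 2))   # resumes where it stopped and takes two more
remaining = sum(1 for _ in gen)   # counts the rest without storing them

print(first)
print(next_two)
print(remaining)
```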

Thank you very much. The output is exactly what I was looking for. Could you please tell me how to make it work for more than one line, the way I used a for loop? — Anoushiravan R, Jan 09 '22 at 23:59
This is perfect. Thank you very much indeed. I am very new to these concepts. I would highly appreciate it if you could add some notes on the list comprehension part. I will learn about them in time anyway. Thank you again :) — Anoushiravan R, Jan 10 '22 at 00:19

score 1 · Answer 2 · answered Jan 10 '22 at 02:17

It's late, but this verbose regex will also work (it returns a list of dictionaries):

import re
def logs():
    with open("C:/Users/ASUS/Desktop/logdata.txt", "r") as file:
        logdata = file.read()
    
    pattern = r"""
    (?P<host>[\d\.]*)       # IP host
    (\ -\ )                 # followed by " - "
    (?P<user_name>[\w-]*)   # user name
    (\ *\[)                 # followed by optional spaces and "["
    (?P<time>[^\]]*)        # time
    (\]\ *")                # followed by "]", optional spaces, and a double quote
    (?P<request>[^\"]*)     # request"""

    return [item.groupdict() for item in re.finditer(pattern, logdata, re.VERBOSE)]
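Both answers read the hard-coded path C:/Users/ASUS/Desktop/logdata.txt inside `logs()`. A sketch that instead takes the path as a parameter and compiles Answer 2's pattern once with `re.compile`; the `path` parameter and the `LOG_PATTERN` name are assumptions, and the temporary file below only stands in for the real logdata.txt:

```python
import os
import re
import tempfile

# Answer 2's verbose pattern, compiled once at module level.
LOG_PATTERN = re.compile(
    r"""
    (?P<host>[\d.]+)        # IP address of the host
    \ -\                    # literal " - "
    (?P<user_name>[\w-]+)   # user name, or "-"
    \ *\[                   # optional spaces, then "["
    (?P<time>[^\]]*)        # everything up to the closing "]"
    \]\ *"                  # "]", optional spaces, opening quote
    (?P<request>[^"]*)      # request, up to the closing quote
    """,
    re.VERBOSE,
)

def logs(path):
    """Return a list of dicts, one per log entry found in the file at `path`."""
    with open(path, "r") as file:
        logdata = file.read()
    return [m.groupdict() for m in LOG_PATTERN.finditer(logdata)]

# Usage example with a throwaway file standing in for logdata.txt.
sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    '146.204.224.153 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4623\n'
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

records = logs(tmp.name)
os.remove(tmp.name)

print(len(records))  # 2 for this sample; the question expects 979 for the real file
print(records[0])
```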
