
Given a string array of logs:

log = [
    '[WARNING] 403 Forbidden: No token in request parameters',
    '[ERROR] 500 Server  Error: int is not subscription',
    '[INFO] 200 OK: Login Successful',
    '[INFO] 200 OK: User sent a message',
    '[ERROR] 500 Server Error: int is not subscription'
]

I'm trying to get better at using dictionaries in Python and want to loop through this array and print out something like this:

{'WARNING': {'403': {'Forbidden': {'No token in request parameters': 1}}},
'ERROR': {'500': {'Server Error': {'int is not subscription': 2}}},
'INFO': {'200': {'OK': {'Login Successful': 1, 'User sent a message': 1}}}}

Essentially, I want to return a dictionary with logging statistics formatted like above. I started writing out my method and wrote this so far:

def logInfo(logs):
    dct = {}

    for log in logs:
        log = log.strip().split()
        if log[2] == "Server":
            log[2] = "Server Error:"
            log.remove(log[3])
        #print(log)
        joined = " ".join(log[3:])
        if log[0] not in dct:
            log[0] = log[0].strip('[').strip(']')
            dct[log[0]] = {}
            if log[1] not in dct[log[0]]:
                dct[log[0]][log[1]] = {}
                if log[2] not in dct[log[0]][log[1]]:
                    dct[log[0]][log[1]][log[2]] = {}
                    if joined not in dct:
                        dct[log[0]][log[1]][log[2]][joined] = 1
                    else:
                        dct[log[0]][log[1]][log[2]][joined] += 1
                else:
                    dct[joined].append(joined)
    print(dct)

It prints this instead:

{'WARNING': {'403': {'Forbidden:': {'No token in request parameters': 1}}}, 'ERROR': {'500': {'Server Error:': {'int is not subscription': 1}}}, 'INFO': {'200': {'OK:': {'User sent a message': 1}}}}

The method itself is pretty long too. Can anyone help, or point me to a more efficient way of handling this?

Fire Assassin

2 Answers


I walked through your code, found and fixed a few bugs, and it now runs well.

  • First, the if statements don't need to be nested, so I flattened them to the same level. Each test creates an empty dict under the key when it is missing, so by the time the next if runs, its parent key is guaranteed to exist.
  • You test log[0] not in dct before calling strip('[').strip(']'), so the bracketed key is never found and you always erase the previous data. I fixed this and marked it in the code below.
  • You test joined not in dct, but it should be tested against dct[log[0]][log[1]][log[2]]. I fixed this and marked it in the code below.
def logInfo(logs):
    dct = {}

    for log in logs:
        log = log.strip().split()
        if log[2] == "Server":
            log[2] = "Server Error:"
            log.remove(log[3])
        #print(log)
        joined = " ".join(log[3:])

        log[0] = log[0].strip('[').strip(']')
        if log[0] not in dct:
            # Fix: the strip above used to sit here, after the membership test,
            # so the bracketed key was never found and the entry was reset.
            dct[log[0]] = {}
        if log[1] not in dct[log[0]]:
            dct[log[0]][log[1]] = {}
        if log[2] not in dct[log[0]][log[1]]:
            dct[log[0]][log[1]][log[2]] = {}
        # Fix: look for the message in its own sub-dictionary, not in the root dct
        # (the original test was: if joined not in dct)
        if joined not in dct[log[0]][log[1]][log[2]]:
            dct[log[0]][log[1]][log[2]][joined] = 1
        else:
            dct[log[0]][log[1]][log[2]][joined] += 1
    
    print(dct)
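
For comparison, the same flattened logic can be written more compactly with dict.setdefault. This is only a minimal sketch, not part of the fix above: the name log_info is hypothetical, and it assumes every line has the shape "[LEVEL] CODE Reason: message", with the first colon separating the reason phrase from the message. Unlike the version above, it drops the trailing colon from the reason key and returns the dictionary instead of printing it.

```python
def log_info(logs):
    """Count log lines by level, status code, reason phrase and message."""
    stats = {}
    for line in logs:
        # Assumption: the first colon separates "[LEVEL] CODE Reason" from the message.
        head, _, message = line.partition(":")
        level, code, *reason_words = head.split()
        level = level.strip("[]")
        reason = " ".join(reason_words)  # also collapses repeated spaces
        message = message.strip()
        # setdefault returns the existing inner dict, or inserts and returns a new one.
        bucket = stats.setdefault(level, {}).setdefault(code, {}).setdefault(reason, {})
        bucket[message] = bucket.get(message, 0) + 1
    return stats


log = [
    '[WARNING] 403 Forbidden: No token in request parameters',
    '[ERROR] 500 Server  Error: int is not subscription',
    '[INFO] 200 OK: Login Successful',
    '[INFO] 200 OK: User sent a message',
    '[ERROR] 500 Server Error: int is not subscription',
]
print(log_info(log))
```

Because the reason phrase is rebuilt from the split words, the special case for "Server Error" is no longer needed.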
nay
  • Oh wow, thanks, essentially I checked joined in dct[log[0]][log[1]][log[2]] but it wasn't working probably because of the nested if statement. I understand the logic in where I was wrong now:) – Fire Assassin Jul 13 '21 at 02:23

You can use re.findall and collections.defaultdict:

import re, collections
r = collections.defaultdict(dict)
log = ['[WARNING] 403 Forbidden: No token in request parameters', '[ERROR] 500 Server Error: int is not subscription', '[INFO] 200 OK: Login Successful', '[INFO] 200 OK: User sent a message', '[ERROR] 500 Server Error: int is not subscription']
# One alternative per field: level, status code, reason phrase, message.
# A raw string keeps the backslashes from being treated as string escapes.
pattern = r'(?<=\[)\w+(?=\])|(?<=\]\s)\d+|(?<=\d\s)[\w\s]+(?=:)|(?<=:)[\w+\s]+$'
for i in log:
    a, b, c, d = map(str.strip, re.findall(pattern, i))
    if b not in r[a]:
        r[a][b] = collections.defaultdict(dict)
    if c not in r[a][b]:
        r[a][b][c] = collections.defaultdict(int)
    r[a][b][c][d] += 1

Output:

defaultdict(<class 'dict'>, {'WARNING': {'403': defaultdict(<class 'dict'>, {'Forbidden': defaultdict(<class 'int'>, {'No token in request parameters': 1})})}, 'ERROR': {'500': defaultdict(<class 'dict'>, {'Server Error': defaultdict(<class 'int'>, {'int is not subscription': 2})})}, 'INFO': {'200': defaultdict(<class 'dict'>, {'OK': defaultdict(<class 'int'>, {'Login Successful': 1, 'User sent a message': 1})})}})

The result is a collections.defaultdict of collections.defaultdicts. If you just want pure dictionaries, you can use recursion to convert r:

def to_dict(d):
   return {a:to_dict(b) if not isinstance(b, int) else b for a, b in d.items()}

print(to_dict(r))

Output:

{'WARNING': {'403': {'Forbidden': {'No token in request parameters': 1}}}, 
'ERROR': {'500': {'Server Error': {'int is not subscription': 2}}}, 
'INFO': {'200': {'OK': {'Login Successful': 1, 'User sent a message': 1}}}}
Ajax1234
  • Thank you for the solution, do you mind directing me to a page where I can understand what these functions do exactly? or if you don't mind explaining them yourself? – Fire Assassin Jul 13 '21 at 01:50
  • @FireAssassin For an in-depth overview of `collections.defaultdict`, see [here](https://stackoverflow.com/questions/5900578/how-does-collections-defaultdict-work). Instead of manually parsing out the different log components, this solution uses [regular expressions](https://docs.python.org/3/library/re.html). Lastly, `to_dict` works recursively: the input dictionary is looped over, and if a value is itself a dictionary, `to_dict` is called on it again, but if the value is the numerical count, it is simply stored as-is. – Ajax1234 Jul 13 '21 at 02:02
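
The three ideas in the comment above can be tried in isolation. The snippet below is a small illustrative sketch, not the code from the answer: the LINE pattern with named groups is a hypothetical alternative to the lookaround regex, and this to_dict tests for dict values rather than for int, which should behave the same on this data.

```python
import re
from collections import defaultdict

# 1. defaultdict builds a missing value on first access instead of raising KeyError.
counts = defaultdict(int)
counts["seen"] += 1  # the missing key starts at int() == 0, then becomes 1

# 2. A regular expression with named groups labels each part of one log line.
#    (Hypothetical pattern, assuming the "[LEVEL] CODE Reason: message" shape.)
LINE = re.compile(r"\[(?P<level>\w+)\]\s+(?P<code>\d+)\s+(?P<reason>[^:]+):\s*(?P<msg>.*)")
parts = LINE.match("[INFO] 200 OK: Login Successful").groupdict()

# 3. Recursion: dictionaries are converted level by level, counts are kept as-is.
def to_dict(d):
    return {k: to_dict(v) if isinstance(v, dict) else v for k, v in d.items()}

nested = defaultdict(lambda: defaultdict(int))
nested[parts["level"]][parts["msg"]] += 1
plain = to_dict(nested)

print(counts["seen"])
print(parts)
print(plain)
```

Since defaultdict is a subclass of dict, the isinstance check catches every nested level, and the result contains only plain dictionaries.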