
I am trying to parse this log into a pandas DataFrame. I need a regex that parses it into a list or DataFrame with Python. Thanks.

127.0.0.1 - - [05/Feb/2012:17:11:55 +0000] "GET / HTTP/1.1" 200 140 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.5 Safari/535.19"

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined

This log line is from "Understanding Apache's access log".

I have tried with splits and for loops, but the result is weird. A regex would be more efficient. Do you know if it is possible, and how?

node = re.search(regex, log_line).group(1)
node = node.split(" ")
print(node)
TFo

1 Answer

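import re

# Raw string, so the regex backslashes are not treated as Python string escapes.
APACHE_LOG_PATTERN = r'^(\S+) (\S+) (\S+) \[([\w:/]+\s[+\-]\d{4})\] "(\S+) (\S+)\s*(\S+)?\s*" (\d{3}) (\S+)'
match = re.search(APACHE_LOG_PATTERN, log_line)  # log_line is one line of the access log
host          = match.group(1)
client_id     = match.group(2)
user_id       = match.group(3)
date_time     = match.group(4)  # strptime format: %d/%b/%Y:%H:%M:%S %z
method        = match.group(5)
endpoint      = match.group(6)
protocol      = match.group(7)
response_code = int(match.group(8))
content_size  = match.group(9)  # kept as a string because Apache logs "-" when no bytes are sent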
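
Since the question asks for a list or DataFrame rather than single variables, here is a minimal sketch of one way to extend the idea to whole lines of the combined format, including the referer and user agent. The names COMBINED_LOG_PATTERN, parse_log_line and parse_log_lines are made up for this example, and the pattern assumes that quoted fields contain no escaped double quotes and that month names are in English.

```python
import re
from datetime import datetime

# Named groups mirror the fields of the combined LogFormat.
# The group names are illustrative choices, not Apache terminology.
COMBINED_LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<client_id>\S+) (?P<user_id>\S+) '
    r'\[(?P<date_time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<response_code>\d{3}) (?P<content_size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def parse_log_line(line):
    """Return a dict of fields for one combined-format line, or None if it does not match."""
    match = COMBINED_LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    # Timestamps look like 05/Feb/2012:17:11:55 +0000.
    row["date_time"] = datetime.strptime(row["date_time"], "%d/%b/%Y:%H:%M:%S %z")
    row["response_code"] = int(row["response_code"])
    # Apache logs "-" when no bytes were sent, so map that to None.
    row["content_size"] = None if row["content_size"] == "-" else int(row["content_size"])
    return row


def parse_log_lines(lines):
    """Parse an iterable of lines, skipping any that do not match the pattern."""
    parsed = (parse_log_line(line) for line in lines)
    return [row for row in parsed if row is not None]


sample = '127.0.0.1 - - [05/Feb/2012:17:11:55 +0000] "GET / HTTP/1.1" 200 140 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.5 Safari/535.19"'
rows = parse_log_lines([sample])
print(rows[0]["host"], rows[0]["response_code"], rows[0]["request"])
```

The result is a list of dicts, one per matching line. If pandas is installed, pandas.DataFrame(rows) should turn that list into a DataFrame with one column per field. The request is kept as a single string here; it can be split into method, endpoint and protocol afterwards if needed.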
taki