
I am trying to solve a problem from a college assignment. I have a log file from which I want to extract just the HTTP status codes.

I have included a bit of that log file below:

45.132.51.36 - - [19/Dec/2020:18:00:08 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(Linux;Android9;LM-K410)AppleWebKit/537.36(KHTML,likeGecko)Chrome/85.0.4183.81MobileSafari/537.36" "-"
45.153.227.31 - - [19/Dec/2020:18:25:17 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/84.0.4147.125Safari/537.36Edg/84.0.522.59" "-"
194.156.95.52 - - [19/Dec/2020:18:27:18 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(Linux;Android10;PCT-L29)AppleWebKit/537.36(KHTML,likeGecko)Chrome/84.0.4147.125MobileSafari/537.36" "-"
45.132.207.221 - - [19/Dec/2020:19:43:45 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(Linux;Android5.1;HUAWEILYO-L21)AppleWebKit/537.36(KHTML,likeGecko)Chrome/80.0.3987.99MobileSafari/537.36" "-"
45.145.161.6 - - [19/Dec/2020:19:46:33 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(Linux;Android9;A3)AppleWebKit/537.36(KHTML,likeGecko)Chrome/85.0.4183.81MobileSafari/537.36" "-"
83.227.29.211 - - [19/Dec/2020:19:54:04 +0100] "GET /images/stories/raith/wohnung_1_web.jpg HTTP/1.1" 200 80510 "http://almhuette-raith.at/index.php?option=com_content&view=article&id=49&Itemid=55" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"
87.247.143.30 - - [19/Dec/2020:20:00:43 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(WindowsPhone10.0;Android6.0.1;Microsoft;Lumia640LTE)AppleWebKit/537.36(KHTML,likeGecko)Chrome/52.0.2743.116MobileSafari/537.36Edge/15.15063" "-"
45.138.4.22 - - [19/Dec/2020:20:25:15 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/85.0.4183.83Safari/537.36/null/null/null" "-"
87.247.143.30 - - [19/Dec/2020:20:44:07 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/46.0.2486.0Safari/537.36Edge/13.10586" "-"
45.153.227.31 - - [19/Dec/2020:21:17:17 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(Linux;Android9;LYA-L29Build/HUAWEILYA-L29;wv)AppleWebKit/537.36(KHTML,likeGecko)Version/4.0Chrome/85.0.4183.81MobileSafari/537.36EdgW/1.0" "-"
45.144.0.98 - - [19/Dec/2020:21:25:42 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(Linux;Android9;SAMSUNGSM-J330F)AppleWebKit/537.36(KHTML,likeGecko)SamsungBrowser/12.1Chrome/79.0.3945.136MobileSafari/537.36" "-"
45.132.207.221 - - [19/Dec/2020:21:39:00 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/84.0.4147.125Safari/537.36" "-"

My code is below. I thought that limiting the match to three digits after `.*` would work. I also tried adding a `$` after `[0-9]{3}`.

import re

with open("access.log") as file:
    contents = file.read()
    http_code = re.findall("HTTP/1.1\".* [0-9]{3}", contents)
    print(http_code)

What can I do to extract just the numeric HTTP status codes that follow `HTTP/1.1"`?

IAmAndy
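To see why the original pattern returns more than the status code, here is a minimal sketch run against one line pasted from the sample log (the `line` variable is just that sample line). The greedy `.*` runs to the end of the line and then backs up only as far as the *last* space followed by three digits, and because the pattern has no capture group, `re.findall` returns the whole match. The lazy `.*?` with a group, as suggested in the comments, stops at the first one and returns only the digits.

```python
import re

# One line copied from the sample log in the question
line = ('45.132.51.36 - - [19/Dec/2020:18:00:08 +0100] '
        '"POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" '
        '200 188 "-" "Mozilla/5.0(Linux;Android9;LM-K410)AppleWebKit/537.36'
        '(KHTML,likeGecko)Chrome/85.0.4183.81MobileSafari/537.36" "-"')

# Original pattern: greedy .* backtracks only to the LAST " ddd" on the line,
# and with no group, findall returns the entire matched text.
greedy = re.findall(r'HTTP/1.1".* [0-9]{3}', line)

# Lazy .*? stops at the FIRST " ddd"; the group makes findall return just the digits.
# (The dot is escaped here so it matches a literal "." rather than any character.)
lazy = re.findall(r'HTTP/1\.1".*? ([0-9]{3})', line)

print(greedy)  # ['HTTP/1.1" 200 188']
print(lazy)    # ['200']
```

Adding `$` does not help either: without `re.MULTILINE` it only matches at the very end of the file contents, and the three digits are never at the end of a line in this log.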
    Use a lazy quantifier: `HTTP/1.1\".*? ([0-9]{3})`, see https://regex101.com/r/d3aCoN/1. Also, why bother matching any text at all? `HTTP/1.1\" ([0-9]{3})` seems to work, too. – Wiktor Stribiżew Nov 29 '21 at 21:42
    `(?<= )\d{3}` or ` (\d{3})` appears to do the job as well. The first matches three digits preceded by a space (`(?<= )` being a *positive lookbehind*); the latter matches a space followed by three digits, with the digits saved to a capture group. – Cary Swoveland Nov 29 '21 at 21:58
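The patterns from the two comments behave differently depending on how they are applied. A minimal sketch on two lines pasted from the question's log: the pattern anchored on `HTTP/1.1"` returns only status codes even with `re.findall` over the whole file, while a bare ` (\d{3})` (and likewise `(?<= )\d{3}`) also picks up the response size when run over everything, including the first three digits of `9873`. The bare pattern is only safe if you take the first match on each line with `re.search`.

```python
import re

# Two lines copied from the sample log in the question
contents = (
    '45.132.51.36 - - [19/Dec/2020:18:00:08 +0100] "POST /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 188 "-" "Mozilla/5.0(Linux;Android9;LM-K410)AppleWebKit/537.36(KHTML,likeGecko)Chrome/85.0.4183.81MobileSafari/537.36" "-"\n'
    '45.153.227.31 - - [19/Dec/2020:18:25:17 +0100] "GET /index.php?option=com_contact&view=contact&id=1 HTTP/1.1" 200 9873 "-" "Mozilla/5.0(WindowsNT10.0;Win64;x64)AppleWebKit/537.36(KHTML,likeGecko)Chrome/84.0.4147.125Safari/537.36Edg/84.0.522.59" "-"\n'
)

# Anchored on the end of the request field: only the status code follows it
anchored = re.findall(r'HTTP/1\.1" ([0-9]{3})', contents)

# Bare " ddd" over the whole text also hits the size field (188, and 987 from 9873)
bare_all = re.findall(r' (\d{3})', contents)

# The same bare pattern works if you keep only the first hit on each line
bare_first = []
for line in contents.splitlines():
    m = re.search(r' (\d{3})', line)
    if m:
        bare_first.append(m.group(1))

print(anchored)    # ['200', '200']
print(bare_all)    # ['200', '188', '200', '987']
print(bare_first)  # ['200', '200']
```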

1 Answer


I'm not sure you need a regex here:

with open("access.log") as file:
    for line in file:
        print(line.split()[8])

# Output:
200
200
200
200
200
200
200
200
200
200
200
200
Corralien
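`line.split()[8]` works for the sample because every request field there contains exactly three space-separated tokens (method, path, protocol). If a line's request field is different, for example just `"-"`, the index shifts and you get another field instead. A minimal sketch of a more defensive variant, assuming the Apache/Nginx "combined" layout shown in the question and no embedded double quotes inside the request: match up to the first quoted field and take the three digits after it. `STATUS_RE`, `status_codes`, and the second sample line are hypothetical, made up for illustration.

```python
import re

# Hypothetical pattern; assumes the layout seen in the question:
# ip ident user [date] "request" status size "referer" "user-agent" ...
STATUS_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')

def status_codes(lines):
    """Yield the status code of every line that fits the layout; skip the rest."""
    for line in lines:
        m = STATUS_RE.match(line)
        if m:
            yield m.group(1)

sample = [
    # Copied from the question's log
    '83.227.29.211 - - [19/Dec/2020:19:54:04 +0100] "GET /images/stories/raith/wohnung_1_web.jpg HTTP/1.1" 200 80510 "http://almhuette-raith.at/index.php?option=com_content&view=article&id=49&Itemid=55" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36" "-"',
    # Made-up line whose request field is just "-": split()[8] returns
    # the referer field here, but the regex still finds the status code.
    '192.0.2.1 - - [19/Dec/2020:22:00:00 +0100] "-" 400 0 "-" "-" "-"',
]

codes = list(status_codes(sample))
print(codes)  # ['200', '400']

# With the real file, a file object can be passed directly:
# with open("access.log") as file:
#     for code in status_codes(file):
#         print(code)
```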