Problem

I'm quite new to Python and I'm trying to use it to go through a number of large custom log files, extract parameters from certain GET requests, and gain some statistics from them.

The log files I'm parsing look like this:

80 172.23.131.149 "2018-07-05 13:08:25 860" "POST /bios/servlet/bios.servlets.sso.WaffleLoginServlet HTTP/1.1" 401 5 891 891 "-" "Java/1.8.0_171"
8080 172.23.131.251 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 1953 1953 "-" "-"
8080 172.23.131.252 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 953 953 "-" "-"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576777.34375%2C156269.53125%2C6576806.640625 HTTP/1.1" 200 133210 3547 3516 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156240.234375%2C6576748.046875%2C156269.53125%2C6576777.34375 HTTP/1.1" 200 108066 3547 3532 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 188" "POST /bios/servlet/bios.servlets.GetGeometryComponents HTTP/1.1" 401 4 2484 2484 "-" "Java/1.8.0_171"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576806.640625%2C156240.234375%2C6576835.9375 HTTP/1.1" 200 123953 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156210.9375%2C6576777.34375%2C156240.234375%2C6576806.640625 HTTP/1.1" 200 147132 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app/baggis/web/WMS_STHLM_STOCKHOLMSKARTA_HYBRID_INTERN?TILED=TRUE&SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap&FORMAT=image%2Fpng&TRANSPARENT=false&LAYERS=p_1002095&SRS=EPSG%3A3011&STYLES=&r=n2q&WIDTH=256&HEIGHT=256&BBOX=156269.53125%2C6576777.34375%2C156298.828125%2C6576806.640625 HTTP/1.1" 200 145701 3563 3547 "http://tkkarta3.stockholm.se/astolmap/v3/kopplet/tkkarta.htm" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"

What I am trying to do:

  1. Extract all the rows whose request contains the word "GetMap" (these rows are Web Map Server requests, and I'm only interested in those).
  2. From those lines, use a regex to extract the request parameter that follows "LAYERS=" or "layers=" and ends with an ampersand (&), and store it under the key "lager" (it should return, for example, 'p_1002095').
  3. Count the number of occurrences of each value of the key "lager".

I have trouble getting number 1 above to work. I could not find anything helpful (I'm probably not searching for the right thing). The problem seems to be that the word "GetMap" is located within a longer string. It sounds like an easy task, but I can't figure out how to do it.

The code I'm currently using for numbers 2 and 3 in my task list above is:

#!/usr/bin/env python3

import os
import re
from collections import Counter

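# regular expression: search for LAYERS= or layers= and capture the value up to the next '&'
# (the name is spelled out; a class like [LAYERSlayers] matches any single one of those letters)
rexp = r"^.+?[?&](?:LAYERS|layers)=(?P<domain>[^&]*)&"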
# create counter dictionary
cnt_domains = Counter()

path = '/home/uwestephan/Logg-file-parsing/ws00524'

matched = 0
failed = 0
for filename in os.listdir(path):
    filmedsokvag = os.path.join(path, filename)
    print(filmedsokvag)

    # read file / gather data; the with-block closes the file when done
    with open(filmedsokvag, 'r') as f:
        for line in f:
            m = re.match(rexp, line)
            if m:
                cnt_domains.update([m.group('domain')])
                matched += 1
            else:
                failed += 1

# Output Results
print('[*] %d lines matched the regular expression' % (matched))
print('[*] %d lines failed to match the regular expression' % (failed), end='\n\n')
print('[*] ============================================')
print('[*] 100 Most Frequently Occurring Lager Queried')
print('[*] ============================================')
for domain, count in cnt_domains.most_common(100):
    print('[*] %30s: %d' % (domain, count))
print('[*] ============================================')

# Output results to file
with open('parseroutput.txt', 'w') as fd:
    print('[*] %d lines matched the regular expression' % (matched), file=fd)
    print('[*] %d lines failed to match the regular expression' % (failed), end='\n\n', file=fd)
    print('[*] ============================================', file=fd)
    print('[*] 100 Most Frequently Occurring Lager Queried', file=fd)
    print('[*] ============================================', file=fd)
    for domain, count in cnt_domains.most_common(100):
        print('[*] %30s: %d' % (domain, count), file=fd)
    print('[*] ============================================', file=fd)

Do you have any suggestions for how to extract the GetMap requests? Thank you in advance!

AcK
  • Possible duplicate of [Does Python have a string 'contains' substring method?](https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method) – Mike Scotty Jul 18 '18 at 09:46

1 Answer

Check whether the line contains 'GetMap' and skip the line if it does not.

for line in f:
    if 'GetMap' in line:  # check for 'GetMap'
        m = re.match(rexp, line)
        if m:
            cnt_domains.update([m.group('domain')])
            matched += 1
        else:
            failed += 1
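
For reference, here is a minimal self-contained sketch of all three steps on a few in-memory lines, so it can be tried without the log directory. It is only one possible way to do it: the names `LAYERS_RE`, `count_layers` and `sample` are made up for this example, the sample lines are shortened versions of the log excerpt, and 'p_42' is a hypothetical layer id. It uses `re.search` with the parameter name spelled out literally, which assumes the value never contains '&', whitespace or a double quote.

```python
import re
from collections import Counter

# Assumed pattern: the value of a LAYERS= / layers= query parameter,
# read up to the next '&', whitespace or closing double quote.
LAYERS_RE = re.compile(r'[?&]layers=([^&\s"]*)', re.IGNORECASE)


def count_layers(lines):
    """Count LAYERS values on the lines that contain a GetMap request."""
    counts = Counter()
    for line in lines:
        if 'GetMap' not in line:     # step 1: plain substring test
            continue
        m = LAYERS_RE.search(line)   # step 2: search() scans the whole line
        if m:
            counts[m.group(1)] += 1  # step 3: tally each layer name
    return counts


# Shortened sample lines modelled on the log excerpt (paths abbreviated,
# 'p_42' is a hypothetical layer id used only for illustration).
sample = [
    '8080 172.23.131.251 "2018-07-05 13:08:26 594" "HEAD /bios/servlet/bios.servlets.web.Ping?level=3 HTTP/1.0" 200 - 1953 1953 "-" "-"',
    '80 172.23.131.149 "2018-07-05 13:08:28 188" "GET /bios/wms/app?SERVICE=WMS&REQUEST=GetMap&LAYERS=p_1002095&STYLES=&r=n2q&WIDTH=256 HTTP/1.1" 200 133210 3547 3516 "-" "-"',
    '80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app?SERVICE=WMS&REQUEST=GetMap&LAYERS=p_1002095&STYLES=&r=n2q&WIDTH=256 HTTP/1.1" 200 123953 3563 3547 "-" "-"',
    '80 172.23.131.149 "2018-07-05 13:08:28 204" "GET /bios/wms/app?service=WMS&request=GetMap&layers=p_42&styles= HTTP/1.1" 200 1000 10 10 "-" "-"',
]

for layer, count in count_layers(sample).most_common():
    print('[*] %30s: %d' % (layer, count))
```

In the real script, `count_layers(f)` could be called with each open file object, since iterating a file yields its lines, and the resulting counters added together.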
AcK