
I'm trying to find every IP address in an access.log (which is read in and converted to a string) and then count their occurrences. I can do this, but the format of the IP addresses in the list is weird: one element of the list is "('110', '78', '168', '85')" instead of "('110.78.168.85')". How do I make it look like an IP address?

I've tried reading other answers on Stack Overflow, but none of them seemed to solve my problem.

import re


f = open("/var/log/apache2/access.log", "r")
f_as_string = f.read()
f.close()

x = re.findall(r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)', f_as_string)

# ...
So I want each element to be ('110.78.168.85') instead of ('110', '78', '168', '85').
DirtyBit
topkek
  • You don't really need a regex for this, just `split(".")`? – DirtyBit Apr 08 '19 at 14:40
  • Also, avoid file handling with that explicit approach of opening and closing the file. – DirtyBit Apr 08 '19 at 14:40
  • Can you provide sample data from `access.log`? – DirtyBit Apr 08 '19 at 14:41
  • Sample file: 111.222.333.123 HOME - [01/Feb/1998:01:08:39 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 198 "http://www.referrer.com/bannerad/ba_intro.htm" "Mozilla/4.01 (Macintosh; I; PPC)" 111.222.333.123 HOME - [01/Feb/1998:01:08:46 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 28083 "http://www.referrer.com/bannerad/ba_intro.htm" "Mozilla/4.01 (Macintosh; I; PPC)" – topkek Apr 08 '19 at 16:12
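In the spirit of the comments above (no regex, no explicit open/close), here is a minimal sketch. It splits each line on whitespace rather than on "." and assumes the Apache common/combined log format, where the client address is the first field of each line. `count_client_ips` is a hypothetical helper name, and the sample lines are shortened versions of the ones in the comment. Note that it does not validate anything: the sample's 111.222.333.123 is not a valid IPv4 address (333 > 255) but is still counted.

```python
from collections import Counter


def count_client_ips(lines):
    """Count the first whitespace-separated field of each non-empty line.

    Assumes the Apache common/combined log format, where the client
    address comes first; the value is not validated as an IP address.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


# With the real log, "with" closes the file automatically:
# with open("/var/log/apache2/access.log") as f:
#     print(count_client_ips(f).most_common(10))

sample = [
    '111.222.333.123 HOME - [01/Feb/1998:01:08:39 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 198',
    '111.222.333.123 HOME - [01/Feb/1998:01:08:46 -0800] "GET /bannerad/ad.htm HTTP/1.0" 200 28083',
]
print(count_client_ips(sample))  # Counter({'111.222.333.123': 2})
```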

1 Answer

findall returns the groups instead of the whole match if your pattern has any capturing groups: a list of strings for a single group, or a list of tuples when there are several. Your pattern has four pairs of capturing parentheses, so findall returns a list of four-element tuples.
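To see the rule in action, here is the same illustrative text searched with zero, one, and two capturing groups (a loose \d+ pattern is used only to keep the example short):

```python
import re

text = "a 10.0.0.1 b 192.168.1.20"

# No capturing groups: each item is the whole match.
no_groups = re.findall(r'\d+\.\d+\.\d+\.\d+', text)
print(no_groups)   # ['10.0.0.1', '192.168.1.20']

# Exactly one capturing group: each item is that group's text.
one_group = re.findall(r'(\d+)\.\d+\.\d+\.\d+', text)
print(one_group)   # ['10', '192']

# Two or more capturing groups: each item is a tuple of the groups.
two_groups = re.findall(r'(\d+)\.(\d+)\.\d+\.\d+', text)
print(two_groups)  # [('10', '0'), ('192', '168')]
```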

Try writing your pattern using non-capturing parentheses, (?:...).

>>> import re
>>> f_as_string = "foobar 110.78.168.85 bazqux 123.45.067.89"
>>> re.findall(r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)', f_as_string)
['110.78.168.85', '123.45.067.89']
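Because the octet alternative is repeated four times, the same pattern can also be assembled from a single piece. This is a sketch; `OCTET` and `IP_PATTERN` are illustrative names. The {3} quantifier repeats the non-capturing (?:octet\.) group, so there are still no capturing groups and findall returns whole matches. Like the original, it has no word-boundary anchors, so it can also match inside a longer run of digits and dots.

```python
import re

# One octet, 0-255 (leading zeros such as "067" are also accepted), non-capturing.
OCTET = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
# Three "octet." repetitions followed by a final octet; {{3}} becomes {3} in the f-string.
IP_PATTERN = re.compile(rf'(?:{OCTET}\.){{3}}{OCTET}')

f_as_string = "foobar 110.78.168.85 bazqux 123.45.067.89"
print(IP_PATTERN.findall(f_as_string))  # ['110.78.168.85', '123.45.067.89']
```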

Alternatively, keep your regex pattern as it was and use finditer, taking the whole match (m.group(), i.e. group 0) from each match object rather than the individual groups.

>>> import re
>>> f_as_string = "foobar 110.78.168.85 bazqux 123.45.067.89"
>>> [m.group() for m in re.finditer(r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)', f_as_string)]
['110.78.168.85', '123.45.067.89']
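If you already have the list of tuples from your original findall call, '.'.join turns each tuple back into a dotted address, and collections.Counter then handles the counting you mentioned. A minimal sketch, using the sample string from above extended with one illustrative repeat instead of the real log:

```python
import re
from collections import Counter

f_as_string = "foobar 110.78.168.85 bazqux 123.45.067.89 quux 110.78.168.85"

# The original pattern with four capturing groups, so findall yields 4-tuples.
pattern = r'(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
x = re.findall(pattern, f_as_string)

# Rejoin each tuple of octets into a dotted address.
ips = ['.'.join(octets) for octets in x]

# Count how often each address occurs, most frequent first.
counts = Counter(ips)
print(counts.most_common())  # [('110.78.168.85', 2), ('123.45.067.89', 1)]
```

This works because each group captures one complete octet and the dots between them are literal, so joining with "." restores the matched text exactly.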
Kevin