I am trying to display both the datetime and the IP address from each line of my log file:

Apr 20 07:03:53 123.345.45.123
^             ^ ^            ^
|---datetime--| |-----IP-----|   

My code:

datetimeRegex = re.compile(r'^\w{3}\s\d\d\s\d\d:\d\d:\d\d')

IPRegex = re.compile(r'\d+.\d+.\d+.\d{1,3}')

f = open("logfile.log","r")

count = 0

for line in f.readlines():
    datetime = re.match(datetimeRegex, line)
    IPaddr = re.match(IPRegex, line)
    if datetime and IPaddr:
        count += 1
        print str(count) + ":" + str(datetime.group()) + "IP: " + str(IPaddr.group())

I tried to find out what is not matching, and I think it is IPaddr: when I removed IPaddr from my if statement, the output printed the dates, but once I added IPaddr back, nothing printed. So I think I am not matching my IP address correctly. However, I tried a sample IP with my regex in an online regex tester and it seemed to work. Is something missing in my regex? Or is something wrong with my logic? If there is a faster or more efficient way to parse the log file, I am open to suggestions.

Jerry
Liondancer
  • Are you sure all the date time strings match your date/time RE? Your example does, but do all of them look like that? Suppose you had `Apr 1, 20:07:30` instead of `Apr 01, 20:07:30`. In that case, no match. – lurker Jan 16 '14 at 18:31
  • @mbratch Yes, I am sure, because I tested only the datetime at first and all the datetimes were displayed. It was only when I included the IPRegex that I got None as output – Liondancer Jan 16 '14 at 18:33

2 Answers

3

Replace every . in your IP regex with \.

A single period is a special character in a regex that means "any character" (except a newline, by default). If you want a literal period, you need to escape it with the \ character.

import re

IPRegex = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
ip = "192.168.1.1"

matches = IPRegex.match(ip)
[OUT] <_sre.SRE_Match object at 0x0000000003349578>
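
To make the difference concrete, here is a small sketch comparing the unescaped pattern from the question with the escaped one. The string `not_an_ip` is a made-up sample for illustration, not something from the log file.

```python
import re

# Unescaped dots (pattern from the question): "." matches any character except a newline.
loose = re.compile(r'\d+.\d+.\d+.\d{1,3}')
# Escaped dots: only a literal "." may separate the four groups of digits.
strict = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

not_an_ip = "12a34b56c78"   # hypothetical sample string

loose_hit = loose.match(not_an_ip)      # matches, because "." accepts "a", "b" and "c"
strict_hit = strict.match(not_an_ip)    # None, because there are no literal periods
ip_hit = strict.match("192.168.1.1")    # matches a real dotted address

print(loose_hit is not None, strict_hit, ip_hit.group())
```

Note that the unescaped pattern still accepts real IP addresses, so escaping the dots makes the pattern stricter but does not by itself explain a missing match.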
Adam Smith
  • I have this `re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')` but there is no output =/ – Liondancer Jan 16 '14 at 18:20
  • 1
    Probably because you're using [`re.match`](http://docs.python.org/2/library/re.html#re.match) which only matches at the beginning of a line. Use `re.search` instead, and also see [`re.search() vs re.match()`](http://docs.python.org/2/library/re.html#search-vs-match) – Adam Smith Jan 16 '14 at 18:32
1

You should use re.search instead of re.match for the IP address, because re.match only matches at the start of the string, while re.search finds a match anywhere in the string.
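
A minimal sketch of that difference, using the sample line from the question (the variable names `at_start` and `anywhere` are only for illustration):

```python
import re

IPRegex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
line = "Apr 20 07:03:53 123.345.45.123"   # sample line from the question

at_start = IPRegex.match(line)    # None: the line starts with "Apr", not with digits
anywhere = IPRegex.search(line)   # scans the whole line and finds the IP

print(at_start)
print(anywhere.group())
```

Because the datetime sits at the start of the line, re.match works for it; the IP address comes later, so it needs re.search.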

It would also be better to tweak your regexes a little: escape the . characters (in a regex they are wildcards that match any character except a newline), drop the ^ anchor from datetimeRegex (it is unnecessary because you use that pattern with re.match), use \d\d? for the day so that dates such as Jan 1 12:34:56 also match, and write each part of the IP regex as \d{1,3}.

import re

datetimeRegex = re.compile(r'\w{3}\s\d\d?\s\d\d:\d\d:\d\d')

IPRegex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
       # You can also use re.compile(r'(?:\d{1,3}\.){3}\d{1,3}')

f = open("logfile.log","r")

count = 0

for line in f.readlines():
    resdatetime = re.match(datetimeRegex, line)  # And avoid shadowing standard-library
                                                 # names such as 'datetime'
    IPaddr = re.search(IPRegex, line)            # Here, use re.search
    if resdatetime and IPaddr:
        count += 1
        print(str(count) + ":" + str(resdatetime.group()) + " IP: " + str(IPaddr.group()))

f.close()
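
On the question of a more efficient way to parse the file, one possible approach is a single pattern with named groups, applied while iterating over the lines lazily. This is only a sketch: it assumes the IP address directly follows the timestamp at the start of each line, and the names `LINE_RE`, `parse_lines` and `sample` are hypothetical, not part of the original code. Like the patterns above, it does not check that each part of the address is in the range 0-255.

```python
import re

# One pattern for the whole prefix: month, day (1 or 2 digits), time, then the IP.
LINE_RE = re.compile(
    r'(?P<when>\w{3}\s+\d\d?\s\d\d:\d\d:\d\d)\s+'
    r'(?P<ip>(?:\d{1,3}\.){3}\d{1,3})'
)

def parse_lines(lines):
    """Yield (datetime_text, ip_text) for each line that starts with both fields."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group('when'), m.group('ip')

# Made-up sample lines standing in for the log file.
sample = [
    "Apr 20 07:03:53 123.345.45.123 some message",
    "not a log line",
    "Jan 1 12:34:56 10.0.0.7",
]

results = list(parse_lines(sample))
for count, (when, ip) in enumerate(results, 1):
    print("%d:%s IP: %s" % (count, when, ip))
```

With a real file, an open file object can be passed to `parse_lines` directly, since file objects yield one line at a time; that avoids loading the whole file into memory with readlines().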
Jerry
  • I wondered whether the IP and datetime might not be on the same line, but in my log file the IP address is literally one space away from the datetime, and I always thought a line was a fairly long run of characters – Liondancer Jan 16 '14 at 18:35
  • As I pointed out in a comment in my answer -- it's because he's using re.match not re.search. re.match will ONLY EVER match if the beginning of the string matches the regex. – Adam Smith Jan 16 '14 at 18:37
  • @Jerry Sorry, I don't know many built-in Python functions, so I guess my question wasn't as clear as it could be – Liondancer Jan 16 '14 at 18:40
  • You're right, and his sample input was definitely a problem. I admit I actually lucked into the right answer -- I scanned the question quickly and saw he hadn't escaped his `.` character, and answered immediately. – Adam Smith Jan 16 '14 at 18:40
  • 1
    @Liondancer Jerry's confusion came from your sample input being on two lines, while in your program you expect them to be on one line. Since you're iterating through `f.readlines()`, there would NEVER be an instance where both `datetime` and `IPAddr` would have matches if they were on separate lines. – Adam Smith Jan 16 '14 at 18:42
  • Also, please don't name that variable `datetime`, as it's the name of the python stdlib module `datetime` (do `import datetime; datetime.datetime.now()`) – Adam Smith Jan 16 '14 at 18:42