-1

I'm accessing a log file that has lines as:

May  1 07:39:30 example-server sshd[61362]: reverse mapping checking getaddrinfo for 37-115-223-100.broadband.kyivstar.net [37.115.223.100] failed - POSSIBLE BREAK-IN ATTEMPT!

May  1 07:42:02 example-server sshd[61698]: reverse mapping checking getaddrinfo for 234.10.13.218.broad.fs.gd.dynamic.163data.com.cn [218.13.10.234] failed - POSSIBLE BREAK-IN ATTEMPT!

I want to parse the file and extract the IP address inside the square brackets that appears after the phrase "reverse mapping checking..." and before the word "failed".

I'm new to regular expressions and can't figure out how to do this.

Also, each octet of the IP address can be below or above 100 (so it can have a different number of digits), which is confusing because I can't use a fixed-width pattern such as [0-9][0-9].

Please help me extract that IP address using any method.

eyllanesc

4 Answers

2

This regex should work:

r'reverse mapping checking getaddrinfo for \S+ \[([^\]]+)\]'

\S matches any non-whitespace character; with the + quantifier it consumes the whole hostname that precedes the IP. To capture the text inside the square brackets, I'm using this group:

([^\]]+). It captures as many non-] characters as possible, so it captures the whole IP address.
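
As a minimal sketch of how this pattern could be applied in Python (the extract_ip helper name and the single sample line are only for illustration), re.search returns a match object whose first group is the text between the brackets:

```python
import re

# Pattern from the answer above: group 1 is everything between [ and ]
PATTERN = re.compile(r'reverse mapping checking getaddrinfo for \S+ \[([^\]]+)\]')

def extract_ip(line):
    """Return the bracketed IP from a log line, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None

line = ("May  1 07:39:30 example-server sshd[61362]: reverse mapping checking "
        "getaddrinfo for 37-115-223-100.broadband.kyivstar.net [37.115.223.100] "
        "failed - POSSIBLE BREAK-IN ATTEMPT!")
print(extract_ip(line))  # 37.115.223.100
```

Because the literal phrase anchors the pattern, the earlier brackets in sshd[61362] are not picked up.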

Axnyff
0

I would do it the following way:

import re
text = '''May  1 07:39:30 example-server sshd[61362]: reverse mapping checking getaddrinfo for 37-115-223-100.broadband.kyivstar.net [37.115.223.100] failed - POSSIBLE BREAK-IN ATTEMPT!

May  1 07:42:02 example-server sshd[61698]: reverse mapping checking getaddrinfo for 234.10.13.218.broad.fs.gd.dynamic.163data.com.cn [218.13.10.234] failed - POSSIBLE BREAK-IN ATTEMPT!'''
ips = re.findall(r'(?<=\[)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=\])',text)
print(ips) # ['37.115.223.100', '218.13.10.234']

Note that I used a so-called r-string (raw string), so I could use a single \ as the escape character without having to escape it. My pattern consists of 3 main parts:

  • (?<=\[) is a zero-length assertion, meaning: check that there is a [ before the match; [ needs to be escaped because it has a special meaning
  • \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} is four numbers of 1 to 3 digits (\d) each, separated by dots (the . must be escaped too, as it also has a special meaning)
  • (?=\]) is a zero-length assertion, meaning: check that there is a ] after the match; ] needs to be escaped too.
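
One caveat: \d{1,3} also accepts values such as 999, which are not valid octets. If that matters, one possible follow-up (a sketch that is not part of the answer above; valid_ips is a hypothetical helper name) is to let the standard-library ipaddress module check each candidate:

```python
import ipaddress
import re

# Same lookbehind/lookahead pattern as in the answer above
CANDIDATE = re.compile(r'(?<=\[)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?=\])')

def valid_ips(text):
    """Return bracketed dotted quads from text that parse as IPv4 addresses."""
    result = []
    for candidate in CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. an octet above 255
        result.append(candidate)
    return result

print(valid_ips('a [37.115.223.100] b [999.1.1.1] c'))  # ['37.115.223.100']
```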
Daweo
0
import re

regex = r"\[[0-9.]+\] failed"

test_str = ("May  1 07:39:30 example-server sshd[61362]: reverse mapping checking getaddrinfo for 37-115-223-100.broadband.kyivstar.net [37.115.223.100] failed - POSSIBLE BREAK-IN ATTEMPT!\n\n"
    "May  1 07:42:02 example-server sshd[61698]: reverse mapping checking getaddrinfo for 234.10.13.218.broad.fs.gd.dynamic.163data.com.cn [218.13.10.234] failed - POSSIBLE BREAK-IN ATTEMPT!")

matches = re.finditer(regex, test_str)
# Substrings to strip from each match, leaving only the IP address
mapping = [(' failed', ''), ('[', ''), (']', '')]

for match in matches:
    my_string = match.group()
    for k, v in mapping:
        my_string = my_string.replace(k, v)
    print("IP : {match}".format(match=my_string))
Vega
STAR-SSS
0

I am personally against regexes in such simple cases. Python has the brilliant str.split() method, which can do the job faster and more simply. Why not just do this:

def get_ip(logstr):
  return logstr.split('reverse mapping checking', 1)[1].split('[', 1)[1].split(']', 1)[0]

# logfile is assumed to hold the path to your log file
with open(logfile) as f:
  for line in f:
    if 'reverse mapping checking' in line:
      print(get_ip(line))

It's simple: logstr.split('reverse mapping checking', 1) gives you two strings, the part before 'reverse mapping checking' at index 0 and the part after it at index 1. I set the split count to 1 to tell Python not to search for the string again. Then we take the second string with [1] and split it again on '[', taking what comes after the [ with [1], then split on ']', taking [0] this time because the IP is before the ]. That's all.
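
To make the chain easier to follow, here is the same idea written out one step at a time on one of the sample lines (the intermediate variable names are only for illustration):

```python
line = ("May  1 07:42:02 example-server sshd[61698]: reverse mapping checking "
        "getaddrinfo for 234.10.13.218.broad.fs.gd.dynamic.163data.com.cn "
        "[218.13.10.234] failed - POSSIBLE BREAK-IN ATTEMPT!")

# Step 1: keep only the text after the marker phrase
after_phrase = line.split('reverse mapping checking', 1)[1]
# Step 2: keep only the text after the first '[' that follows the phrase
after_bracket = after_phrase.split('[', 1)[1]
# Step 3: keep only the text before the first ']'
ip = after_bracket.split(']', 1)[0]

print(after_bracket)  # 218.13.10.234] failed - POSSIBLE BREAK-IN ATTEMPT!
print(ip)             # 218.13.10.234
```

Splitting on the phrase first matters here: the line also contains sshd[61698], so splitting on '[' directly would pick up the process ID instead of the IP.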

Sav