4

Given the following string (or similar strings, some of which may contain more than one IP address):

from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03

I wish to extract the first, and only the first, IP address in Python. A first attempt with something like ([0-9]{2,}\.){3}([0-9]{2,}){1}, when tried out on nregex.com, looks almost OK: it matches the IP address fine, but it also matches the other substring that roughly resembles an IP address (170.2015.06.03.14.12.03). When the same pattern is passed to re.compile/re.findall, though, the result is:

[(u'243.', u'70'), (u'06.', u'03')]

So clearly the regex is no good. How can I improve it so that it's neater and catches all IPv4 addresses, and how can I make it match only the first one?

Many thanks.

Pyderman
  • Will the IP addresses always be within square brackets? – Mr. Bultitude Jun 04 '15 at 21:15
  • @Mr.Bultitude Yes. For the purposes of this exercise I'm only checking "Received: from" headers, and from what I can tell, the IP address in these is always contained in []. – Pyderman Jun 04 '15 at 21:47

2 Answers

11

Use re.search with the following pattern:

>>> s = 'from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03'
>>> import re
>>> re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', s).group()
'208.83.243.70'
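
As a side note on why the original attempt returned tuples: when a pattern contains capturing groups, re.findall returns the groups rather than the whole match, and a repeated group only keeps its last repetition. The sketch below reproduces that behaviour and shows the same pattern with non-capturing groups; the variable names are illustrative only, and the output is shown as Python 3 would print it (no u'' prefixes).

```python
import re

s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')

# Two capturing groups: findall returns one (group1, group2) tuple per match,
# and the repeated first group keeps only its final repetition.
capturing = re.findall(r'([0-9]{2,}\.){3}([0-9]{2,}){1}', s)
print(capturing)      # [('243.', '70'), ('06.', '03')]

# (?:...) groups without capturing, so findall returns the full matches.
non_capturing = re.findall(r'(?:[0-9]{2,}\.){3}(?:[0-9]{2,}){1}', s)
print(non_capturing)  # ['208.83.243.70', '170.2015.06.03']
```

Even with non-capturing groups the original pattern still picks up part of the ESMTP id, which is why taking only the first match (or anchoring on the brackets) is needed.
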
Jon Clements
  • It may be prudent to capture the IP within the brackets explicitly, if that's the OP's desire: re.search(r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', s).group(1) – stevieb Jun 04 '15 at 21:42
1

The regex you want is r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'. This matches four 1- to 3-digit numbers separated by dots.

If the IP address always comes before other numbers in the string, you can avoid picking up the later ones by using a function that returns only the first match, such as re.search. In contrast, re.findall will catch both 208.83.243.70 and 015.06.03.14.
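
A minimal sketch of that difference, using the sample header from the question. The names pattern, first_ip and all_hits are just illustrative. re.search stops at the first match, while re.findall keeps scanning and also picks up a fragment of the ESMTP id:

```python
import re

s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')

pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

# search() returns a match object for the first hit, or None if there is none.
match = pattern.search(s)
first_ip = match.group() if match else None
print(first_ip)   # 208.83.243.70

# findall() returns every non-overlapping hit, including the false positive.
all_hits = pattern.findall(s)
print(all_hits)   # ['208.83.243.70', '015.06.03.14']
```

Checking for None before calling .group() avoids an AttributeError on headers that contain no address at all.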

Are you OK with using the brackets to single out the IP address? If so, you can change the regex to r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]'. It would be safer that way.
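
One possible way to package the bracketed pattern is sketched below. The helper name first_bracketed_ip is hypothetical, and the extra check with the standard-library ipaddress module is my own assumption rather than something the question asked for; it is there because \d{1,3} would also accept out-of-range octets such as 999.

```python
import ipaddress
import re

# Four dot-separated 1- to 3-digit numbers, enclosed in square brackets.
BRACKETED_IP = re.compile(r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]')


def first_bracketed_ip(header):
    """Return the first valid bracketed IPv4 address in header, or None."""
    for candidate in BRACKETED_IP.findall(header):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue  # not a real IPv4 address, try the next candidate
        return candidate
    return None


s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')
print(first_bracketed_ip(s))  # 208.83.243.70
```

If validation is not needed, BRACKETED_IP.search(header) followed by .group(1) on a non-None result would do the same job with less code.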

Paulo Mendes