
I have the text file below and need some help parsing IP addresses out of it.

The text file is of the form

abc 10.1.1.1/32   aabbcc
def 11.2.0.0/16   eeffgg
efg 0.0.0.0/0   ddeeff

In other words, a bunch of IP networks exist as part of a log file. The output should be provided as below:

10.1.1.1/32
11.2.0.0/16
0.0.0.0/0

I have the code below, but it does not output the required information:

file = open(filename, 'r')
for eachline in file.readlines():
    ip_regex = re.findall(r'(?:\d{1,3}\.){3}\d{1,3}', eachline)
    print ip_regex
lordlabakdas
  • Try to describe what each line of code does and you will find the error. See the re documentation too. – Casimir et Hippolyte Oct 14 '14 at 21:08
  • Well, you didn't include anything in your regex to match the `/32` or similar at the end, so of course it's only going to match the `10.1.1.1` or similar. – abarnert Oct 14 '14 at 21:10
  • `re.findall(r"\d+\.\d+\.\d+\.\d+/\d+", file.read())`. You should also use `with` to open your files. – Padraic Cunningham Oct 14 '14 at 21:15
  • As a side note, there is no reason to use `readlines()` there. `file` is already an iterable of lines. All you're doing is wastefully forcing Python to read and parse the entire file in memory before you can use it. – abarnert Oct 14 '14 at 21:16
  • As another side note, those aren't IP addresses, those are IP _networks_, which contain an address and a bitmask. In fact, your existing code is _already_ finding the IP addresses that are part of those networks… – abarnert Oct 14 '14 at 21:20
  • @abarnert you are right...a bit sloppy with the IP terminology in the question...should be IP networks – lordlabakdas Oct 14 '14 at 21:36

2 Answers


First, your regex doesn't even attempt to capture anything but four dotted numbers, so of course it's not going to match anything else, like a /32 on the end. If you just add, e.g., /\d{1,2} to the end, it'll fix that:

(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}
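
As a minimal sketch of that pattern in use, assuming Python 3 and the three sample lines from the question held in a list instead of being read from a file (the names CIDR_RE, sample_lines and find_networks are made up for illustration):

```python
import re

# The pattern from above: three "digits plus dot" groups, a final group of
# 1-3 digits, a slash, and a 1-2 digit prefix length.
CIDR_RE = re.compile(r'(?:\d{1,3}\.){3}\d{1,3}/\d{1,2}')

# Hypothetical stand-in for the lines of the log file in the question.
sample_lines = [
    'abc 10.1.1.1/32   aabbcc',
    'def 11.2.0.0/16   eeffgg',
    'efg 0.0.0.0/0   ddeeff',
]


def find_networks(lines):
    """Return every CIDR-looking string found in an iterable of lines."""
    found = []
    for line in lines:
        found.extend(CIDR_RE.findall(line))
    return found


for network in find_networks(sample_lines):
    print(network)
```

Note that the pattern only checks the shape of the text: something like 999.1.1.1/99 would match too, so it is not a validity check.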

Debuggex Demo


However, if you don't understand regular expressions well enough to understand that, you probably shouldn't be using a regex as a piece of "magic" that you'll never be able to debug or extend. It's a bit more verbose with str methods like split or find, but maybe easier to understand for a novice:

for line in file:
    for part in line.split():
        try:
            address, network = part.split('/')
            a, b, c, d = address.split('.')
        except ValueError:
            pass  # not in the right format
        else:
            # do something with part, or address and network, or whatever
            print(part)

As a side note, depending on what you're actually doing with these things, you might want to use the ipaddress module (or the backport on PyPI for 2.6-3.2) rather than string parsing:

>>> import ipaddress
>>> s = '10.1.1.1/32'
>>> a = ipaddress.ip_network(s)

You can combine that with either of the above:

for line in file:
    for part in line.split():
        try:
            a = ipaddress.ip_network(part)
        except ValueError:
            pass  # not the right format
        else:
            # do something with a and its nifty methods
            print(a)
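
To give an idea of what "doing something" with the parsed object might look like, here is a small sketch, assuming Python 3 and the sample lines from the question. The helper name parse_networks and the list sample_lines are invented for this example; the attributes used (network_address, prefixlen, num_addresses) are part of the standard ipaddress module.

```python
import ipaddress

# Hypothetical stand-in for the lines of the log file in the question.
sample_lines = [
    'abc 10.1.1.1/32   aabbcc',
    'def 11.2.0.0/16   eeffgg',
    'efg 0.0.0.0/0   ddeeff',
]


def parse_networks(lines):
    """Collect every whitespace-separated token that parses as an IP network."""
    networks = []
    for line in lines:
        for part in line.split():
            try:
                net = ipaddress.ip_network(part)
            except ValueError:
                continue  # not a network, move on to the next token
            networks.append(net)
    return networks


for net in parse_networks(sample_lines):
    # Each object knows its base address, prefix length and size.
    print(net, net.network_address, net.prefixlen, net.num_addresses)
```

Two things to keep in mind with this approach: by default ip_network raises ValueError when host bits are set (for example 10.1.1.1/16) unless you pass strict=False, and a bare address with no prefix is accepted and treated as a single-address network.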
abarnert
  • This website, Debuggex, that @abarnert linked to is the best website for regex I have ever seen. – TehTris Oct 14 '14 at 21:19
  • @TehTris: Yeah, I do love it. But notice that once they're out of beta, they're apparently going to start charging for non-JS regexes. They already started charging for the convert-to-plain-English feature (which they then disabled…). Very clever; I'm not sure I could go back to… whatever I used to use, which I can't even remember anymore. :) – abarnert Oct 14 '14 at 21:22
  • ipaddress does not work for addresses like "010.200.074.104". To parse these, it is better to use a one-liner like this: ".".join([str(int(x)) for x in ipv4_str.split(".")]) – J_Zar Oct 19 '21 at 09:33

In this particular case, a regex might be overkill; you could use split instead:

with open(filename) as f:
    ipList = [line.split()[1] for line in f]

This should produce a list of strings, one IP network per line, as long as every line has the network as its second whitespace-separated field.
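
If the log might contain blank or short lines, line.split()[1] would raise IndexError. A slightly more defensive sketch of the same idea, with the hypothetical helper name second_fields:

```python
def second_fields(lines):
    """Return the second whitespace-separated field of each line that has one."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or one-word lines instead of failing
            result.append(fields[1])
    return result


# Hypothetical sample: two lines from the question plus a blank and a short line.
sample = ['abc 10.1.1.1/32   aabbcc', '', 'lonely', 'efg 0.0.0.0/0   ddeeff']
print(second_fields(sample))
```

With a real file this would be called as second_fields(f) inside the with block, since a file object is already an iterable of lines.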

Cory Kramer