
How do I make a shell command that finds IPv6 addresses in its stdin?

One option is to use:

grep -Po '(?<![[:alnum:]]|[[:alnum:]]:)(?:(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}|(?:[a-f0-9]{1,4}:){1,6}:(?:[a-f0-9]{1,4}:){0,5}[a-f0-9]{1,4})(?![[:alnum:]]:?)'

This regex is based on ideas from "Regular expression that matches valid IPv6 addresses", but it is not quite accurate. I could use an even uglier regular expression, but is there a better way, perhaps some command that I don't know about?

JanKanis

1 Answer

Since I couldn't find an easy way using shell script commands, I created my own in Python:

#!/usr/bin/env python3

# Print all occurrences of well-formed IPv6 addresses in stdin to stdout.
# The IPv6 addresses should not overlap or be adjacent to each other.

import sys
import re

# Lookbehinds/lookaheads prevent matching part of a longer run,
# e.g. 2a00:cd8:d47b:bcdf:f180:132b:8c49:a382:bcdf:f180
regex = re.compile(r'''
            (?<![a-z0-9])(?<![a-z0-9]:)
            ([a-f0-9]{0,4}::?)([a-f0-9]{1,4}(::?[a-f0-9]{1,4}){0,6})?
            (?!:?[a-z0-9])''',
        re.I | re.X)

for line in sys.stdin:
    for match in regex.finditer(line):
        address = match.group(0)
        colons = address.count(':')
        dcolons = address.count('::')
        # Full form: eight groups, so exactly seven single colons.
        # Compressed form: exactly one '::' and at most seven colons in total.
        if (dcolons == 0 and colons == 7) or (dcolons == 1 and colons <= 7):
            print(address)
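A possible variant, assuming Python 3 and its standard `ipaddress` module are available: instead of encoding every rule in the regex, a deliberately loose pattern can collect candidate tokens and `ipaddress.IPv6Address` can decide which ones are well formed. The names `CANDIDATE`, `find_ipv6`, and `print_ipv6` are made up for this sketch, and the trailing-dot handling is an assumption about sentence-like input rather than a general rule.

```python
import ipaddress
import re

# Loose candidate pattern: a run of hex digits, colons and dots that is not
# glued to other alphanumerics. Validation is left to the ipaddress module.
CANDIDATE = re.compile(r'(?<![0-9A-Za-z:.])[0-9A-Fa-f:.]+(?![0-9A-Za-z])')


def find_ipv6(text):
    """Yield the substrings of text that ipaddress accepts as IPv6 addresses."""
    for m in CANDIDATE.finditer(text):
        # Assumption: a trailing dot is sentence punctuation, not part of the address.
        token = m.group(0).rstrip('.')
        try:
            ipaddress.IPv6Address(token)
        except ValueError:  # AddressValueError is a subclass of ValueError
            continue
        yield token


def print_ipv6(stream):
    """Print every IPv6 address found in a line-oriented stream."""
    for line in stream:
        for addr in find_ipv6(line):
            print(addr)
```

To use it as a filter, call `print_ipv6(sys.stdin)` after `import sys`. Because the whole token is handed to the validator, an over-long run of groups is rejected as a unit rather than being cut down to a valid-looking prefix.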
JanKanis
  • Do you really need look-aheads and look-behinds? – Anthony Mar 07 '16 at 22:49
  • It seems like the syntax is basically: "8 groups of 16-bit numbers, separated by colons (:) when using hex of number, or dot (.) if decimal of number is used, with consecutive groups with value of 0 potentially collapsing to one double colon (::)". I say basically like it's just so easy, but obviously it gets gnarly pretty fast. But the only real gotcha I'm noticing is the legal mixing of hex and decimal. – Anthony Mar 07 '16 at 23:08
  • The lookahead/lookbehind is there to prevent matching more than 8 groups, which is obviously not a well-formed IPv6 address. Your summary is correct, I think. The hex groups can be up to four characters. This answer does not find addresses that use the decimal notation. – JanKanis Mar 08 '16 at 08:02
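The effect of the lookarounds discussed in these comments can be checked with a small sketch. It reuses the core pattern from the answer once without and once with the lookbehind/lookahead guards; the names `BODY`, `loose`, `strict`, and `first_match` are hypothetical helpers introduced only for this illustration.

```python
import re

# Core pattern from the answer, without any lookarounds.
BODY = r'([a-f0-9]{0,4}::?)([a-f0-9]{1,4}(::?[a-f0-9]{1,4}){0,6})?'

loose = re.compile(BODY, re.I)
# Same pattern, guarded so a match cannot start or end inside a longer run.
strict = re.compile(
    r'(?<![a-z0-9])(?<![a-z0-9]:)' + BODY + r'(?!:?[a-z0-9])', re.I)

# Ten groups: not a well-formed IPv6 address.
too_long = '2a00:cd8:d47b:bcdf:f180:132b:8c49:a382:bcdf:f180'


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match(loose, too_long))   # an eight-group prefix of the input
print(first_match(strict, too_long))  # None
```

Without the guards, the pattern reports the first eight groups of the ten-group string as if they were an address. With them, every possible start or end position inside the run is rejected, so nothing is reported.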