2

I'm writing a search optimization for my application: if the query string looks like an IP address, don't bother searching MAC addresses, and if it looks like a MAC address, don't bother looking in the IP address db column.

I have seen expressions that match IPs and MAC addresses exactly, but it's hard to come by one that matches partial strings. It's quite a fun brain teaser, so I thought I'd get other people's opinions. Right now I have a solution without regex.

use List::Util qw(first);

sub query_is_a_possible_mac_address {
  my ($class, $possible_mac) = @_;
  return 1 unless $possible_mac;

  my @octets = split /:/, $possible_mac, -1;
  return 0 if scalar @octets > 6; # fail long MACS
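  return 0 if defined(first { /[^[:xdigit:]]/ } @octets); # fail any non-hex characters
  return 0 if grep { $_ eq '' } @octets[1 .. $#octets - 1]; # fail empty octets in the middle, e.g. '::88:'
  return defined(first { hex($_) > 0xFF } @octets) ? 0 : 1; # fail if an octet is too big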
}

# valid tests
'12:34:56:78:90:12'
'88:11:'
'88:88:F0:0A:2B:BF'
'88'
':81'
':'
'12:34'
'12:34:'
'a'
''

# invalid tests
'88:88:F0:0A:2B:BF:00'
'88z'
'8888F00A2BBF00'
':81a'
'881'
' 88:1B'
'Z'
'z'
'a12:34'
' '
'::88:'
Sicco
Joe Heyming
  • possible duplicate of [What is a regular expression for a MAC Address?](http://stackoverflow.com/questions/4260467/what-is-a-regular-expression-for-a-mac-address) – eggyal Jun 04 '12 at 23:54
  • Nope, I want a regular expression for something that contains a MAC address, not 'is' a MAC address. – Joe Heyming Jun 05 '12 at 00:01
  • One other solution for MAC addresses is to pad the input with 00s. For example, you take input 11 and pad it to be 11:00:00:00:00:00 and then put that through the MAC address RegEx. You can't do the same for IPs or even IPv6 addresses – Joe Heyming Jun 05 '12 at 00:02
  • If you have a regular expression for something that is a MAC address, and you apply it to a text string that contains a MAC address, it will match the MAC address and give it back to you. The only change you might have to make is removing any `^` or `$` characters that were being used to specify that the MAC address had to be at the beginning/end of the string. – octern Jun 05 '12 at 00:19
  • Here are my ip test cases: valid '10.46.220.215', '10', '.46', '.', '127.0', '127.0.', '1', '', '255/24', '/24', 'dead::beef', 'dead::', '::beef', 'dead', 'dead:beaf', '0000:0000:0000:0000:0000:0000:0000:0000', Invalid '255.255.255.255.255', '1234.1.1.1', '127001', '.127a', ' 127.0', 'Z', 'a127.0', ' ', '加油.127', 'dead::beefz', '0000:0000:0000:0000:0000:0000:0000:0000:1234', '::::::' – Joe Heyming Jun 05 '12 at 00:53
  • So remove the "^" and "$" from the regex in the linked post. – ikegami Jun 05 '12 at 01:10
  • Do you also want a regex to match partial IP addresses? – Sicco Jun 05 '12 at 10:40

2 Answers

1

Given the (new) tests, this works:

/^[0-9A-Fa-f]{0,2}(:[0-9A-Fa-f]{2}){0,5}:?$/

Here are the lines that match given the above tests (note that single hex characters like 'a' and 'A' are correctly matched):

12:34:56:78:90:12
88:11:
88:88:F0:0A:2B:BF
88
:81
:
12:34
12:34:
a
'' (<-- empty string)
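
One way to check the regex against the question's test lists is a small script like the sketch below. The variable names (`$partial_mac`, `@valid`, `@invalid`, `@failures`) are made up for illustration; the regex and the test strings are copied from this page.

```perl
use strict;
use warnings;

# The regex from this answer, compiled once.
my $partial_mac = qr/^[0-9A-Fa-f]{0,2}(:[0-9A-Fa-f]{2}){0,5}:?$/;

# Test lists copied from the question.
my @valid = (
    '12:34:56:78:90:12', '88:11:', '88:88:F0:0A:2B:BF', '88',
    ':81', ':', '12:34', '12:34:', 'a', '',
);
my @invalid = (
    '88:88:F0:0A:2B:BF:00', '88z', '8888F00A2BBF00', ':81a', '881',
    ' 88:1B', 'Z', 'z', 'a12:34', ' ', '::88:',
);

# Collect every case where the regex disagrees with the expected result.
my @failures = (
    (grep { $_ !~ $partial_mac } @valid),
    (grep { $_ =~ $partial_mac } @invalid),
);

print "unexpected result for '$_'\n" for @failures;
print "all cases behave as listed\n" unless @failures;
```

Adding further strings such as 'A' to `@valid` is an easy way to compare behaviour between Perl installations. Note that the regex requires two hex digits after each colon, so a half-typed octet such as '12:3' is not accepted.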
Sicco
  • 'A' =~ /^[0-9A-Fa-f]{0,2}(:[0-9A-Fa-f]{2}){0,5}:?$/ yields me undef – Joe Heyming Jun 05 '12 at 16:23
  • I don't think the first part of your regex is correct. It assumes all the input has 2 hex characters in the front. The only time it works for me is when I get greater than 3 characters: Example: 'AA:'. You are missing the corner cases – Joe Heyming Jun 05 '12 at 16:35
  • Hmm weird; The first part says that it should match 0, 1 or 2 hex characters, so 'A' should match. On my system it gives me the right output. I updated my answer showing my output. – Sicco Jun 06 '12 at 08:17
0

The best way I found to do this was to turn the possible match into the thing you are trying to match. For example, if you have the string 1.2, pad it so it looks like a full IP address: 1.2.1.1. Then apply the regex:

sub contains_ip {
    my ($possible_ip) = @_;

    my @splits = split /\./, $possible_ip;

    return 0 if @splits > 4;
    while (@splits < 4) {
        push @splits, '1';
    }

    $possible_ip = join '.', @splits;

    my ($match) = $possible_ip =~ m/^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$/;
    return defined $match ? 1 : 0;
}

warn contains_ip('1.2'); # 1
warn contains_ip('127.0.0.1'); # 1
warn contains_ip('1.2asd'); # 0
warn contains_ip('1.2.3.4.5'); # 0

The same thing applies to MAC addresses: if you had 11:22, pad it so it looks like a fully qualified MAC address, 11:22:00:00:00:00, then apply the MAC address regex to it.
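
The padding idea for MAC addresses could look something like the sketch below. `contains_mac` is a hypothetical name, not part of the original answer. It assumes colon-separated, two-digit octets, pads missing trailing octets with `00`, and relies on `split` dropping a trailing empty field so that `11:22:` is treated like `11:22`. Inputs with a leading colon or a single hex digit are rejected by this simple version.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring contains_ip, but for MAC addresses.
sub contains_mac {
    my ($possible_mac) = @_;

    # Without a limit argument, split discards trailing empty fields,
    # so '11:22:' yields ('11', '22').
    my @octets = split /:/, $possible_mac;

    return 0 if @octets > 6;

    # Pad the missing trailing octets with '00'.
    push @octets, '00' while @octets < 6;

    my $padded = join ':', @octets;

    # Strict full-MAC check: exactly six two-digit hex octets.
    return $padded =~ /\A[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\z/ ? 1 : 0;
}

warn contains_mac('11:22');                # padded to 11:22:00:00:00:00
warn contains_mac('11:22:33:44:55:66');    # already a full address
warn contains_mac('11:zz');                # non-hex octet
warn contains_mac('11:22:33:44:55:66:77'); # too many octets
```

Padding with `00` works here because any octet value is legal in a MAC address, so the padded string is valid whenever the typed prefix is.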

Joe Heyming