6

I'd like a function that takes a whitespace-trimmed string and returns
0 for an error (not a string), 1 for IPv4, 2 for IPv6, or 3 for a string that's not an IP.

IPv6 has these rules:

An IPv6 address is represented by 8 groups of 16-bit hexadecimal values separated by colons (:)
The hexadecimal digits are case-insensitive
Abbreviation rules:
1: Omit leading zeroes in a 16-bit value
2: Replace one or more groups of consecutive zeroes by a double colon

Wiki example showing three ways of writing the same IPv6 address:

fe80:0000:0000:0000:0202:b3ff:fe1e:8329
fe80:0:0:0:202:b3ff:fe1e:8329
fe80::202:b3ff:fe1e:8329 
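
To make the abbreviation rules concrete, here is a minimal sketch that expands an address to its full form, so the three spellings above can be compared. `expand_ipv6` is a hypothetical helper name, and the sketch assumes plain hexadecimal groups only (no embedded IPv4 notation or zone IDs):

```lua
-- Hypothetical helper (an assumption, not from the question): expand an
-- abbreviated IPv6 address to 8 groups of 4 lowercase hex digits.
-- Returns nil when the input cannot be expanded that way.
function expand_ipv6(addr)
  if type(addr) ~= "string" or addr:find("[^%x:]") or addr:find(":::", 1, true) then
    return nil
  end
  local head, tail = addr:match("^(.-)::(.*)$")
  local compressed = head ~= nil
  if compressed and tail:find("::", 1, true) then return nil end  -- "::" only once
  if not compressed then head, tail = addr, "" end

  -- Split one side of the address into zero-padded groups.
  local function groups(part)
    local out = {}
    if part == "" then return out end
    if part:sub(1, 1) == ":" or part:sub(-1) == ":" then return nil end
    for g in part:gmatch("[^:]+") do
      if #g > 4 then return nil end
      out[#out + 1] = ("0"):rep(4 - #g) .. g:lower()
    end
    return out
  end

  local left, right = groups(head), groups(tail)
  if not left or not right then return nil end
  local missing = 8 - #left - #right
  if compressed then
    if missing < 1 then return nil end   -- "::" must stand for at least one group
  elseif missing ~= 0 then
    return nil                           -- without "::" all 8 groups are required
  end
  for _ = 1, missing do left[#left + 1] = "0000" end
  for _, g in ipairs(right) do left[#left + 1] = g end
  return table.concat(left, ":")
end

print(expand_ipv6("fe80::202:b3ff:fe1e:8329"))
--> fe80:0000:0000:0000:0202:b3ff:fe1e:8329
```

All three spellings expand to the same string, which is one way to confirm they denote the same address.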

I'm reasonably sure that for IPv4 you just check for three dots and then check that
everything else in the string is a digit. The check for a plain string would go at
the end of the if statement: if it's not IPv4 or IPv6 and it is a string, then
the function returns 3.

logi-kal
Col_Blimp

5 Answers

9

Mike's solution is good, but it can be improved in several ways. In its current form it never reaches the IPv6 check, but that is easy to fix. The IPv6 check also fails on inputs like "1050!0!0+0-5@600$300c#326b" and "1050:0:0:0:5:600:300c:326babcdef" (recognizing both as valid addresses) and "1050:::600:5:1000::" (recognizing it as a string).

Here is the improved version (IPv4 addresses are assumed to use decimal numbers and IPv6 addresses hexadecimal numbers):

function GetIPType(ip)
  local R = {ERROR = 0, IPV4 = 1, IPV6 = 2, STRING = 3}
  if type(ip) ~= "string" then return R.ERROR end

  -- check for format 1.11.111.111 for ipv4
  local chunks = {ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")}
  if #chunks == 4 then
    for _,v in pairs(chunks) do
      if tonumber(v) > 255 then return R.STRING end
    end
    return R.IPV4
  end

  -- check for ipv6 format, should be 8 'chunks' of numbers/letters
  -- without leading/trailing chars
  -- or fewer than 8 chunks, but with only one `::` group
  -- (the contracted form must still consist of hex digits and colons only)
  local chunks = {ip:match("^"..(("([a-fA-F0-9]*):"):rep(8):gsub(":$","$")))}
  if #chunks == 8
  or #chunks < 8 and ip:match("^[a-fA-F0-9:]+$")
  and ip:match('::') and not ip:gsub("::","",1):match('::') then
    for _,v in pairs(chunks) do
      if #v > 0 and tonumber(v, 16) > 65535 then return R.STRING end
    end
    return R.IPV6
  end

  return R.STRING
end

The script to check:

local IPType = {[0] = "Error", "IPv4", "IPv6", "string"}
local ips = {
    "128.1.0.1", -- ipv4
    "223.255.254.254", -- ipv4
    "999.12345.0.0001", -- invalid ipv4
    "1050:0:0:0:5:600:300c:326b", -- ipv6
    "1050!0!0+0-5@600$300c#326b", -- string
    "1050:0:0:0:5:600:300c:326babcdef", -- string
    "1050:0000:0000:0000:0005:0600:300c:326b", -- ipv6
    "fe80:0000:0000:0000:0202:b3ff:fe1e:8329", -- ipv6
    "fe80:0:0:0:202:b3ff:fe1e:8329", -- ipv6
    "fe80::202:b3ff:fe1e:8329", -- ipv6
    "1050:::600:5:1000::", -- contracted ipv6
    "::", -- ipv6
    "::1", -- ipv6
    "::1::", -- string
    "129.garbage.9.1", -- string
    "xxx127.0.0.0", -- string
    "xxx1050:0000:0000:0000:0005:0600:300c:326b", -- string
    129.10 -- error
}
for k,v in pairs(ips) do
    print(v, IPType[GetIPType(v)])
end

And the output:

128.1.0.1   IPv4
223.255.254.254 IPv4
999.12345.0.0001    string
1050:0:0:0:5:600:300c:326b  IPv6
1050!0!0+0-5@600$300c#326b  string
1050:0:0:0:5:600:300c:326babcdef    string
1050:0000:0000:0000:0005:0600:300c:326b IPv6
fe80:0000:0000:0000:0202:b3ff:fe1e:8329 IPv6
fe80:0:0:0:202:b3ff:fe1e:8329   IPv6
fe80::202:b3ff:fe1e:8329    IPv6
1050:::600:5:1000:: IPv6
::  IPv6
::1 IPv6
::1::   string
129.garbage.9.1 string
xxx127.0.0.0    string
xxx1050:0000:0000:0000:0005:0600:300c:326b  string
129.1   Error

Updated on 9/6/2018 to add handling of garbage before/after addresses and checking for contracted ipv6, which allows for fewer than 8 groups with one empty group of two consecutive colons.

Paul Kulchenko
  • 2
    Thanks! I'll try to use this for a template on Swedish-language Wikipedia. This is exactly what I was looking for! – Emil Vikström May 20 '13 at 09:30
  • Perfect! It also matches localhost ::1 and many others. Now it is also being used on Portuguese Wikipedia. :) – Diego Queiroz Oct 05 '14 at 03:31
  • 1
    Neither this or the solution marked as accepted answer handle well addresses prefixed/suffixed with random garbage (for example `xxx127.0.0.0` is considered as valid). Imho `^$` symbols should be added to the pattern. – JeFf Apr 25 '17 at 13:05
  • Updated to address @JeFf's comment and to add support for one empty group in ipv6. – Paul Kulchenko Sep 07 '18 at 06:03
4

this seems like a pretty basic problem to solve. i think this function does what you need...

function GetIPType(ip)
    -- must pass in a string value
    if ip == nil or type(ip) ~= "string" then
        return 0
    end

    -- check for format 1.11.111.111 for ipv4
    local chunks = {ip:match("(%d+)%.(%d+)%.(%d+)%.(%d+)")}
    if (#chunks == 4) then
        for _,v in pairs(chunks) do
            if (tonumber(v) < 0 or tonumber(v) > 255) then
                return 0
            end
        end
        return 1
    else
        return 0
    end

    -- check for ipv6 format, should be 8 'chunks' of numbers/letters
    local _, chunks = ip:gsub("[%a%d]+%:?", "")
    if chunks == 8 then
        return 2
    end

    -- if we get here, assume we've been given a random string
    return 3
end

tested it with this code:

local IPType = {
    [0] = "Error",
    [1] = "IPv4",
    [2] = "IPv6",
    [3] = "string",
}


local ips = {
    "128.1.0.1",        -- ipv4
    "223.255.254.254",  -- ipv4
    "999.12345.0.0001",     -- invalid ipv4
    "1050:0:0:0:5:600:300c:326b",               -- ipv6
    "1050:0000:0000:0000:0005:0600:300c:326b",  -- ipv6
    "1050:::600:5:1000::",  -- contracted ipv6
    "129.garbage.9.1",  -- string
    129.10              -- error
}

for k,v in pairs(ips) do
    print(v, IPType[GetIPType(v)])
end

which generated this output:

128.1.0.1   IPv4
223.255.254.254 IPv4
1050:0:0:0:5:600:300c:326b  IPv6
1050:0000:0000:0000:0005:0600:300c:326b IPv6
129.garbage.9.1 string
129.1   Error

in the future, you'll get more helpful feedback if you actually post the code you've attempted to write to solve your particular problem, and let us know where you need help. SO isn't a personal code writing service, as stated in the faq. however, i'll give you the benefit of the doubt since you look new and this is something that could potentially benefit other people. the code above is basic, so feel free to update it if it doesn't catch fringe test cases i don't know about.

Mike Corcoran
  • I like this approach but it says that 999.12345.0.0001 is IPv4 and it doesn't identify contracted IPv6 addresses :-/ – Emil Vikström May 15 '13 at 10:04
  • give me a little bit and i'll try to get the answer updated to handle the contracted IPv6 formats, and add validity checks for each ip chunk in the IPv4 address format. – Mike Corcoran May 15 '13 at 14:01
  • hmm, checking all variations of IPv6 addresses is not very trivial. if someone else has time and wants to tackle it - feel free. also, the check for IPv4 addresses would need to be updated to handle binary based ips (format like 1111000.00001111.01010101.111111) – Mike Corcoran May 15 '13 at 16:27
  • Mike, at least for me decimal notation is enough for IPv4 and hexadecimal for IPv6. Otherwise we could start checking ALL bases and that would just be ridiculous. – Emil Vikström May 16 '13 at 05:05
  • @EmilVikström Ipv6 should definitely include the partial decimal notation used for Ipv4 addresses. – Cubic May 20 '13 at 06:20
  • -1 because it does not match several IPv6, like the localhost ::1 – Diego Queiroz Oct 05 '14 at 03:25
  • 1
    has anyone tested this with IPV4 addresses like "10.1.2.3.4.5"? Seems to pass when it should fail – Happydevdays Nov 21 '16 at 16:37
  • string.match returns the first match so it'll match 10.1.2.3.4.5, you need to add anchors to the start and end of the string: local chunks = {ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$")} – steveayre Jun 14 '18 at 16:57
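
The anchoring point raised in the last comment can be illustrated with a minimal sketch. The `matches` helper and the sample strings are assumptions made for illustration, not part of the answer:

```lua
-- Unanchored: string.match finds the first four dotted numbers anywhere
-- in the subject, so extra leading or trailing text goes unnoticed.
local loose  = "(%d+)%.(%d+)%.(%d+)%.(%d+)"
-- Anchored: "^" and "$" force the pattern to cover the whole string.
local strict = "^(%d+)%.(%d+)%.(%d+)%.(%d+)$"

-- Hypothetical helper: true when `s` matches `pattern`.
function matches(s, pattern)
  return s:match(pattern) ~= nil
end

for _, s in ipairs({"10.1.2.3", "10.1.2.3.4.5", "xxx127.0.0.0"}) do
  print(s, matches(s, loose), matches(s, strict))
end
```

Only "10.1.2.3" passes the anchored pattern; the other two pass the unanchored one because a valid-looking substring is enough for it.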
2

This seems like something that could easily be done with regular expressions. There are plenty of regex libraries for Lua.

If, however, you are not willing or are unable to use them, I would do something like this:

Start in ipv4 state
Take a character until string ends
    switch(state)
    ipv4:
        if it's a dot, check that we loaded at least one digit
        if it's a digit, check that it isn't the 4th in a row
        if it's anything else, set state to ipv6 and proceed in this state
    ipv6:
        if it's a ':', check that we didn't exceed the maximum number of segments
        if it's a digit or a letter in a-f, check that it isn't the 5th in a row
        in case anything breaks, return 3
    end

I'm not posting complete Lua code, because this looks like a homework/learning exercise and a full answer would harm you more than it would help you.

Bartek Banachewicz
  • thanks, but none of that is actually lua code so if you could think of a complete code it wouldn't work. – Col_Blimp Jun 11 '12 at 09:28
  • 1
    I thought i made this clear. It's pseudocode - you are the one who need to convert it to actual lua source. – Bartek Banachewicz Jun 11 '12 at 10:58
  • I thought the help request was obvious, I know how to do 2 of the 4 tasks but I am stuck on the filtering of the ipv6 because there is three possible ways that a string containing the info could be a valid address all with various segments and numbers of : and I wanted an opinion on if just checking for 3 . and all numbers would be sufficient for ipv4, if you could offer some info on either then that would be helpful if not anyone else out there? – Col_Blimp Jun 11 '12 at 14:11
  • 2
    @Mick: "I thought the help request was obvious" There is a difference between "help" and "plz giv me teh codez!" Bartek gave you *help*. You seem to want the latter. – Nicol Bolas Jun 11 '12 at 15:09
  • @Nicol Bolas, so you dont know either. – Col_Blimp Jun 11 '12 at 15:12
  • 1
    @Mick: Do I "know?" No, I don't have that function lying around somewhere. I could *write it* of course. But I'm not going to, since doing your work for you seems to be the only answer you'll accept. – Nicol Bolas Jun 11 '12 at 15:13
  • 1
    @Nicol Bolas, just wondering what your motivations for joining a help forum were? and I mean "were" as you literally are no help at all and just reiterating my own request and another's post is pointless, thanks for your time I think! – Col_Blimp Jun 11 '12 at 15:20
1

Interestingly, none of the above answers takes the test examples of the original question into account, because using them, all of the above checks would fail (because of #3):

fe80:0000:0000:0000:0202:b3ff:fe1e:8329
fe80:0:0:0:202:b3ff:fe1e:8329
fe80::202:b3ff:fe1e:8329 (!)

IPv6 representation rules say:

One or more consecutive groups of zero value may be replaced with a single empty group using two consecutive colons (::), but the substitution may only be applied once in the address, because multiple occurrences would create an ambiguous representation. https://en.wikipedia.org/wiki/IPv6_address#Representation

As Lua patterns do not support alternation, it is not possible to check IPv6 with a single pattern. You may see David M. Syzdek's answer on the complexity of an IPv6 regex: https://stackoverflow.com/a/17871737/1895269
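
The missing alternation shows up even for something as small as one decimal octet. A common workaround, sketched here under the assumption that trying several patterns in turn is acceptable, is a helper such as the hypothetical `matches_any` below:

```lua
-- Lua patterns have no "|" operator, so alternatives are tried one by one.
-- Hypothetical helper: true when `s` matches at least one pattern in `patterns`.
function matches_any(s, patterns)
  for _, p in ipairs(patterns) do
    if s:find(p) then return true end
  end
  return false
end

-- A decimal octet (0-255, no leading zeros), written as four alternatives
-- that a regex engine would join with "|".
local OCTET = {"^[1-9]?%d$", "^1%d%d$", "^2[0-4]%d$", "^25[0-5]$"}

print(matches_any("255", OCTET), matches_any("256", OCTET), matches_any("07", OCTET))
--> true    false   false
```

The same idea scales poorly to full IPv6 syntax, which is why the code below walks the address piece by piece instead.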

Still, a more standards-conforming approach is the following improvement of Paul Kulchenko's answer:

function GetIPType(ip)
  local R = {ERROR = 0, IPV4 = 1, IPV6 = 2, STRING = 3}
  if type(ip) ~= "string" then return R.ERROR end

  -- check for format 1.11.111.111 for ipv4
  local chunks = { ip:match("^(%d+)%.(%d+)%.(%d+)%.(%d+)$") }
  if (#chunks == 4) then
    for _,v in pairs(chunks) do
      if tonumber(v) > 255 then return R.STRING end
    end
    return R.IPV4
  end


  -- check for ipv6 format: at most 8 'chunks' of 1-4 hex digits,
  -- with "::" allowed once in place of one or more zero chunks
  local addr = ip:match("^([a-fA-F0-9:]+)$")
  if addr ~= nil and #addr > 1 then
    -- max consecutive colons allowed: 2
    if addr:find(":::", 1, true) then return R.STRING end
    local head, tail = addr:match("^(.-)::(.*)$")
    local dc = head ~= nil      -- double colon present
    -- double colon shall appear only once
    if dc and tail:find("::", 1, true) then return R.STRING end
    if not dc then head, tail = addr, "" end
    local nc = 0                -- chunk count
    for _, part in ipairs({head, tail}) do
      if part ~= "" then
        -- no stray single colon at either end of a part
        if part:sub(1, 1) == ":" or part:sub(-1) == ":" then return R.STRING end
        for chunk in part:gmatch("[^:]+") do
          if #chunk > 4 then return R.STRING end   -- more than 16 bits
          nc = nc + 1
        end
      end
    end
    -- "::" replaces at least one chunk; without it exactly 8 are required
    if (dc and nc > 7) or (not dc and nc ~= 8) then return R.STRING end
    return R.IPV6
  end


  return R.STRING
end

The script to check:

local IPType = {[0] = "Error", "IPv4", "IPv6", "string"}
local ips = {
  "128.1.0.1",    -- ipv4
  "223.255.254.254",  -- ipv4
  "999.12345.0.0001",   -- invalid ipv4
  "1050:0:0:0:5:600:300c:326b",         -- ipv6
  "1050!0!0+0-5@600$300c#326b",         -- string
  "1050:0:0:0:5:600:300c:326babcdef",     -- string
  "1050:0000:0000:0000:0005:0600:300c:326b",  -- ipv6
  "1050:::600:5:1000::",  -- contracted ipv6 (invalid)
  "fe80::202:b3ff:fe1e:8329",   -- shortened ipv6
  "fe80::202:b3ff::fe1e:8329",  -- shortened ipv6 (invalid)
  "fe80:0000:0000:0000:0202:b3ff:fe1e:8329:abcd",  -- too many groups
  "::1",   -- valid IPv6
  "::",  -- valid IPv6
  ":",   -- string
  "129.garbage.9.1",  -- string
  129.10        -- error
}
for k,v in pairs(ips) do
  print(v, IPType[GetIPType(v)])
end

And the output:

128.1.0.1       IPv4
223.255.254.254 IPv4
999.12345.0.0001        string
1050:0:0:0:5:600:300c:326b      IPv6
1050!0!0+0-5@600$300c#326b      string
1050:0:0:0:5:600:300c:326babcdef        string
1050:0000:0000:0000:0005:0600:300c:326b IPv6
1050:::600:5:1000::     string
fe80::202:b3ff:fe1e:8329        IPv6
fe80::202:b3ff::fe1e:8329       string
fe80:0000:0000:0000:0202:b3ff:fe1e:8329:abcd    string
::1     IPv6
::      IPv6
:       string
129.garbage.9.1 string
129.1   Error
chrisfish
0

As Lua's patterns are not sufficiently expressive, you have to use an iterative algorithm.

I suggest the one that I posted on Italian Wikipedia (which has been fully tested):

local R = {ERROR = 0, IPV4 = 1, IPV6 = 2, STRING = 3}

function is_ipv4(str)
    local s = str:gsub("/[0-9]$", ""):gsub("/[12][0-9]$", ""):gsub("/[3][0-2]$", "")
    
    if not s:find("^%d+%.%d+%.%d+%.%d+$") then
        return nil
    end
    
    for substr in s:gmatch("(%d+)") do
        if not substr:find("^[1-9]?[0-9]$")
                and not substr:find("^1[0-9][0-9]$")
                and not substr:find( "^2[0-4][0-9]$")
                and not substr:find("^25[0-5]$") then
            return nil
        end
    end
    
    return R.IPV4
end

function is_ipv6(str)
    local s = str
    
    if not (s:find("^%w+:%w+:%w+:%w+:%w+:%w+:%w+:%w+$")          -- there are exactly seven ":"
                or (s:find("^%w*:%w*:%w*:?%w*:?%w*:?%w*:?%w*$")  -- otherwise there are two to six ":"
                    and s:find("::")))                           -- and there must be the substring "::"
            or s:find("::.*::")                                  -- but there cannot be two substrings "::"
            or s:find(":::") then                                -- nor a substring ":::"
        return nil
    end
    
    for substr in s:gmatch("(%w+)") do
        if not substr:find("^[0-9A-Fa-f][0-9A-Fa-f]?[0-9A-Fa-f]?[0-9A-Fa-f]?$") then
            return nil
        end
    end
    
    return R.IPV6
end

function ip_type(str)
    if type(str) ~= "string" then
        return R.ERROR
    else
        return is_ipv4(str) or is_ipv6(str) or R.STRING
    end
end

Edit: I altered the ip_type() function's output as requested by the OP.

logi-kal