1

I am after a regular expression to parse IP addresses and extract their host, port, username, and password.

Here are the formats I am interested in:

hoju

6 Answers

7

Try something like this:

(http://(\w+:\w+@)?)?(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?

Explanation:

(http://(\w+:\w+@)?)? - optional group of http:// followed by optional user:pass@
(\d{1,3}\.){3} - three groups of one to three digits followed by a dot
\d{1,3} - one to three digits
(:\d{1,5})? - optional group of colon followed by one to five digits
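As a rough sketch of how this pattern behaves, here it is run through Python's `re` module. The choice of Python, the helper name `accepts`, and the sample inputs are assumptions for illustration only. Note that the user:pass@ part sits inside the optional http:// group, so credentials are only accepted when the scheme is present.

```python
import re

# The pattern from this answer, unchanged
PATTERN = re.compile(r"(http://(\w+:\w+@)?)?(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?")


def accepts(text):
    """Return True when the whole string matches the pattern (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


# Made-up sample inputs
samples = [
    "192.168.0.1",
    "http://user:pass@192.168.0.1:8080",
    "user:pass@192.168.0.1",      # credentials without http:// are rejected
    "999.999.999.999:99999",      # accepted, although it is not a valid address
]

for sample in samples:
    print(sample, accepts(sample))
```

Using `fullmatch` rather than `search` is a design choice here: it makes the pattern describe the whole input instead of any substring of it.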
Anton Hansson
  • It would be much better if you specified that each part of the IP is a number in the range 1-255 that does not start with 0 and is not greater than 255. – jargalan Nov 03 '10 at 08:47
  • Yes, it is not very robust. See the link provided by Merlyn for some examples of how to only allow certain combinations of digits if needed. – Anton Hansson Nov 03 '10 at 08:59
  • +1, I have a good chunk of this implementation in my answer (which I figured out on my own), but I "debugged" it by comparing to this answer ;) (I had the username:password@ syntax backwards, lol). This one is still better, though - I prefer the \w, and mine doesn't have the optional http://. The explanation is also cleaner. – Merlyn Morgan-Graham Nov 03 '10 at 09:12
4

Doing the match this way may not be a best practice. It might be better to plug into some sort of code with real smarts in it, that can do general-purpose URI parsing. If you have limited needs, though, and can comment/document thoroughly that your code will break if you demand more of it, then maybe it makes sense to go down this path.

The simplest way is to match four sets of 1 to 3 digits, with:

  • optionally, one-or-more not-:, plus :, plus one-or-more not-@, plus @
  • optionally, :, plus 1 to 5 digits

Something like:

([^:]+:[^@]+@)?(\d{1,3}\.){3}\d{1,3}(:\d{1,5})?

But this would accept silly stuff, like "999.999.999.999:99999"

If you only want to accept valid IP addresses, and don't care that it happens to be part of a URI, or don't care what other garbage exists in the string, here is an example:

http://www.regular-expressions.info/examples.html

It basically matches four sets of:

  • 2, plus 0-4, plus 0-9
  • or 2, plus 5, plus 0-5
  • or 1, plus 0-9, plus 0-9
  • or 1-9, plus 0-9
  • or 0-9
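The five alternatives listed above can be written out as one octet pattern and repeated four times. This is a minimal sketch in Python; the names `OCTET`, `IPV4`, and `is_ipv4` are made up for illustration and do not come from the linked page.

```python
import re

# One octet, following the five bullets above (0-255, no leading zeros)
OCTET = r"(?:2[0-4][0-9]|25[0-5]|1[0-9][0-9]|[1-9][0-9]|[0-9])"

# Four octets separated by literal dots
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(text):
    """Return True when the whole string is four valid octets (hypothetical helper)."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("255.255.255.255"))
print(is_ipv4("256.1.1.1"))
print(is_ipv4("01.2.3.4"))
```

Matching the whole string with `fullmatch` sidesteps the question of which alternative is tried first; when searching inside longer text, the order of the alternatives matters.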

That should get you started.

  • optionally, one-or-more not-:, plus :, plus one-or-more not-@, plus @ (max lengths may be interesting, here)
  • optionally, :, plus 0-65535 (this I'll leave up to you, based on the 0-255 rules above)

There are other range-based rules for matching IP addresses that you might want to avoid (stuff like 0.0.0.0, and reserved ranges), but it may be easier to do subsequent matching for these.

Basically, I'd suggest you use the very-simple example, or plug into a library.
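For the "plug into a library" route, one possible sketch in Python uses the standard `urllib.parse` and `ipaddress` modules. The function name `parse_endpoint` and the returned dictionary layout are assumptions for illustration, not part of any library.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_endpoint(text):
    """Split 'login:password@ip:port' (scheme optional) into its parts.

    Raises ValueError when the host is not a valid IP address or the
    port is out of range.
    """
    # urlsplit only treats what follows "//" as the network location
    if "//" not in text:
        text = "//" + text
    parts = urlsplit(text)
    ip = ipaddress.ip_address(parts.hostname)  # ValueError if not an IP
    return {
        "login": parts.username,
        "password": parts.password,
        "ip": str(ip),
        "port": parts.port,  # int or None; ValueError if out of range
    }


print(parse_endpoint("john:pass@12.34.56.78:80"))
print(parse_endpoint("12.34.56.78"))
```

This leaves the range rules to the library instead of encoding them in a regex, at the cost of accepting IPv6 addresses too, which may or may not be wanted.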

Merlyn Morgan-Graham
  • That would benefit tremendously from being in `(?x)` mode so you can get some elbowroom for cognitive chunking even if you don’t include actual comments. – tchrist Nov 03 '10 at 12:03
  • @tchrist: I have no idea what you're talking about, but it sounds interesting :) Is this in reference to perl? grep? – Merlyn Morgan-Graham Nov 03 '10 at 19:23
  • Many regex engines allow you to use whitespace and comments in your patterns if you include a `/x` or embed `(?x)`. – tchrist Nov 03 '10 at 21:16
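To illustrate the `(?x)` suggestion from the comments above, here is a sketch of the simple pattern from this answer written in Python's verbose mode (`re.VERBOSE`, equivalent to embedding `(?x)`). The group names and the name `ENDPOINT` are made up for illustration.

```python
import re

# Whitespace and "#" comments inside the pattern are ignored in verbose mode
ENDPOINT = re.compile(r"""
    (?: (?P<login>[^:]+) : (?P<password>[^@]+) @ )?   # optional login:password@
    (?P<ip> (?: \d{1,3} \. ){3} \d{1,3} )             # four groups of 1 to 3 digits
    (?: : (?P<port>\d{1,5}) )?                        # optional :port
""", re.VERBOSE)

match = ENDPOINT.fullmatch("john:pass@12.34.56.78:80")
print(match.groupdict())
```

The pattern accepts exactly what the one-line version accepts, including out-of-range numbers; only the layout changes.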
2

You can start with that (python):

import re

# Raw string, with the dots escaped so they match a literal "." only
pattern = r"((?P<login>\w+):(?P<password>\w+)@)?(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(:(?P<port>\d+))?"

re.match(pattern, "12.34.56.789").groupdict()
re.match(pattern, "12.34.56.789:80").groupdict()
re.match(pattern, "john:pass@12.34.56.789:80").groupdict()

And obviously, the IP you specified is not valid (as Matt says ...)

Antoine Pelisse
  • Worth noting that this parses invalid IPs and ports too, e.g. `999.999.999.999:000000222222222`. So this solution is great for use in a vacuum. – pronebird Oct 30 '20 at 10:06
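One way to address the concern in the comment above is to keep the regex loose and check the captured numbers afterwards. This is only a sketch: the function name `parse_checked` is hypothetical, and the limits (255 per octet, 65535 for the port) are the usual ones rather than anything stated in the answer.

```python
import re

# Same named groups as the answer, with the dots escaped
PATTERN = re.compile(
    r"((?P<login>\w+):(?P<password>\w+)@)?"
    r"(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
    r"(:(?P<port>\d+))?"
)


def parse_checked(text):
    """Return the captured fields, or None when the input is malformed
    or a number is out of range (hypothetical helper)."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Each octet must fit in one byte
    if any(int(octet) > 255 for octet in fields["ip"].split(".")):
        return None
    # The port is optional, but must be at most 65535 when present
    if fields["port"] is not None and int(fields["port"]) > 65535:
        return None
    return fields


print(parse_checked("john:pass@12.34.56.78:80"))
print(parse_checked("999.999.999.999:000000222222222"))
```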
2

Here is a small script whipped up in Perl that does the following things: a) strips out the username and password after checking that the former starts with a letter, b) validates the IP address, c) validates the port.

#!/usr/bin/perl

    while (<>) {
        chomp;
        # $1 = username, $2 = password, $3 = full IP, $4..$7 = octets, $8 = port
        if (/(?:(?:([a-zA-Z]\w+)\:(\w+))@)?((\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}))(?:\:(\d{1,5}))?/) {
            print "username=$1\n";
            print "password=$2\n";
            print "ip address=$3\n";
            print "port=$8\n";
            # The regex only limits the digit count, so check the ranges here
            print "Warning: IP Address invalid\n" if ($4>255||$5>255||$6>255||$7>255);
            print "Warning: Port Address invalid\n" if ($8>65535);
        }
    }

EDIT: Recommendation from tchrist below

Community
Philar
  • You don’t need to mention `\d` if you already have `\w`: it’s redundant. – tchrist Nov 03 '10 at 11:59
  • You can write `[\w]+` just as `\w+` now that you don't have two things to select from. Also, `[a‑z][A‑Z]` *might* be better written as any character with the Unicode "Letter" property, which is `\p{Letter}` or `\pL` for short. – tchrist Nov 03 '10 at 12:21
  • If it were me, I’d also escape the `@` just out of sheer reflex, even though Perl doesn't make you. – tchrist Nov 03 '10 at 12:23
  • Was just looking into your other post re handling unicode. Will get my head around working with unicode in perl and then update this as per your recommendation. Thanks again – Philar Nov 03 '10 at 12:33
0

To match only a valid IP address, use

(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}

instead of

([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}

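because many regex engines try the alternatives of an OR sequence from left to right and take the first one that matches. With the short alternative first, an unanchored match can stop after two digits of a three-digit octet.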

You can test your regex engine with: 10.48.0.200
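As a quick check of the ordering issue, here is a sketch that runs both patterns from this answer against 10.48.0.200 using Python's `re.search`. Python is just one engine that tries alternatives left to right, and the variable names are made up for illustration.

```python
import re

# Longest alternatives first (recommended order)
GOOD = re.compile(
    r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
    r"(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}"
)

# Short alternative first
BAD = re.compile(
    r"([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])"
    r"(\.([01]?[0-9][0-9]?|2[0-4][0-9]|25[0-5])){3}"
)

text = "10.48.0.200"
good_match = GOOD.search(text).group(0)
bad_match = BAD.search(text).group(0)

print(good_match)  # 10.48.0.200
print(bad_match)   # 10.48.0.20 (the last digit is left unmatched)
```

Anchoring the pattern (or using a full-string match) would force the engine to backtrack into the later alternatives, so the ordering mostly matters when searching inside longer text.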

Alban
0

Regexlib would be a helpful resource for your question. You can find many solutions there (you may need to combine some).

Chathuranga Chandrasekara