12

I'm creating a small IP:PORT scraper in PHP. The problem is that I'm pretty unfamiliar with RegEx.

So I've been piecing together what I can.

Here's what I've got: /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):([0-9]{1,5})\b/

I know this isn't the best, especially the ending that grabs the port, since it allows invalid ports such as 99999.

Also, it seems to return two matches this way: the IP:PORT and the PORT. I just need it to grab the full IP:PORT, not one or the other.

halfer
Rob
  • Isn't the port everything after the `:`? – Dagon Dec 04 '11 at 22:09
  • @Dagon: No, it would just be a few digits after it (1-5 digits). – Rob Dec 04 '11 at 22:11
  • What do the input strings look like? i.e. where are you actually trying to grab them *from*? – DaveRandom Dec 04 '11 at 22:17
  • @DaveRandom, various webpages, in which the HTML varies greatly. – Rob Dec 04 '11 at 22:18
  • ...and are you trying to grab them out of full URLs (like `http://IP:PORT/some/stuff`), or are they just `IP:PORT` on their own? Come to that, will there always be a `:PORT` section or might some of them just be `IP`? – DaveRandom Dec 04 '11 at 22:22
  • They're usually just alone, but might have some sort of HTML directly before or after. Not full URLs usually, though it's possible. There IS always a :PORT section. – Rob Dec 04 '11 at 22:24
  • You may try this: http://stackoverflow.com/a/25866412/3767784 – FaNaJ Sep 16 '14 at 10:33

6 Answers

5

I've posted a regular expression below that matches either an IP or an IP and port.

$ip = '111.222.333.444';
if ( preg_match('/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\:?([0-9]{1,5})?/', $ip, $match) ) {
   echo 'ip: ' . $match['1'] . (isset($match['2']) ? ' port: ' . $match['2'] : '');
}
TURTLE
5

Your regex is fine, so I will just concentrate on the port itself. This regex:

(?::                #Match the :
  (?![7-9]\d\d\d\d) #Reject 70000-99999
  (?!6[6-9]\d\d\d)  #Reject 66000-69999
  (?!65[6-9]\d\d)   #Reject 65600-65999
  (?!655[4-9]\d)    #Reject 65540-65599
  (?!6553[6-9])     #Reject 65536-65539
  (?!0+)            #Reject a leading zero (so also 0 itself)
  (?<Port>\d{1,5})
)?

It will optionally match any valid port number (1-65535) and store it in the named group Port.

Note: free-spacing mode (the x modifier) must be enabled:

if (preg_match(
    '/\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    (?::
      (?![7-9]\d\d\d\d) #Reject 70000-99999
      (?!6[6-9]\d\d\d)  #Reject 66000-69999
      (?!65[6-9]\d\d)   #Reject 65600-65999
      (?!655[4-9]\d)    #Reject 65540-65599
      (?!6553[6-9])     #Reject 65536-65539
      (?!0+)            #Reject a leading zero (so also 0 itself)
      (?P<Port>\d{1,5})
    )?
    \b/x', 
    $subject)) {
    # Successful match
}
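
As a usage sketch, the same pattern can be run with preg_match_all to collect every IP:PORT in a page. The sample text and variable names below are made up for illustration. Because the port group is optional, an out-of-range port such as 99999 still leaves the bare IP matched, so this sketch keeps only the matches where the Port group was actually filled:

```php
<?php
// Hypothetical sample input; real pages will vary.
$subject = '<td>10.0.0.1:8080</td> <td>192.168.1.5:65535</td> <td>172.16.0.9:99999</td>';

$pattern = '/\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
    (?::
      (?![7-9]\d\d\d\d)
      (?!6[6-9]\d\d\d)
      (?!65[6-9]\d\d)
      (?!655[4-9]\d)
      (?!6553[6-9])
      (?!0+)
      (?P<Port>\d{1,5})
    )?
    \b/x';

// PREG_SET_ORDER gives one array per match: index 0 is the full text,
// and the Port key is present only when the optional group matched.
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    if (!empty($m['Port'])) {
        $found[] = $m[0]; // full IP:PORT
    }
}

print_r($found);
```

If a bare IP should never match at all, dropping the trailing `?` after the port group makes the port mandatory instead.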
FailedDev
4

You could try this:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):\d{1,5}\b

There are a few examples for IP matching here. Just take any of them and put :\d{1,5}\b on the end (to match a port).
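
Since `\d{1,5}` on its own also accepts values such as 99999, one option is to keep the regex simple and range-check the port in PHP afterwards. The following is only a sketch: the function name `extractIpPorts` and the sample string are assumptions made for illustration. Only the port is captured, so index 0 of each match is the full IP:PORT.

```php
<?php
// Hypothetical helper; the name is not from the original answer.
function extractIpPorts(string $text): array
{
    $octet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';
    // Group 1 captures the port only; $m[0] holds the whole IP:PORT.
    $pattern = '/\b(?:' . $octet . '\.){3}' . $octet . ':(\d{1,5})\b/';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $port = (int) $m[1];
        // Keep only ports in the valid range 1-65535.
        if ($port >= 1 && $port <= 65535) {
            $result[] = $m[0];
        }
    }
    return $result;
}

$found = extractIpPorts('a 10.0.0.1:80 b 10.0.0.2:99999 c 10.0.0.3:65535');
print_r($found);
```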

Brigand
  • Yeah but wouldn't that then match ports like 99999? – Rob Dec 04 '11 at 22:28
  • Regular expressions aren't quite [Turing complete](http://en.wikipedia.org/wiki/Turing_completeness). You can use some PHP to figure out if the port is completely legal or not. Or you can put all the number ranges in there if you like. EDIT: See Fallen's solution for ports. I still recommend doing that part in PHP. – Brigand Dec 04 '11 at 22:45
1

This is the port portion of FailedDev's answer, shortened a bit and with word boundaries added. It will only catch the port:

\b(?![7-9]\d{4})(?!6[6-9]\d{3})(?!65[6-9]\d{2})(?!655[4-9]\d)(?!6553[6-9])(?!0+)(\d{1,5})\b
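
A quick way to see what this pattern accepts is to run it against a few candidate strings. This is a minimal sketch; the helper name `isValidPort` and the candidate list are made up for illustration. Note that `(?!0+)` rejects any port with a leading zero, such as 080, not only 0 itself.

```php
<?php
// Hypothetical helper wrapping the port-only pattern from this answer.
function isValidPort(string $candidate): bool
{
    $pattern = '/\b(?![7-9]\d{4})(?!6[6-9]\d{3})(?!65[6-9]\d{2})(?!655[4-9]\d)(?!6553[6-9])(?!0+)(\d{1,5})\b/';
    return preg_match($pattern, $candidate) === 1;
}

foreach (['80', '8080', '65535', '65536', '99999', '0', '080'] as $candidate) {
    echo $candidate . ': ' . (isValidPort($candidate) ? 'accepted' : 'rejected') . "\n";
}
```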
DJMcMayhem
kbrucej
0

Try this pattern. On the sample input below it outputs only the lines in IPv4 IP:PORT format: '^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,4})(:[0-9]{1,4})?$'

My Input:

10.128.16.38:22

1050:0000:0000:0000:0005:0600:300c:326b:22

11.11.11.11

asdfasdf

1012312101231210123121012312101231210123121012312101231210123121012312

10.128.45.23:9095

10.128.16.27:22 asdfasdfasdf

as@#$@#$

1050:0000:0000:0000:0005:0600:3002:3260:90

10.128.46.00:

Output: (Only Valid Ip:Port):

10.128.16.38:22

10.128.45.23:9095
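
The filtering above can be reproduced in PHP with preg_grep, as in the sketch below (variable names are assumptions, and the `/` delimiters are added for PHP). One caveat: the pattern checks only the shape of each line, not numeric ranges, so a line like 999.999.999.999:1 would also pass, while a five-digit port such as 65535 would not.

```php
<?php
$pattern = '/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}:[0-9]{1,4})(:[0-9]{1,4})?$/';

// The input lines listed in this answer.
$lines = [
    '10.128.16.38:22',
    '1050:0000:0000:0000:0005:0600:300c:326b:22',
    '11.11.11.11',
    'asdfasdf',
    '1012312101231210123121012312101231210123121012312101231210123121012312',
    '10.128.45.23:9095',
    '10.128.16.27:22 asdfasdfasdf',
    'as@#$@#$',
    '1050:0000:0000:0000:0005:0600:3002:3260:90',
    '10.128.46.00:',
];

// preg_grep keeps the original keys, so reindex with array_values.
$valid = array_values(preg_grep($pattern, $lines));
print_r($valid);
```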

ShivaPrasad Gadapa
0

I used this a long time ago.

[0-9]{3}\.[0-9]{3}\.[0-9]{3}\.[0-9]{3}:[0-9]{5}
greenLizard
  • 2
    Why would you use [0-9]? I can't imagine an IP being 999.999.999.999 or anywhere close. As it stands now, the one I provided in my question is more efficient. – Rob Dec 04 '11 at 22:32