0

How can I extract all IP:PORT pairs from a given website? I currently have the regex pattern below, but I don't think it catches all of them.

Or is there a better way to do it?

PATTERN = '((?:1?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:1?\d{1,2}|2[0-4]\d|25[0-5]):\d{2,5}';
Santos Oliveira
  • Why did you roll back the Delphi tags? It's not Delphi related at all! – TLama Dec 19 '12 at 14:40
  • What is Delphi-specific about regex? – TLama Dec 19 '12 at 14:43
  • Yeah, the notification about the update (adding *"Or is it a better way to extract IP:PORT in Delphi ?"*, which changed the original question quite a lot anyway) came after I had posted that comment. – TLama Dec 19 '12 at 14:46
  • Are you sure that IPV6 support is not needed? – mjn Dec 19 '12 at 15:12

3 Answers

4

Instead of RegEx, you can use the Internet Direct (Indy) unit IdURI. It can parse any URI into its protocol parts. It supports IPv4 and IPv6. The unit is quite self-contained.

MyURI := TIdURI.Create('http://127.0.0.1:8080');
try
  MyHost := MyURI.Host;
  MyPort := MyURI.Port; 
finally
  MyURI.Free;
end;

Properties expose detailed information about the URI:

property Bookmark : string read FBookmark write FBookMark;
property Document: string read FDocument write FDocument;
property Host: string read FHost write FHost;
property Password: string read FPassword write FPassword;
property Path: string read FPath write FPath;
property Params: string read FParams write FParams;
property Port: string read FPort write FPort;
property Protocol: string read FProtocol write FProtocol;
property URI: string read GetURI write SetURI;
property Username: string read FUserName write FUserName;
property IPVersion : TIdIPVersion read FIPVersion write FIPVersion;

See also this warning, although I don't think it affects simple host:port URI parsing:

https://stackoverflow.com/a/502011/80901

I recommend downloading a current release of Indy to get the latest fixes.

mjn
  • Can you post how to actually extract IP:PORT from given HTML code using IdURI? If it's more CPU-friendly, it may indeed be a better solution, but I've never done it before, which is why I'm asking. – Santos Oliveira Dec 19 '12 at 15:19
  • +1. Very nice. (It does not answer the question asked, but still a great suggestion.) – Ken White Dec 19 '12 at 15:54
  • Yes, it has nothing to do with the question at all. This is for parsing individual strings into IP, port, etc. I asked how to parse HTML code for IP:PORT matches. – Santos Oliveira Dec 19 '12 at 16:40
  • @SantosOliveira I did not see an HTML tag on your question, so I assumed that "website" meant a website address. – mjn Dec 19 '12 at 17:38
3

This will work if a port always follows the IP:

\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:\d{2,5}\b

Matches:

1.2.3.4:80
001.002.003.004:2345
255.255.255.255:13245

Does not match:

1.2.3
1.2.3:01
1.2.3.4.5:99
299.299.299.299:123
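
To apply a pattern like this to a whole page rather than a single string, one option is to scan the text and collect every match. The following is a minimal sketch in Python, used here only for illustration since the pattern itself works in most PCRE-style engines. The function name `extract_ip_ports` and the sample HTML are hypothetical. It also makes two changes that are assumptions and not part of the pattern above: a lookbehind so that a match cannot begin inside a longer dotted run such as 1.2.3.4.5:99, and a numeric check that the port lies between 1 and 65535 (with `\d{1,5}` so single-digit ports are accepted).

```python
import re

# One IPv4 octet, 0-255, leading zeros allowed (same alternation as the pattern above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Assumption: the lookbehind (?<![\d.]) is an addition to the original pattern.
# It stops a match from starting in the middle of a longer run of digits and dots.
IP_PORT = re.compile(
    r"(?<![\d.])(?:" + OCTET + r"\.){3}" + OCTET + r":(\d{1,5})\b"
)


def extract_ip_ports(text):
    """Return every IPv4:port string found in text whose port is in 1..65535."""
    found = []
    for match in IP_PORT.finditer(text):
        port = int(match.group(1))  # group 1 holds the port digits
        if 1 <= port <= 65535:
            found.append(match.group(0))  # group 0 is the whole IP:PORT match
    return found


# Hypothetical page fragment used only to exercise the function.
html = "<td>1.2.3.4:80</td><td>255.255.255.255:13245</td><td>299.299.299.299:123</td>"
print(extract_ip_ports(html))  # ['1.2.3.4:80', '255.255.255.255:13245']
```

Checking the port as a number keeps the regex readable; encoding the 1..65535 range in the pattern itself is possible but considerably harder to maintain.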
Ken White
  • Your second counterexample is a valid IP address and port. There don't have to be exactly four components in an IPv4 address. – Rob Kennedy Dec 19 '12 at 15:39
0

Regexes are not a magic wand that you should wave at every problem relating to strings. In this case, the language you're using probably has support for URL parsing.

In PHP, you parse URLs with the parse_url() function. http://php.net/manual/en/function.parse-url.php

In Perl, you use the URI::URL class: http://search.cpan.org/dist/URI/

If you really want to use a regex, the Perl module http://search.cpan.org/dist/Regexp-Common/ has already-built regexes for you to detect IP addresses.

Whatever language you're using, someone has already written, debugged and tested code that does what you want. Use that existing code rather than writing your own.
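
As an illustration of leaning on existing, tested code, here is a minimal sketch in Python (a language not named above, chosen only as an example) that uses the standard library's `urllib.parse.urlsplit` and `ipaddress.ip_address`. The helper name `host_and_port` is hypothetical. Note that this handles one URL at a time; it does not search a page for addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def host_and_port(url):
    """Return (ip, port) for a URL whose host is a literal IPv4/IPv6 address.

    Returns None if the URL is malformed or the host is not an IP literal.
    The port is None when the URL does not specify one.
    """
    try:
        parts = urlsplit(url)
        # ip_address raises ValueError for hostnames such as "example.com".
        ip = ipaddress.ip_address(parts.hostname or "")
        # .port raises ValueError if the port is not a number in 0..65535.
        port = parts.port
    except ValueError:
        return None
    return str(ip), port


print(host_and_port("http://127.0.0.1:8080/index.html"))  # ('127.0.0.1', 8080)
print(host_and_port("http://[::1]:8080/"))                # ('::1', 8080)
print(host_and_port("http://example.com:8080/"))          # None
```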

Andy Lester
  • Then perhaps this can help you: http://stackoverflow.com/questions/124170/a-delphi-freepascal-lib-or-function-that-emulates-the-phps-function-parse-url – Andy Lester Dec 19 '12 at 15:23