How can I extract all IP:PORT pairs from a given website? I currently use the regex pattern below, but I don't think it catches every occurrence.
Or is there a better way to do it?
PATTERN = '((?:1?\d{1,2}|2[0-4]\d|25[0-5])\.){3}(?:1?\d{1,2}|2[0-4]\d|25[0-5]):\d{2,5}';
Instead of RegEx, you can use the Internet Direct (Indy) unit IdURI. It can parse any URI into its protocol parts. It supports IPv4 and IPv6. The unit is quite self-contained.
MyURI := TIdURI.Create('http://127.0.0.1:8080');
try
  MyHost := MyURI.Host;
  MyPort := MyURI.Port;
finally
  MyURI.Free;
end;
Properties expose detailed information about the URI:
property Bookmark : string read FBookmark write FBookMark;
property Document: string read FDocument write FDocument;
property Host: string read FHost write FHost;
property Password: string read FPassword write FPassword;
property Path: string read FPath write FPath;
property Params: string read FParams write FParams;
property Port: string read FPort write FPort;
property Protocol: string read FProtocol write FProtocol;
property URI: string read GetURI write SetURI;
property Username: string read FUserName write FUserName;
property IPVersion : TIdIPVersion read FIPVersion write FIPVersion;
See also this warning, although I think it does not affect simple host:port URI parsing:
https://stackoverflow.com/a/502011/80901
I recommend downloading a current release of Indy to get the latest fixes.
This will work if a port always follows the IP:
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\:\d{2,5}\b
Matches:
1.2.3.4:80
001.002.003.004:2345
255.255.255.255:13245
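Does not match (when the whole string is tested; an unanchored search would still pick 2.3.4.5:99 out of 1.2.3.4.5:99):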
1.2.3
1.2.3:01
1.2.3.4.5:99
299.299.299.299:123
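As a rough illustration of applying this pattern to the text of a downloaded page, here is a minimal Python sketch. Python is my choice for the example, not something from the question. The names OCTET, IP_PORT and extract_ip_ports are made up for illustration. Two things differ from the pattern above and are assumptions on my part: lookarounds replace \b so that a candidate embedded in a longer dotted run (such as 2.3.4.5:99 inside 1.2.3.4.5:99) is skipped, and the port is range-checked in code because \d{2,5} alone would accept values like 99999.

```python
import re

# Octet 0-255, optionally zero-padded (same alternation as the pattern above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# The lookarounds are an addition, not part of the original pattern: they
# reject candidates that are embedded in a longer dotted or numeric run.
IP_PORT = re.compile(
    rf"(?<![\d.])((?:{OCTET}\.){{3}}{OCTET}):(\d{{1,5}})(?!\d)"
)


def extract_ip_ports(text):
    """Return (ip, port) tuples found in text, keeping only ports 1-65535."""
    results = []
    for match in IP_PORT.finditer(text):
        ip, port = match.group(1), int(match.group(2))
        if 1 <= port <= 65535:
            results.append((ip, port))
    return results
```

Calling extract_ip_ports(page_text) on the page source (fetched however you like) returns the pairs in document order. Whether single-digit ports should be accepted depends on your data; the sketch allows them, unlike \d{2,5}.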
Regexes are not a magic wand that you should wave at every problem relating to strings. In this case, the language you're using probably has support for URL parsing.
In PHP, you parse URLs with the parse_url() function: http://php.net/manual/en/function.parse-url.php
In Perl, you use the URI::URL class: http://search.cpan.org/dist/URI/
If you really want to use a regex, the Perl module http://search.cpan.org/dist/Regexp-Common/ has already-built regexes for you to detect IP addresses.
Whatever language you're using, someone has already written, debugged and tested code that does what you want. Use that existing code rather than writing your own.
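To show what leaning on existing code can look like, here is a small Python sketch that validates a single candidate string with the standard library (urllib.parse and ipaddress) instead of a hand-written regex. Python is only an example language here, and split_host_port is a hypothetical helper name. It assumes you already have candidate strings to check; it does not find them in free text.

```python
import ipaddress
from urllib.parse import urlsplit


def split_host_port(candidate):
    """Return (ip, port) if candidate holds a valid IP and port, else None."""
    # urlsplit only recognises a network location after "//", so a bare
    # "host:port" string gets that prefix when it is missing.
    if "//" not in candidate:
        candidate = "//" + candidate
    try:
        parts = urlsplit(candidate)
        port = parts.port  # raises ValueError for an invalid port
        ip = ipaddress.ip_address(parts.hostname or "")
    except ValueError:
        return None
    if port is None:
        return None
    return str(ip), port
```

For example, split_host_port("http://127.0.0.1:8080") yields ("127.0.0.1", 8080), while "299.299.299.299:123" and "1.2.3.4:99999" are rejected. The ipaddress module is stricter than the regexes in this thread: recent Python versions, for instance, reject zero-padded octets such as 001.002.003.004.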