
Possible Duplicate:
How to match URIs in text?
What is the best regular expression to check if a string is a valid URL?

I need to extract the URL of a zip file from a string that contains many URLs, using a regular expression in PHP.

A simple example should be helpful:

Target: extract the URL http://en.wikipedia.org/wiki/Kettle.zip

Base string:

/url?q=http://en.wikipedia.org/wiki/Kettle.zip&sa=U&ei=VpnIUP22Js blah /url?q=http://en.wikipedia.org/wiki/Kettle&sa=U&ei=VpnIUP22Js blah /url?q=http://en.wikipedia.org/wiki/Kettle.rar&sa=U&ei=VpnIUP22Js

Update: let's say the base string is

href="http://en.wikipedia.org/wiki/Kettle.zip">Some text /a>Some other text here a href="http://google.com/wiki/Kettle"> /a>

I need to extract http://en.wikipedia.org/wiki/Kettle.zip from it.

Any method is OK, regex or not.

Cœur
Orafu James
  • or rather of [How to match URIs in text?](http://stackoverflow.com/questions/82398/how-to-match-uris-in-text) – Bergi Dec 18 '12 at 23:08
  • Or one of the many others that talk about how to make that clickable; I bet there are a lot of regular expressions in all these duplicate Q&A materials. – hakre Dec 18 '12 at 23:09
  • Can't you split on `" blah "`, parse URL query string (and decodeURI!), get the `q` parameter and then filter for `.zip` extensions? – Bergi Dec 18 '12 at 23:10
  • If everything is of the form "URL then description" separated by spaces, then forget the regex and just use a split function. You have no need to recognise whether it's a URL or not, because you have the position… _then_ you could parse the URL host and path from the query string. – ian Dec 18 '12 at 23:11

1 Answer


Don't use a regex. Regexes are not a magic wand that solves every string problem.

Use parse_url() to break apart your URL, then use explode() to split the query string on &.

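// One of the links from the question, with a host prepended to make a full URL.
$url = "http://example.com/url?q=http://en.wikipedia.org/wiki/Kettle.zip&sa=U&ei=VpnIUP22Js";
// Keep only the part after the "?".
$query = parse_url($url, PHP_URL_QUERY);
print "query is: $query\n";
// Split the query string into its name=value pairs.
$args = explode( '&', $query );
print_r( $args );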

Running this gives:

query is: q=http://en.wikipedia.org/wiki/Kettle.zip&sa=U&ei=VpnIUP22Js
Array
(
    [0] => q=http://en.wikipedia.org/wiki/Kettle.zip
    [1] => sa=U
    [2] => ei=VpnIUP22Js 
)

From there just walk through the array and find the one you want.
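As a rough sketch of that last step, applied to the whole base string from the question: the function below splits the string on whitespace, reads the q parameter of each link, and returns the first one whose path ends in .zip. The name findZipUrl is hypothetical, and the code assumes PHP 7.1 or later and that the links are separated by whitespace, as in the example. It uses parse_str() rather than explode() so that the values are also URL-decoded.

```php
<?php
// Hypothetical helper, not part of the original answer: returns the first
// "q" target whose path ends in .zip, or null if there is none.
function findZipUrl(string $base): ?string
{
    // Assumption: links and filler words are separated by whitespace.
    foreach (preg_split('/\s+/', trim($base)) as $token) {
        // Tokens without a query string (such as "blah") are skipped.
        $query = parse_url($token, PHP_URL_QUERY);
        if (!is_string($query)) {
            continue;
        }

        // parse_str() splits on & and URL-decodes each value.
        parse_str($query, $args);
        $target = $args['q'] ?? null;
        if (!is_string($target)) {
            continue;
        }

        // Check the extension of the path only, ignoring any query string
        // the target URL might carry itself.
        $path = parse_url($target, PHP_URL_PATH);
        if (is_string($path)
            && strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'zip') {
            return $target;
        }
    }

    return null;
}

// Usage with the base string from the question.
$base = '/url?q=http://en.wikipedia.org/wiki/Kettle.zip&sa=U&ei=VpnIUP22Js blah '
      . '/url?q=http://en.wikipedia.org/wiki/Kettle&sa=U&ei=VpnIUP22Js blah '
      . '/url?q=http://en.wikipedia.org/wiki/Kettle.rar&sa=U&ei=VpnIUP22Js';
var_dump(findZipUrl($base));
```

To collect every .zip link instead of only the first, the same loop could append each match to an array rather than returning early.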

Andy Lester