I'm trying to extract some URLs from a webpage using PHP regular expressions.

I'm doing this:

preg_match_all('/"r"><a href="http:.*?"/i',$Rec_Data, $stuff );

This does return the URLs; however, I also get markup I don't want:

"r"><a href="http://www.cbsnews.com/stories/2002/12/03/politics/main531460.shtml"

I can't drop the "r" and the <a> tag from the pattern, because I need them so that I don't match URLs I don't want. How do I get only the part that is matched by ".*?"?

SrgHartman
  • possible duplicate of [Extract all urls Href php](http://stackoverflow.com/questions/5262682/extract-all-urls-href-php) – Gordon Oct 23 '11 at 21:02
  • possible duplicate of [Regular Expression for grabbing the href attribute of an a element](http://stackoverflow.com/questions/3820666/regular-expression-for-grabbing-the-href-attribute-of-an-a-element/3820783#3820783) – Gordon Oct 23 '11 at 21:03
  • possible duplicate of [preg_match all a href](http://stackoverflow.com/questions/1519696/preg-match-all-a-href) – Gordon Oct 23 '11 at 21:03

1 Answer

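Wrap the part you want in a capturing group, then read the second element of the result array: $stuff[0] holds the full matches and $stuff[1] holds the text captured by the group.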

preg_match_all('/"r"><a href="(http:.*?)"/i',$Rec_Data, $stuff );

See it working online: ideone
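
As a rough illustration of how the result array is laid out, here is a minimal sketch. The sample markup assigned to $Rec_Data is hypothetical (the real page is not shown in the question), and $urls is just an illustrative variable name.

```php
<?php
// Hypothetical sample input standing in for the real page contents.
$Rec_Data = '<td class="r"><a href="http://www.cbsnews.com/stories/2002/12/03/politics/main531460.shtml">CBS</a></td>'
          . '<td class="x"><a href="http://example.com/ignored">skip</a></td>';

preg_match_all('/"r"><a href="(http:.*?)"/i', $Rec_Data, $stuff);

// With the default PREG_PATTERN_ORDER flag:
//   $stuff[0] = full matches (including the "r"><a href=" prefix)
//   $stuff[1] = only the text captured by the first group
$urls = $stuff[1];

print_r($urls);
```

The prefix still has to match, so unwanted links are filtered out, but only the captured URL ends up in $stuff[1].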

Also, you might want to consider using an HTML parser to parse HTML, instead of a regular expression.
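
A parser-based version might look roughly like the sketch below, using PHP's bundled DOM extension. It assumes that the "r" in the question is the class attribute of the element directly containing the link, which the question does not actually confirm; the sample markup and the names $html and $parsedUrls are made up for illustration.

```php
<?php
// Hypothetical sample page; the real markup is not shown in the question.
$html = '<html><body><table><tr>'
      . '<td class="r"><a href="http://www.cbsnews.com/stories/2002/12/03/politics/main531460.shtml">CBS</a></td>'
      . '<td class="x"><a href="http://example.com/ignored">skip</a></td>'
      . '</tr></table></body></html>';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is often not well formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Assumption: wanted links are direct children of an element with class="r".
$xpath = new DOMXPath($doc);
$parsedUrls = [];
foreach ($xpath->query('//*[@class="r"]/a[starts-with(@href, "http:")]') as $a) {
    $parsedUrls[] = $a->getAttribute('href');
}

print_r($parsedUrls);
```

Unlike the regex, this does not depend on the exact spacing or attribute order in the source, though the XPath expression would need adjusting to the page's real structure.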

Mark Byers
  • @SrgHartman: What do you mean by "doesn't work"? I posted a link to an online demo where you can see that it works... – Mark Byers Oct 23 '11 at 20:54
  • @MarkByers: my bad, it actually works. However I get two arrays. That looks like a waste of resources. Is it possible to get the regex engine to return only the match I want? – SrgHartman Oct 23 '11 at 20:57