Creating a simple regex for scraping URLs

Question

Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
Grabbing the href attribute of an A element

I'm trying to scrape a URL from the following string...

<a class="uf" href="--"><b>Massage</b> Sacramento. Mae's Acupressure</a>

Here's the regex I've got now...

<a class="uf" href="(.*?)">.*?<\/a>

However, it's not getting any results when scraping the page.

What am I doing wrong here?

I'm doing this in PHP, by the way.

Uh oh. Look out, people are gonna rip you apart for trying to parse URLs with regex... Use an actual parser somewhere. — kevlar1818, Aug 04 '11 at 17:03
*(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) — Gordon, Aug 04 '11 at 17:05

score 1 · Answer 1 · answered Aug 04 '11 at 17:09

Actually, your regex works just fine. You should provide more detail about what you are trying to accomplish.

Try this:

$content = 'something <a class="uf" href="--"><b>Massage</b> Sacramento. Mae\'s Acupressure</a> some other text';
preg_match('#<a class="uf" href="(.*?)">.*?</a>#', $content, $matches);
print_r($matches);
exit;

It will print:

Array
(
  [0] => <a class="uf" href="--"><b>Massage</b> Sacramento. Mae's Acupressure</a>
  [1] => --
)

which is the expected result, as far as I can see.
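If the pattern matches the sample string but finds nothing on the real page, the markup there may differ from the sample (attribute order, extra attributes, a line break inside the link). Here is a rough sketch of how one might check that. The `$html` fragment and the looser pattern are my own assumptions, not something taken from the question:

```php
<?php
// Hypothetical page fragment: the second link has its attributes in a
// different order and a line break inside the link text.
$html = <<<HTML
<a class="uf" href="/one"><b>Massage</b> Sacramento</a>
<a href="/two" class="uf">Mae's
Acupressure</a>
HTML;

// Strict pattern from the snippet above: exact attribute order required,
// and . does not match newlines.
$strict = '#<a class="uf" href="(.*?)">.*?</a>#';

// Looser pattern (an assumption, not from the original post): any attributes
// around href; the s modifier lets . match newlines; i ignores case.
$loose = '#<a\b[^>]*\bhref="([^"]*)"[^>]*>.*?</a>#is';

$strictCount = preg_match_all($strict, $html, $strictMatches);
$looseCount  = preg_match_all($loose, $html, $looseMatches);

// preg_match_all() returns false on error; preg_last_error() helps tell
// "no match" apart from a failure such as hitting the backtrack limit.
if ($strictCount === false || $looseCount === false) {
    echo 'PCRE error code: ', preg_last_error(), PHP_EOL;
}

print_r($strictMatches[1]); // hrefs found by the strict pattern
print_r($looseMatches[1]);  // hrefs found by the looser pattern
```

On this made-up fragment the strict pattern only picks up the first link, while the looser one picks up both. Whether that is what is happening on the actual page can't be known without seeing its markup.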

score 0 · Answer 2 · edited May 23 '17 at 10:34

<a class="uf" href="[A-Za-z_.-]*?">[A-Za-z_.-]*?<\/a>

Also can't forget: RegEx match open tags except XHTML self-contained tags
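The linked question makes the case for using an HTML parser rather than a regex. A minimal sketch of that route with PHP's bundled DOM extension, assuming the sample markup from the question; the variable names are just for illustration:

```php
<?php
// Sample markup from the question, wrapped in some surrounding text.
$content = 'something <a class="uf" href="--"><b>Massage</b> Sacramento. Mae\'s Acupressure</a> some other text';

$doc = new DOMDocument();
// Real pages are rarely valid HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the href attribute of every <a> whose class attribute is exactly "uf".
$hrefs = [];
foreach ($xpath->query('//a[@class="uf"]/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

print_r($hrefs);
```

This does not care about attribute order or line breaks inside the link, which is where the regex approach tends to break down. Note that `@class="uf"` only matches when the class attribute is exactly `uf`; a link with several classes would need a different XPath test.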

answered Aug 04 '11 at 17:03

Naftali
