0

Possible Duplicate:
Grabbing the href attribute of an A element

Hello,

I have the following HTML that I want to parse:

<td align="left" nowrap="nowrap"><a href="XXXXXXX">

I want to save XXXXXXX in a variable. I know next to nothing about regular expressions. I know how to do it using strpos, substr, etc., but I believe that is slower than using a regex.

if (preg_match('!<td align="left" NOWRAP><a href=".\s+/.+">!', $result, $matches))
    echo $matches[1];
else
    echo "error!!!";

I know the previous code is an atrocity to a regex expert. But I really have no idea how to do it. I need some tips, not the full solution.

Cornwell

3 Answers

3

Here's my (not remotely original) tip: don't use regex to parse HTML. Use an HTML parser.

See How do you parse and process HTML/XML in PHP?.

Matt Ball
2

Part of knowing regex is knowing when not to use it.

When you want to parse HTML, nine times out of ten regex is not the right tool.

You can use a DOM parser.
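As a rough sketch of what that could look like with PHP's bundled DOM extension: the sample markup below is modeled on the snippet in the question (the closing tags and the link text are my assumptions), and the XPath expression is just one way to target the link, not the only one.

```php
<?php
// Sample markup modeled on the question; closing tags and link text are assumed.
$html = '<table><tr><td align="left" nowrap="nowrap"><a href="XXXXXXX">link</a></td></tr></table>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select the href attribute of any <a> that is a direct child of a nowrap <td>.
$nodes = $xpath->query('//td[@nowrap="nowrap"]/a/@href');

// Take the first match, or null if the link was not found.
$href = ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;

echo $href === null ? "error!!!" : $href;
```

Unlike a pattern match, this keeps working if the attributes appear in a different order or the whitespace between the tags changes.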

alex
1

If your structure is always the same as what you posted, you can use this regex:

<td\s+align="left"\s+nowrap="nowrap">\s*<a\s+href="(.*?)">

Then take group #1, which is the string matched between the parentheses. A group is the parenthesized part of the pattern that captures the data you want to extract. This link contains useful information about regex and its PHP implementation.
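For illustration, here is a minimal sketch of reading group 1 with preg_match. It assumes the input is exactly the snippet from the question, uses `!` as the pattern delimiter as the question does, and keeps the closing quote outside the parentheses so that it is not captured along with the URL.

```php
<?php
// Input assumed to be exactly the snippet from the question.
$result = '<td align="left" nowrap="nowrap"><a href="XXXXXXX">';

// (.*?) is group 1: a lazy match of everything between the href quotes.
$pattern = '!<td\s+align="left"\s+nowrap="nowrap">\s*<a\s+href="(.*?)">!';

if (preg_match($pattern, $result, $matches)) {
    // $matches[0] holds the whole match, $matches[1] holds group 1.
    echo $matches[1];
} else {
    echo "error!!!";
}
```

This only matches when the markup has exactly this shape; any change in attribute order or quoting will make it fail.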

Alberto