I'm having trouble writing a regular expression that matches only a div with the class name "classBig1" that has a single anchor link as its child. Here is my code, but it doesn't work:

preg_match_all ("/<div class=\"headline9\"><a[\s]+[^>]*?href[\s]?=[\s\"\']+".
                    "(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a></div>/", 
                    $var, &$matches);

//example HTML: <div class="classBig1"><a href="http://yahoo.com">Go Index99</a></div>
Devyn
  • Don't use regex to parse HTML. Use a HTML parser instead. See [Best methods to parse HTML](http://stackoverflow.com/questions/3577641/3577662#3577662) – Pekka May 24 '11 at 09:36
  • Thanks Pekka, I'll have a look at your link. – Devyn May 24 '11 at 09:40
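
For reference, here is a minimal sketch of what the parser-based approach suggested in the comment above might look like, using PHP's bundled DOM extension. The first div in `$html` is the example from the question; the other two divs and the variable names are assumptions added purely for illustration. The XPath expression assumes that "has one anchor link as its child" means the div contains exactly one child element and that this element is an `a`.

```php
<?php
// First div is the question's example; the other two are hypothetical
// inputs that should NOT be selected.
$html = '<div class="classBig1"><a href="http://yahoo.com">Go Index99</a></div>'
      . '<div class="classBig1"><a href="#a">A</a><a href="#b">B</a></div>'
      . '<div class="other"><a href="#c">C</a></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Select the <a> inside every div.classBig1 that has exactly one child element.
$links = [];
foreach ($xpath->query('//div[@class="classBig1"][count(*)=1]/a') as $a) {
    $links[] = [
        'href' => $a->getAttribute('href'),
        'text' => $a->textContent,
    ];
}

print_r($links);
```

Unlike a regex, this does not depend on attribute order, quoting style, or whitespace between the tags. Note that `@class="classBig1"` only matches when the class attribute is exactly that string, so a div with several classes would need a different test.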

2 Answers

I guess you used the wrong class name in your code; I'll assume it should be "classBig1". Please take a look at the pattern I have given below.

I believe:

  1. You only want the "DIV"s that have the class "classBig1".
  2. Each of these "DIV"s should contain only one "A" tag.

If so, feel free to use this piece of code :-).

It seemed to work for me when I tried it with some sample HTML.

Pattern:

"/<div class=\"classBig1\"><a (.*)<\/a><\/div>/"
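
As a hedged illustration of how this pattern might be used with `preg_match_all`, the sketch below runs it against two sample divs placed on one line. The second div, the stricter `$strict` pattern, and all variable names are assumptions for the sake of the example, not part of the pattern above. Because `(.*)` is greedy, the pattern above can swallow several adjacent divs into a single match; the stricter variant limits the link text to `[^<]*` so each match stays inside one anchor.

```php
<?php
// First div is the question's example; the second is a hypothetical extra.
$html = '<div class="classBig1"><a href="http://yahoo.com">Go Index99</a></div>'
      . '<div class="classBig1"><a href="http://example.com">Second</a></div>';

// The pattern from the answer: greedy (.*) runs to the LAST </a></div> on the line.
$greedy = "/<div class=\"classBig1\"><a (.*)<\/a><\/div>/";
$greedyCount = preg_match_all($greedy, $html, $greedyMatches);

// Stricter variant (an assumption): capture href and text, and forbid '<'
// in the link text so a match cannot span more than one anchor.
$strict = '/<div class="classBig1"><a\s[^>]*href=["\']([^"\']*)["\'][^>]*>([^<]*)<\/a><\/div>/';
preg_match_all($strict, $html, $sets, PREG_SET_ORDER);

$found = [];
foreach ($sets as $m) {
    // $m[1] is the href value, $m[2] is the link text
    $found[] = ['href' => $m[1], 'text' => $m[2]];
}

echo "greedy matches: $greedyCount\n";
print_r($found);
```

Both variants still assume the markup looks exactly like the example (same attribute order, no extra whitespace between the tags), which is the usual limitation of matching HTML with regular expressions.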

Hope it helps.

Rakesh Sankar

If the HTML is always as well formed as your example, then the following regex is enough to solve your problem:

  • <div class="classBig1"><a .*?</div>

The full PHP code would be:

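preg_match_all('%<div class="classBig1"><a .*?</div>%', $html,
      $result, PREG_PATTERN_ORDER);
// With PREG_PATTERN_ORDER, $result[0] holds every full match of the pattern
foreach ($result[0] as $match) {
    // $match is one matched <div class="classBig1">...</div> block
}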
Staffan Nöteberg