Please see the code:

$result = "<b>Associated Names</b>&nbsp;&nbsp;[<a href='http://www.examples.com/authors.html?act=change&id=6141&item=associated'><u>Edit</u></a>]</td> 
        </tr> 
        <tr> 
          <td class='text' align='left'>G&#12539;R<br />G-R<br />         </td>";

preg_match_all("/<b>Associated Names.{10,100}<td class='text' align='left'>((.*<br \/>)*).*<\/td>/sU", $result, $assoc);
var_dump($assoc);
-----------------------------------------------------------
RESULT 
array
  0 => 
    array
      0 => string '<b>Associated Names</b></td>
        </tr>
        <tr>
          <td class='text' align='left'>G&#12539;R<br />G-R<br />         </td>' (length=135)
  1 => 
    array
      0 => string '' (length=0)
  2 => 
    array
      0 => string '' (length=0)

I want it to return:

array(
    1 => 
     array
      0 => string 'G&#12539;R',
    2 => 
     array
      0 => string 'G-R'
)

It seems to be a matter of the nested parentheses ((.)). I want to fix it; please help me.

Brad Mace
meotimdihia

1 Answer

Please don't try to parse HTML with regular expressions; it invokes the wrath of Zalgo.

Try using the DOM and xpath to target the specific elements and attributes you are attempting to extract.

(I'd provide an xpath example, but it's still on my to-learn list... :) )
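
For illustration, a DOM and XPath version might look roughly like the sketch below. It is not tested against the real page: `$html` is a hypothetical, completed version of the fragment from the question (wrapped in a `<table>` so the rows are well formed), and the XPath expression assumes the names always sit in the `td class='text'` cell of the row right after the "Associated Names" label.

```php
<?php
// Hypothetical, well-formed version of the markup from the question.
$html = <<<HTML
<table>
  <tr>
    <td><b>Associated Names</b>&nbsp;&nbsp;[<a href='http://www.examples.com/authors.html?act=change&amp;id=6141&amp;item=associated'><u>Edit</u></a>]</td>
  </tr>
  <tr>
    <td class='text' align='left'>G&#12539;R<br />G-R<br />         </td>
  </tr>
</table>
HTML;

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// From the <b> label, go up to its row, over to the next row,
// and take the text nodes of the td with class "text".
$query = "//b[contains(., 'Associated Names')]"
       . "/ancestor::tr[1]/following-sibling::tr[1]"
       . "/td[@class='text']/text()";

$names = [];
foreach ($xpath->query($query) as $textNode) {
    $name = trim($textNode->nodeValue);
    if ($name !== '') {          // skip the whitespace-only node after the last <br />
        $names[] = $name;
    }
}

// Two entries are expected: 'G-R' and 'G&#12539;R' with the entity decoded to its character.
var_dump($names);
```

Because the `<br />` elements split the cell into separate text nodes, each name comes back as its own entry, so no nested capture groups are needed.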

Community
Charles
  • Unfortunately, sometimes it is the only way, because not every page is well formatted. Many times, Zend Dom Query has failed to create the DOM correctly and I've got wrong results. Not a fault of the framework, of course, but parsing can get messy. I use both approaches, ad hoc. – johnjohn Jul 17 '10 at 17:31
  • @john, have you tried to run the page through [tidy](http://us2.php.net/manual/en/book.tidy.php) first? – Charles Jul 17 '10 at 17:42
  • Yes, for a specific page that was quite troublesome (scraping project), I used an external tidy service before creating the DOM, without success. For the same page, I also tried using a ready-made class to tidy it. It always gave back half the page. I decided not to dig deeper. :) – johnjohn Jul 17 '10 at 17:50