I have a string like this:

<div tagname="chapter_title" class="CHAP_TTL" aidpstyle="CHAP_TTL">testt</div>
<div tagname="section" id="sec01">
<div tagname="title" class="H1" aidpstyle="H1" id="sec01">
     INTRODUCTION<!--title-->
</div>
<div tagname="para" class="CHAP_BM_FIRST" aidpstyle="CHAP_BM_FIRST">test3
<div tagname="emph" class="ITALIC" aidcstyle="ITALIC">buildings</div>   

I'm trying to find the div elements whose tagname attribute does not contain the words emph or section.

I used the pattern below, but it's not producing the right output:

 preg_match_all('/<div tagname="(?!emph)(?!section)(?!footnote)
      (?!note).*"/i',$new_updated_html,$divstarttag);

Any takers?

j0k
Rockstar
  • Obligatory [don't use regex to parse html](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?page=1&tab=votes#1732454) link. – Leigh Aug 08 '12 at 07:26
  • Agreed. PHP has a perfectly good [DOM parser class](http://php.net/manual/en/class.domdocument.php) – Mitya Aug 08 '12 at 07:28

2 Answers

I checked your code and it works fine, except that you first need to replace the newlines and extra spaces in the input using the following two lines:

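// Collapse any run of two or more whitespace characters into a single space.
$string = preg_replace('/\s\s+/', ' ', $subject);
// Remove any remaining Windows-style line breaks.
$data = preg_replace('/\r\n/', "", $string);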
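
As a rough illustration of the regex route, the sketch below runs a single-line variant of the question's pattern against the sample markup. It is not the asker's exact pattern: the names `$html` and `$pattern` are made up here, the four lookaheads are merged into one alternation, and `.*` is swapped for `[^"]*` so the match stops at the closing quote of the attribute. It also assumes that only exact values should be excluded, so a hypothetical `tagname="notes"` would still match. Note that a line break typed inside a pattern literal is matched literally unless the `x` modifier is set, which may be part of why the original pattern misbehaves.

```php
<?php
// Sample input copied from the question.
$html = <<<'HTML'
<div tagname="chapter_title" class="CHAP_TTL" aidpstyle="CHAP_TTL">testt</div>
<div tagname="section" id="sec01">
<div tagname="title" class="H1" aidpstyle="H1" id="sec01">
     INTRODUCTION<!--title-->
</div>
<div tagname="para" class="CHAP_BM_FIRST" aidpstyle="CHAP_BM_FIRST">test3
<div tagname="emph" class="ITALIC" aidcstyle="ITALIC">buildings</div>
HTML;

// Assumption: a tag is excluded only when its tagname is exactly one of
// these words. The lookahead checks the word plus the closing quote.
// [^"]* keeps the capture inside the attribute value.
$pattern = '/<div\s+tagname="(?!(?:emph|section|footnote|note)")([^"]*)"/i';

preg_match_all($pattern, $html, $matches);

// $matches[1] holds the captured tagname values of the divs that were kept.
print_r($matches[1]);
```

With the sample from the question, `$matches[1]` should contain `chapter_title`, `title` and `para`.
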
Pavan Manjunath
Indian

Please take a look at PHP Simple DOM Parser (or any other PHP HTML parsing framework). Using regular expressions to parse HTML is something you should avoid. The DOM parser should allow you to iterate over the div elements in your document and access the relevant information.
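
A minimal sketch of that idea follows. It uses PHP's built-in `DOMDocument` and `DOMXPath` classes (the DOM parser linked in the comments) rather than PHP Simple DOM Parser. The variable names `$html`, `$excluded` and `$kept` are illustrative, and the exclusion list is assumed from the pattern in the question.

```php
<?php
// Sample input copied from the question (note the unclosed divs).
$html = <<<'HTML'
<div tagname="chapter_title" class="CHAP_TTL" aidpstyle="CHAP_TTL">testt</div>
<div tagname="section" id="sec01">
<div tagname="title" class="H1" aidpstyle="H1" id="sec01">
     INTRODUCTION<!--title-->
</div>
<div tagname="para" class="CHAP_BM_FIRST" aidpstyle="CHAP_BM_FIRST">test3
<div tagname="emph" class="ITALIC" aidcstyle="ITALIC">buildings</div>
HTML;

// Assumed exclusion list, taken from the lookaheads in the question.
$excluded = ['emph', 'section', 'footnote', 'note'];

// The markup is not well formed, so collect libxml warnings quietly
// instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every div that carries a tagname attribute, in document order.
$xpath = new DOMXPath($doc);
$kept = [];
foreach ($xpath->query('//div[@tagname]') as $div) {
    $name = $div->getAttribute('tagname');
    // Keep the div only if its tagname is not on the exclusion list.
    if (!in_array(strtolower($name), $excluded, true)) {
        $kept[] = $name;
    }
}

print_r($kept);
```

Inside the loop, `$div` is a `DOMElement`, so its other attributes and text content are available as well, for example through `$div->getAttribute('class')` or `$div->textContent`.
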

npinti