
I want to validate HTML using regular expressions, and for this I need a regular expression that selects div elements.

I used this:

      (<div.*?>.*?<\/div>)

But the problem is that it also matches this kind of text:

      <div>some this <div> some another text</div>

which is not valid.
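The cause is how the engine scans. A lazy `.*?` only makes the match end as early as possible. The engine still starts at the leftmost `<div` it can find, so the match runs from the first `<div>` to the first `</div>`.

This is a small PHP sketch of that behavior. PHP is an assumption here, since the question names no language; the `~` delimiters are added only so that `preg_match_all` accepts the pattern.

```php
<?php
// The question's pattern, wrapped in ~ delimiters for preg_match_all
$pattern = '~(<div.*?>.*?<\/div>)~';
$subject = '<div>some this <div> some another text</div>';

preg_match_all($pattern, $subject, $matches);
// The match starts at the FIRST "<div", so it spans the whole line
print_r($matches[0]);
```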

I need an expression that gives me only the last part, which is:

      <div> some another text</div>

Please advise.

Sarwar Hasan

  • It's 2013. Stop trying to parse HTML with regexes. –  Apr 26 '13 at 18:17
  • See if [this discussion](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) doesn't persuade you to choose another approach. – DOK Apr 26 '13 at 18:19
  • Use this http://jsoup.org/ and make your life easier :) – Watt Apr 26 '13 at 18:20
  • +1 @DOK for link. Liked it. – Watt Apr 26 '13 at 18:21
  • **Don't use regular expressions to parse HTML**. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration down the road. As soon as the HTML changes from your expectations, your code will be broken. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested and debugged. – Andy Lester Apr 27 '13 at 04:56

1 Answer


Right: in most situations, parsing HTML with regex is not a good approach. Better tools are DOMDocument, XPath...
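For comparison, here is a minimal sketch of the DOMDocument + DOMXPath route for the question's input. It assumes that an "innermost div" means a `<div>` with no `<div>` descendant, which the XPath `//div[not(.//div)]` expresses. The helper name `innermostDivTexts` is hypothetical. libxml's HTML parser repairs the unclosed outer `<div>` on its own.

```php
<?php
// Hypothetical helper: returns the trimmed text of every <div> that contains no nested <div>.
function innermostDivTexts(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings (unclosed tags, HTML5 elements) instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    // A <div> with no <div> descendant is an innermost div
    foreach ($xpath->query('//div[not(.//div)]') as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

print_r(innermostDivTexts('<div>some this <div> some another text</div>'));
```

A parser builds a tree first, so nesting, attributes and broken markup are handled by libxml rather than by the pattern.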

Unfortunately, some markup languages are not lucky enough to have all these tools. That is the case with the Martian markup language, which must be parsed with regex alone (it is mandatory on Mars; it is written in their bible).

<meta charset="UTF-8"/><pre>
<?php // this takes the content between the innermost ͽΛΙPͼ tags
$subject = 'ͽΛΙPͼ  ŏoo͢o öo ͽΛΙPͼ  o̊őoo͟o o͇o͈o͉ o̍o̎o ͽ/ΛΙPͼ  o̐oo͜oo ͽ/ΛΙPͼ';
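// (?<=ͽΛΙPͼ)                : start just after an opening tag
// (?:[^ͽ]++|ͽ(?!/?ΛΙPͼ))*+  : consume text that contains no opening or closing tag
// (?=ͽ/ΛΙPͼ)                : stop just before a closing tag; the u modifier enables UTF-8 mode
$pattern = '~(?<=ͽΛΙPͼ)(?:[^ͽ]++|ͽ(?!/?ΛΙPͼ))*+(?=ͽ/ΛΙPͼ)~u';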
preg_match_all($pattern, $subject, $matches);
print_r($matches);
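Here is the same idea translated to the question's `<div>` tags. This is a sketch that assumes simple markup: no comments, no CDATA, and no `>` inside attribute values. The tags are included in the match because the question asks for `<div> some another text</div>` as a whole. Matching them also avoids a variable-length lookbehind once attributes such as `<div class="x">` are allowed. The function name `innermostDivs` is hypothetical.

```php
<?php
// Hypothetical helper: returns every <div>...</div> whose content has no nested <div> or </div>.
function innermostDivs(string $html): array
{
    // <div\b[^>]*>            : an opening div tag, attributes allowed
    // (?:[^<]++|<(?!/?div\b))*+ : text or other tags, but never <div or </div (possessive)
    // </div>                  : the closing tag; i makes tag names case-insensitive
    $pattern = '~<div\b[^>]*>(?:[^<]++|<(?!/?div\b))*+</div>~i';
    preg_match_all($pattern, $html, $matches);
    return $matches[0];
}

print_r(innermostDivs('<div>some this <div> some another text</div>'));
```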
Casimir et Hippolyte