I've already found this code for processing the content between tags:

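$content_processed = preg_replace_callback(
    '#<pre>(.+?)</pre>#s',
    // Escape whatever sits between <pre> and </pre>. A closure replaces create_function(), which was removed in PHP 8.
    function ($matches) {
        return '<pre>' . htmlentities($matches[1]) . '</pre>';
    },
    $content
);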

But how could I get it to extract just one section of the HTML? The part I'm interested in starts with:

click here</a></p><p><span class='title'>Soups<br />

and ends at:

 <div style='font-size:0.8em;'>

(The parts I've chosen are quite long because that way they are unique in the HTML.)

  • How can I get PHP to only echo the content between those tags; 'PHP : Echo Content between Two points in an HTML Document' – e__ Jan 26 '12 at 19:07
  • [Why using regex for this task is **not recommended**](http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not) –  Jan 26 '12 at 19:10
  • I don't understand the point of doing this. I'm assuming that you're scraping another website. If that website handles output escaping properly, then you're going to be double-escaping the data between the `pre` tags. If it doesn't handle output escaping properly, then it's vulnerable to XSS and your scraping may not work as expected, and could leave you open to XSS as well. – FtDRbwLXw6 Jan 26 '12 at 19:50

1 Answer

Do not parse HTML with regex; it is a bad idea. Use an HTML or XML parser instead, which turns the markup into a nested tree of objects. That is much safer.
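A minimal sketch of the parser approach, assuming PHP's bundled DOM extension is available. The sample markup is made up for illustration, and selecting the `span` with class `title` is only a guess at what identifies the section on the real page:

```php
<?php
// Hypothetical sample markup; the structure of the real page is an assumption.
$html = "<p><a href='#'>click here</a></p>"
      . "<p><span class='title'>Soups<br />Tomato<br />Leek</span></p>"
      . "<div style='font-size:0.8em;'>footer</div>";

// Collect parse warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the first <span class="title"> element in the tree.
$xpath = new DOMXPath($doc);
$node = $xpath->query("//span[@class='title']")->item(0);

// Serialize only that node; fall back to an empty string if it is missing.
$section = $node !== null ? $doc->saveHTML($node) : '';
echo $section;
```

This selects by element rather than by raw text, so it keeps working if whitespace or attribute quoting on the page changes.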

However, if the page is static (i.e. its markup never changes), you can simply explode on the start delimiter to cut the page in two, then explode the second half again on the end delimiter.

Example:

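$html = file_get_contents('path/to/page.phtml');
// Keep everything after the start marker (assumes the marker is present).
$text = explode("click here</a></p><p><span class='title'>Soups<br />", $html);
// Then keep everything before the end marker.
$text = explode("<div style='font-size:0.8em;'>", $text[1]);
$text = $text[0];
echo $text;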
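If either marker might be missing, the same idea can be sketched with `strpos` and `substr`, so that the failure is detected instead of raising an undefined-offset notice. `extractBetween()` is a hypothetical helper name, the sample string is invented, and the nullable return type assumes PHP 7.1 or later:

```php
<?php
/**
 * Return the text between $start and $end, or null if either marker is absent.
 * Hypothetical helper for illustration; not a built-in PHP function.
 */
function extractBetween(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null; // start marker not found
    }
    $from += strlen($start); // skip past the start marker itself

    $to = strpos($html, $end, $from); // search only after the start marker
    if ($to === false) {
        return null; // end marker not found
    }
    return substr($html, $from, $to - $from);
}

// Invented sample input containing both markers from the question.
$html = "<a>click here</a></p><p><span class='title'>Soups<br />"
      . "Tomato, Leek <div style='font-size:0.8em;'>footer</div>";

$section = extractBetween(
    $html,
    "click here</a></p><p><span class='title'>Soups<br />",
    "<div style='font-size:0.8em;'>"
);
echo $section ?? 'Markers not found';
```

Like the `explode` version, this still depends on the exact marker strings, so it is only suitable when the page markup does not change.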