1

I'm running Simple HTML DOM on PHP 7.1.

But I cannot even parse the HTML with the very first call.

My code:

<?php
include 'simple_html_dom.php';

$html = file_get_html('http://google.com');

echo $html;
?>

The page displays nothing (white background) with the above code.

But the code below runs:

<?php
include 'simple_html_dom.php';
//base url
$base = 'https://google.com';
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);
// Create a DOM object
$html_base = new simple_html_dom();
// Load HTML from a string
$html_base->load($str);
echo $html_base;
$html_base->clear(); 
unset($html_base);
?>

Then I tried to get the img elements by class, adding the snippets below to the working code above, but none of them work.

The image HTML I want to get:

<div class="product_thumb">
<a title="Me Before You" class="image-border" href="/me-before-you-a-novel-movie-tie-in-p69988.html">
<img class="   pict lazy-img" id="det_img_00069988" src="/images/thumbnails/product/115x/222614_me-before-you-a-novel-movie-tie-in.jpg">
</a></div>

My Simple HTML DOM attempts. None of them work (no HTML is output on my page):

//* Find all images 1st code
foreach($html_base->find('img[class=   pict lazy-img]') as $element) 
   echo '<img src="' . $element->src . '" />' . '<br>';
//* Find all images 2nd code
foreach($html_base->find('img[class=   pict lazy-img]',0) as $element) 
   echo '<img src="' . $element->src . '" />' . '<br>';
//* Find all images 3rd code
foreach($html_base->find('img[class$=pict lazy-img]',0) as $element) 
   echo '<img src="' . $element->src . '" />' . '<br>';
//* Find all images 4th code
foreach($html_base->find('img[class$=pict lazy-img]',0) as $element) 
   echo '<img src="' . $element->src . '" />' . '<br>';
Community
Clear Code
  • `file_get_html` seems to return an object, use `var_dump($html)` instead of `echo` – pirs Nov 10 '17 at 02:08
  • [PHP ini file_get_contents external url](https://stackoverflow.com/questions/3488425/php-ini-file-get-contents-external-url) - it's probably just the `allow_url_fopen` PHP configuration. BUT, could you enable error reporting to see the actual error? That would help with debugging this. – HPierce Nov 10 '17 at 02:10
  • var_dump($html) on PHP 7.1 gives the same result as echo $html on PHP 5.6. – Clear Code Nov 10 '17 at 02:28
  • It works fine when I follow https://stackoverflow.com/a/44131040/8916968. It then behaves the same as on PHP 5.6. – Clear Code Nov 10 '17 at 03:19

3 Answers

8

The file_get_html function in the simple_html_dom include file needs to be changed. The version below worked for me. See https://sourceforge.net/p/simplehtmldom/bugs/161/

Since PHP 7.1, file_get_contents interprets a negative offset as counting from the end of the stream instead of ignoring it. The default value of $offset therefore has to be changed from -1 to 0:

function file_get_html($url, $use_include_path = false, $context=null, $offset = 0, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
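To see the offset behaviour in isolation, here is a minimal sketch that uses only a local temporary file (the file name prefix and contents are made up for illustration, and nothing here is taken from simple_html_dom itself). On PHP 7.1 or later, the negative offset should read from the end of the file, while offset 0 reads the whole file on any version. A remote URL is not seekable, which is presumably why the library's old default of -1 makes file_get_contents fail there.

```php
<?php
// Write a small local file so the example needs no network access.
$path = tempnam(sys_get_temp_dir(), 'offset_demo_'); // hypothetical temp file
file_put_contents($path, 'hello-world');

// Offset 0 reads from the start of the file on every PHP version.
$whole = file_get_contents($path, false, null, 0);

// On PHP 7.1+, a negative offset counts back from the end of the file.
$tail = file_get_contents($path, false, null, -5);

var_dump($whole);
var_dump($tail);

// Clean up the temporary file.
unlink($path);
```

Comparing the two dumps on your own PHP version shows whether a negative default would change what file_get_html receives.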
rand0m
James King
3

I know this is a wee bit old, but you can always just download the newest version here: https://sourceforge.net/projects/simplehtmldom/

The newest update as of this post is 10-08-19

norcal johnny
2

I worked around this by changing the pattern in the parse_selector() method of "simple_html_dom.php" (line 386) to

$pattern = "/([\w\-:\*]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-]+)(?:([!*^$]?=)[\"']?(.*?)[\"']?)?\])?([\/, ]+)/is";

and, in the read_tag() method (line 722), to

if (!preg_match("/^[\w\-:]+$/", $tag)) {
...
}

The trick is adding a backslash before "-" in the pattern, so the hyphen is read as a literal character and not as a range operator.
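As a small standalone illustration of the escaped hyphen, the sketch below wraps the read_tag() pattern from this answer in a helper. The function name is_valid_tag_name is hypothetical and not part of simple_html_dom. The assumption is that the library's original pattern had an unescaped hyphen between \w and ":", which newer PCRE versions (bundled from PHP 7.3 on) reject as an invalid range.

```php
<?php
// Hypothetical helper: true when $tag contains only word characters, '-' or ':'.
function is_valid_tag_name(string $tag): bool
{
    // The hyphen is escaped, so it is always a literal '-' and never a range.
    return preg_match('/^[\w\-:]+$/', $tag) === 1;
}

var_dump(is_valid_tag_name('div'));        // bool(true)
var_dump(is_valid_tag_name('my-element')); // bool(true)
var_dump(is_valid_tag_name('svg:rect'));   // bool(true)
var_dump(is_valid_tag_name('a b'));        // bool(false), a space is not allowed
```

Moving the hyphen to the end of the character class would be another way to keep it literal, but escaping it matches what the answer did.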

user3410311
  • This worked for me (PHP 7.3), thanks a lot! I also removed the offset parameter from the file_get_contents call on line 75. Setting the offset to 0 in the file_get_html function on line 70 works too. – crtn-hrd Aug 28 '19 at 10:04