I've been getting an error message for the following piece of code (I'm trying to get the content inside the 'article' tags on a certain web page):

function getTextFromLink($url) {
    $html = new DOMDocument();
    $html->loadHTML($url);
    $text = $html->getElementsByTagName('article')->item(0)->textContent;
    return $text;
}

It says that I'm trying to get the property of a non-object on the line with

$text = $html->getElementsByTagName('article')->item(0)->textContent;

I'm fairly new to php and DOM; what am I missing here?

Anna Fenske
  • Run `var_dump($html->getElementsbyTagName('article')->item(0));` and you'll see that it isn't an object. That might give you a clue, and you can keep breaking it down from there. – Devon Bessemer Dec 15 '15 at 01:00
  • @JackSmith: That's slightly different -- the problem in your linked question is that the value is not an object but an array, while the problem *here* is that the value is *null*. Both are, of course, not objects, but I doubt the answers to that question would be particularly useful to the OP. – Ilmari Karonen Dec 15 '15 at 04:03

1 Answer

You have two problems in your code:

The obvious problem is that $html->getElementsByTagName('article')->item(0) is not an object. Specifically, it is null, since the HTML you're parsing doesn't actually contain any article elements. You could've figured this out yourself by following Devon's advice and viewing the value of $html->getElementsByTagName('article')->item(0) using var_dump().

Now, why doesn't your HTML contain any article elements? Well, the real problem turns out to be that the loadHTML() method will load HTML from a string and parse it. That is to say, when you call $html->loadHTML($url);, PHP will parse the contents of the string variable $url as HTML code, and give you a DOMDocument representing the result. Given that you named the variable $url, I'm pretty sure that's not what you want.
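To see this for yourself, here is a minimal sketch of what happens when a URL string is handed to loadHTML(). The URL is a hypothetical placeholder, and no network request is made, because the string itself is what gets parsed. The libxml_use_internal_errors() call is only there to keep parser warnings quiet.

```php
<?php
// Hypothetical placeholder URL; it is never fetched.
$url = 'http://example.com/some-article';

// Keep libxml parser complaints out of the output.
libxml_use_internal_errors(true);

$html = new DOMDocument();
// loadHTML() treats its argument as markup, so the "document"
// ends up containing nothing but the URL text.
$html->loadHTML($url);

$node = $html->getElementsByTagName('article')->item(0);
var_dump($node); // no <article> element exists, so item(0) gives null
```

Reading ->textContent from that null value is what triggers the "property of a non-object" message.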

What you probably want instead is loadHTMLFile(), which loads HTML from a named file rather than from a PHP string. It also accepts a URL, provided PHP's URL stream wrappers are enabled (the allow_url_fopen setting).
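Putting both points together, a corrected version of the function might look roughly like the sketch below. Returning null when the page cannot be loaded or has no article element is an assumption on my part, not something from the question, so adapt the error handling to your needs. Parser errors are collected internally because real pages often contain markup that libxml complains about.

```php
<?php
// Sketch of a corrected getTextFromLink(). Returning null on failure
// is an assumed design choice, not part of the original code.
function getTextFromLink($url) {
    $html = new DOMDocument();

    // Collect libxml parse errors internally instead of emitting
    // warnings, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $html->loadHTMLFile($url); // reads from a file path or URL
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null; // the page could not be loaded
    }

    // item(0) is null when the page has no <article> element,
    // so check before reading textContent.
    $article = $html->getElementsByTagName('article')->item(0);
    return $article === null ? null : $article->textContent;
}
```

The caller then has to check for null before using the result, which makes the "no article found" case explicit instead of producing a notice.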

Ilmari Karonen