Is whitespace in a class name causing a problem in Simple HTML DOM?

Question

I would like to scrape comments from Reddit using Simple HTML DOM. This is the address: https://old.reddit.com/r/UnresolvedMysteries/comments/.

I would like to scrape only the text inside the paragraph tags and store it in an array, but unfortunately I cannot get it to work.

<div class="usertext-body may-blank-within md-container " >
  <div class="md">
    <p>What book? I’d like to check it out</p>
  </div>
</div>

I suspect the problem is this section: "usertext-body may-blank-within md-container " >. This is my code. What is the correct code?

foreach($html->find('.usertext-body may-blank-within md-container ') AS $results) {
  foreach($results->find('p') AS $comment){
    $comments[] = $comment;
}}

score 0 · Answer 1 · answered Jan 31 '21 at 10:22

The whitespace in the source is not the issue; the whitespace in your search expression is. Remove all whitespace from the expression and prefix each class with a dot:

$html->find('.usertext-body.may-blank-within.md-container')

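This expression means "find all elements that have all of these classes". The search expressions are based on CSS selectors, where several classes on the same element are matched by chaining them together with no spaces (see "CSS selector for two classes"). A space in a selector is the descendant combinator, so your original expression is read as "an element named md-container inside an element named may-blank-within inside an element with the class usertext-body", which matches nothing.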
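If you would rather not rewrite class attributes by hand, the conversion described above can be sketched in plain PHP. This is only an illustration: classAttributeToSelector is a hypothetical helper name, not part of Simple HTML DOM, and it assumes the class names contain no characters that need escaping in a selector.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): converts the value of an
// HTML class attribute into a chained CSS class selector.
function classAttributeToSelector(string $classAttribute): string
{
    // Split on any run of whitespace and drop empty pieces, so leading,
    // trailing, or repeated spaces in the attribute do not matter.
    $classes = preg_split('/\s+/', $classAttribute, -1, PREG_SPLIT_NO_EMPTY);

    // No classes means there is nothing to select on.
    if (empty($classes)) {
        return '';
    }

    // Prefix every class with a dot and join them without any whitespace.
    return '.' . implode('.', $classes);
}

// The class attribute copied from the question, trailing space included.
$selector = classAttributeToSelector('usertext-body may-blank-within md-container ');
echo $selector, PHP_EOL; // .usertext-body.may-blank-within.md-container
```

The resulting string can be passed to $html->find() in place of the hand-written selector. Since you want the text rather than the element objects, storing $comment->plaintext in the inner loop instead of $comment is probably closer to what you are after.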
