
The query below only searches a page of a website for the first paragraph after the `<h2>` tag that contains "History":

    $paragraph = $domxpath->query('
        //h2[*[contains(text(), "History")]]
        /following-sibling::p[position() = 1]'
    );
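To see what the query above actually selects, here is a minimal sketch that runs it against a small page. The HTML string is made up for illustration; it is not the real site.

```php
<?php
// Hand-written test page (an assumption, not the real site's markup).
$html = '<html><body>'
    . '<h2><span class="mw-headline">History</span></h2>'
    . '<p>First paragraph after History.</p>'
    . '<p>Second paragraph.</p>'
    . '<h2><span class="mw-headline">Geography</span></h2>'
    . '<p>Geography text.</p>'
    . '</body></html>';

$document = new DOMDocument();
$document->loadHTML($html);
$domxpath = new DOMXPath($document);

// An h2 with a child element whose first text node contains "History",
// then the first <p> among that h2's following siblings.
$paragraph = $domxpath->query(
    '//h2[*[contains(text(), "History")]]/following-sibling::p[position() = 1]'
);

foreach ($paragraph as $node) {
    echo $node->nodeValue, "\n"; // First paragraph after History.
}
```

Note that `following-sibling::p[1]` is not limited to the History section: if there is no `<p>` before the next heading, it returns the first `<p>` found anywhere later among that heading's siblings.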

But I'd also like to check whether the page has any `<h2>` tag that contains "History" at all:

    foreach ($paragraph as $node) {
        $content = $node->nodeValue;
    }

    if (!isset($content)) {
        echo $content;
    } else {
        echo "static content";
    }

Written this way, it doesn't work.
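As a comment further down points out, `!isset($content)` is inverted: the `if` branch runs exactly when nothing was found (so it echoes an unset variable), and "static content" is printed only when a paragraph *was* found. A minimal sketch with the condition flipped, wrapped in a hypothetical helper `chooseContent()` so that `$content` starts fresh on every call:

```php
<?php
// Hypothetical helper: returns the first matched paragraph's text,
// or the static fallback when the node list is empty.
function chooseContent(iterable $paragraph): string
{
    $content = null; // reset, so nothing leaks in from an earlier call
    foreach ($paragraph as $node) {
        $content = $node->nodeValue;
        break; // keep only the first match
    }

    if (isset($content)) { // isset(...), not !isset(...)
        return $content;
    }
    return "static content";
}
```

Usage: `echo chooseContent($paragraph);` with the `$paragraph` returned by `$domxpath->query(...)`.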

Update:

    $html = file_get_contents('www.site.com');
    $document = new DOMDocument();
    $document->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
    $domxpath = new DOMXPath($document);
    $paragraph = $domxpath->query('
        //h2[*[contains(text(), "History")]]
        /following-sibling::p[position() = 1]'
    );
    }

    foreach ($paragraph as $node) {
        $content = $node->nodeValue;
    }

    if (!isset($content)) {
        echo $content;
    } else {
        echo "static content";
    }

But I don't understand why: when the page does not have "History", it does not print the static content inside the `else`.
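If this code runs inside a loop over many pages (as a later comment describes), `$content` is never reset, so a value from an earlier page can survive into the next one, on top of the inverted `!isset` check. A minimal sketch, assuming each page's HTML has already been downloaded into a string; `historyParagraphOrFallback()` is a hypothetical helper name:

```php
<?php
// Note: file_get_contents('www.site.com') without "http://" or "https://"
// is treated as a local file path, not as a URL.
function historyParagraphOrFallback(string $html, string $fallback): string
{
    $document = new DOMDocument();
    // Real-world pages rarely parse without warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $domxpath = new DOMXPath($document);
    $paragraph = $domxpath->query(
        '//h2[*[contains(text(), "History")]]/following-sibling::p[1]'
    );

    // length is 0 when nothing matched; no variable is carried over
    // from a previous page because everything is local to this call.
    if ($paragraph->length > 0) {
        return $paragraph->item(0)->nodeValue;
    }
    return $fallback;
}
```

In a loop, each page is then handled independently, e.g. `foreach ($pages as $html) { echo historyParagraphOrFallback($html, 'static content'); }`, where `$pages` is a hypothetical array of downloaded HTML strings.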

HTML code:

All of the page's main content is inside the div below:

<div id="mw-content-text" lang="pt" dir="ltr" class="mw-content-ltr">

I would like to find the `<h2>` that contains "History":

<h2><span id="Hist.C3.B3ria"></span><span class="mw-headline" id="History">History</span><span class="mw-editsection"><span class="mw-editsection-bracket">[</span><a href="/w/index.php?title=Adamantina&amp;veaction=edit&amp;section=1" class="mw-editsection-visualeditor" title="Editar secção: History">editar</a><span class="mw-editsection-divider"> | </span><a href="/w/index.php?title=Adamantina&amp;action=edit&amp;section=1" title="Editar secção: History">editar código-fonte</a><span class="mw-editsection-bracket">]</span></span></h2>

As you can see above, there is a lot of markup between the opening `<h2>` tag and the closing `</h2>` tag.
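With this many child elements inside the `<h2>`, it may be clearer to aim the query at the headline span directly. A sketch, assuming the page keeps MediaWiki's `mw-headline` class; the markup is a shortened copy of the heading above, and the `<p>` after it is made up for illustration:

```php
<?php
// Shortened copy of the question's <h2> plus an invented paragraph.
$html = '<div id="mw-content-text">'
    . '<h2><span id="Hist.C3.B3ria"></span>'
    . '<span class="mw-headline" id="History">History</span>'
    . '<span class="mw-editsection"><span class="mw-editsection-bracket">[</span>'
    . '<a href="#">editar</a><span class="mw-editsection-bracket">]</span></span></h2>'
    . '<p>Paragraph about the history.</p>'
    . '</div>';

$document = new DOMDocument();
$document->loadHTML($html);
$domxpath = new DOMXPath($document);

// contains(@class, ...) tolerates extra classes on the span; "." uses the
// span's whole string value rather than only its first text node.
$query = '//div[@id="mw-content-text"]'
    . '//h2[span[contains(@class, "mw-headline") and contains(., "History")]]'
    . '/following-sibling::p[1]';
$paragraph = $domxpath->query($query);

echo $paragraph->length > 0 ? $paragraph->item(0)->nodeValue : 'static content';
```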

Gislef
  • The question is very unclear. Please provide an HTML example of what you have and what has to match. – ishegg Mar 04 '18 at 19:50
  • Thanks for the feedback, I'll improve this question. – Gislef Mar 04 '18 at 19:53
  • @ishegg I updated my question please, see if it is clearer – Gislef Mar 04 '18 at 20:00
  • It is, but it'd be much easier to help you if you post an excerpt of the HTML you're receiving and what you need to extract from it. – ishegg Mar 04 '18 at 20:01
  • @ishegg updated again, please see – Gislef Mar 04 '18 at 20:07
  • Is that the actual code? It's invalid HTML, you can't have the same `id` in more than one element. – ishegg Mar 04 '18 at 20:08
  • @ishegg first `id` is `id="Hist.C3.B3ria"` – Gislef Mar 04 '18 at 20:11
  • @Gislef I'm having a hard time understanding what you need. [Here](https://3v4l.org/Q3dOO) you can see a sample of an XPath query selecting span elements with "History" inside the text. Do you need something like that? – ishegg Mar 04 '18 at 20:17
  • @ishegg I'm trying to check whether the site page has some `<h2>` that contains **History**. If the page has History, it only takes the first paragraph; if no `<h2>` on the page contains **History**, it prints static content. – Gislef Mar 04 '18 at 20:22
  • `$paragraph = $domxpath->query('//h2/*[contains(text(), "History")]');` that will check for any "History" text inside `h2` – ishegg Mar 04 '18 at 20:23
  • Yes, but I'm not able to create the if and else check. Even if there is no `History` on the page it always executes `if` – Gislef Mar 04 '18 at 20:26
  • I see. Use `count($paragraph)` to get the number of results. `if (count($paragraph) > 0) { // History contained` – ishegg Mar 04 '18 at 20:27
  • `if (!isset($content)) { echo $content; }` doesn't make sense. – Syscall Mar 04 '18 at 20:29
  • @ishegg the result was the same – Gislef Mar 04 '18 at 20:34
  • @Gislef check my answer and demo. Does that put you in the right path? – ishegg Mar 04 '18 at 20:51

1 Answer


Use this XPath query to select any element directly inside an `h2` whose text contains the string "History":

//h2/*[contains(text(), "History")]

Then, to check whether anything matched, look at the number of results (`$paragraph->length`). If it is greater than 0, there are results:

$paragraph = $domxpath->query('//h2/*[contains(text(), "History")]');
if ($paragraph->length > 0) {
    echo "Results!";
}
else {
    echo "Not contained";
}
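The query above only tells whether such a heading exists. To also get the paragraph after it, one option is a second query relative to the matching `<h2>`: `DOMXPath::query()` accepts a context node as its second argument. A minimal sketch with a hypothetical helper `firstParagraphAfterHistory()` and made-up HTML:

```php
<?php
// Hypothetical helper: null means "no History heading (or no paragraph after it)".
function firstParagraphAfterHistory(DOMXPath $domxpath): ?string
{
    $matches = $domxpath->query('//h2/*[contains(text(), "History")]');
    if ($matches->length === 0) {
        return null; // no "History" heading on this page
    }
    $h2 = $matches->item(0)->parentNode;
    // Relative query: the second argument makes $h2 the context node.
    $p = $domxpath->query('following-sibling::p[1]', $h2);
    return $p->length > 0 ? $p->item(0)->nodeValue : null;
}

$document = new DOMDocument();
$document->loadHTML('<html><body><h2><span>History</span></h2><p>Paragraph text.</p></body></html>');
echo firstParagraphAfterHistory(new DOMXPath($document)) ?? 'static content';
```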

Demo

ishegg
  • Thanks, but I think you have not understood my point yet. The content I want to print is just a paragraph, and this paragraph should only be printed if there is "History". – Gislef Mar 04 '18 at 21:29
  • It's not clear what *paragraph* you are referring to. In your own xpath you are doing the same thing: that xpath *selects next sibling paragraph when there is a `h2` that contains "history"* @Gislef – revo Mar 04 '18 at 21:36
  • OK, imagine the following: I have a huge list of pages. If a page has "History", the `if` branch is printed; if the page has no "History", the `else` branch is printed. – Gislef Mar 04 '18 at 21:38
  • XPath only filters; it does not compare. – Gislef Mar 04 '18 at 21:39
  • The current answer does exactly the same thing. What do you need to compare with each other? @Gislef – revo Mar 04 '18 at 21:50
  • @Gislef, I'm really confused. Maybe this will help me understand: can you point out exactly how the answer differs from what you are expecting? – ishegg Mar 04 '18 at 22:03
  • @ishegg I have tried it in several ways. I understand that the logic of your answer should work: if the page has no history it should run the `else`, but it does not. I believe it is because it is an array object; I am now trying to check whether this object is empty, because with `count()` the `else` never runs. – Gislef Mar 05 '18 at 07:44
  • I am creating automatic posts in WordPress. If an external page (in a foreach over a list of pages) contains "history", the post is created, but if it has no history the post is not created, when it should be created with the static content that is in the `else`. – Gislef Mar 05 '18 at 07:54
  • You're totally right, sorry. I've updated the answer – ishegg Mar 05 '18 at 12:07
  • @Gislef glad to have helped. Good luck. – ishegg Mar 05 '18 at 20:31
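On the "array object" point: `DOMXPath::query()` returns a `DOMNodeList` object, not an array, so `empty()` is always false for it. `count()` counts its nodes only on PHP 7.2 and later, where `DOMNodeList` implements `Countable`; on older versions `count()` of such an object returns 1, which could be one explanation (not confirmed in the thread) for a `count()` check never reaching the `else`. The `length` property works on every version. A short sketch with a made-up page that has no History heading:

```php
<?php
$document = new DOMDocument();
$document->loadHTML('<html><body><h2><span>Geography</span></h2></body></html>');
$domxpath = new DOMXPath($document);

$paragraph = $domxpath->query('//h2/*[contains(text(), "History")]');

var_dump($paragraph->length);       // int(0)
var_dump(empty($paragraph));        // bool(false): an object is never "empty"
var_dump($paragraph->length === 0); // bool(true): the reliable "no results" check
```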