How can I parse a MediaWiki table of contents (sommaire) and get its HTML code with PHP?

Question

Example with a MediaWiki link: https://www.visionduweb.eu/wiki/index.php?title=Utiliser_PHP

View the page source and locate the table of contents (sommaire) of this MediaWiki page.

I am looking for a way to parse the source code and get the HTML code of this table of contents.

I tried $domExemple = $xpath->query("//ul/li"); but it returns too many results, and they are poorly formatted.

I tried $domExemple = $xpath->query("//ul/li[@class='toclevel-1 tocsection-1']"); which gives me the expected result, but how can I get every toclevel and tocsection item without having to specify the number (1, 2, 3, ...) of each toclevel or tocsection?

In this example, I only get the text content, not the HTML content. I would prefer to retrieve the HTML content.

This question would be easier to read with the samples in code blocks. ps: What specific part of the html are you trying to parse/extract? — zanlok, Jul 03 '18 at 23:05

score 0 · Answer 1 · answered Jul 03 '18 at 23:27

I believe you can simplify your XPath expression using the syntax described in this question: "How can I match on an attribute that contains a certain string?"

Try something like this:

// Match every <li> whose class contains both "toclevel-" and "tocsection-",
// whatever the number that follows.
$results = $xpath->query('//ul/li[contains(@class, "toclevel-") and contains(@class, "tocsection-")]');
foreach ($results as $li) {
    // to get html of $li, import it into a fresh DOMDocument and run saveHTML
    $newdoc = new DOMDocument();
    // importNode(..., true) already makes a deep copy, so no separate clone is needed
    $newdoc->appendChild($newdoc->importNode($li, true));
    echo $newdoc->saveHTML();
}
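
As a possible alternative, assuming PHP 5.3.6 or later, DOMDocument::saveHTML() accepts a node argument, so the HTML of each matched item can be serialized without a second document. The sketch below is self-contained: the $html sample is made-up markup that only imitates a MediaWiki table of contents, and the variable names are hypothetical. With the real page, you would load the downloaded source instead.

```php
<?php
// Made-up sample imitating a MediaWiki table of contents (an assumption,
// standing in for the real page source).
$html = <<<HTML
<div id="toc" class="toc">
<ul>
<li class="toclevel-1 tocsection-1"><a href="#Intro"><span class="toctext">Intro</span></a>
<ul>
<li class="toclevel-2 tocsection-2"><a href="#Setup"><span class="toctext">Setup</span></a></li>
</ul>
</li>
<li class="toclevel-1 tocsection-3"><a href="#Usage"><span class="toctext">Usage</span></a></li>
</ul>
</div>
<ul><li class="other">Not part of the TOC</li></ul>
HTML;

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Same idea as above: match on class substrings rather than exact numbers.
$items = $xpath->query('//li[contains(@class, "toclevel-") and contains(@class, "tocsection-")]');

$fragments = [];
foreach ($items as $li) {
    // Passing the node to saveHTML() returns the markup of that node only.
    $fragments[] = $doc->saveHTML($li);
}

echo implode("\n", $fragments), "\n";
```

Note that nested entries are matched too, so the HTML of a parent item already contains its children. If you want the whole table of contents in one piece, it may be simpler to select its container and serialize that; MediaWiki usually renders it as an element with id="toc", but check this against the page source before relying on it.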
