Retrieve the contents of a div from an external site

Question

I am trying to retrieve the contents of a div from an external site with PHP and XPath.

Below is the relevant code. Note: I have tried several variations, including adding @ to the class and an a at the end of my query. After the query, I use saveHTML() to get the result.

For reference:

This is my XPath: //*[@id="post-15991"]/div[4]/div[1]
This is the URL: https://wordpress.org/plugins/wp-job-manager/

Here is my test code:

<?PHP
$url = 'https://wordpress.org/plugins/wp-job-manager/';
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;
?>

Output: nothing. The result is empty.
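An empty result is hard to diagnose while the @ in front of loadHTMLFile() silences every libxml message. The following is a minimal sketch, assuming PHP 7.4 or later, that buffers the parser messages with libxml_use_internal_errors() so they can be inspected instead of hidden; load_html_with_errors is a hypothetical helper name, not part of PHP.

```php
<?php
// Hypothetical helper: parse an HTML string while buffering libxml's messages
// instead of printing them (or hiding them with @).
function load_html_with_errors(string $html): array
{
    $previous = libxml_use_internal_errors(true); // buffer parser errors
    $dom = new DOMDocument();
    $ok = $dom->loadHTML($html);                  // expects markup, not a URL
    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return [$ok, $dom, $messages];
}
```

Usage: download first with `$html = file_get_contents($url);`, stop if `$html === false` (the download itself failed), then call `[$ok, $dom, $messages] = load_html_with_errors($html);` and `print_r($messages);`. Messages about HTML5 tags such as header or nav are usually harmless; a failed download is not.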

Background:

I got the XPath with Google Chrome. These are the pages I want to get data from:

https://wordpress.org/plugins/wp-job-manager/
https://wordpress.org/plugins/participants-database/
https://wordpress.org/plugins/amazon-link/
https://wordpress.org/plugins/simple-membership/
https://wordpress.org/plugins/scrapeazon/

Goal: I need the following data:

Version:
Last updated:
Active installations:
Tested up to:

For example, this is what view-source:https://wordpress.org/plugins/wp-job-manager/ shows:

Version: 1.29.3

Last updated: 5 days ago

Active installations: 100,000+

                    <li>
        Requires WordPress Version:<strong>4.3.1</strong>                </li>

                    <li>Tested up to: <strong>4.9.2</strong></li>
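Because each value sits in a strong element inside an li that starts with its label, the values can be selected by label text rather than by position. A minimal sketch, assuming the other fields (Version, Last updated, Active installations) use the same `Label: <strong>value</strong>` markup as the two items above, which the excerpt does not show; li_fields is a hypothetical helper name. Note that starts-with() keeps a label like "Version" from matching "Requires WordPress Version".

```php
<?php
// Hypothetical helper: map "Label: <strong>value</strong>" list items to label => value.
// Assumes labels are fixed strings without single quotes (they are spliced into XPath).
function li_fields(string $html, array $labels): array
{
    $previous = libxml_use_internal_errors(true); // ignore HTML5-tag warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $out = [];
    foreach ($labels as $label) {
        // The <strong> child of the first <li> whose normalized text starts with the label.
        $query = sprintf("//li[starts-with(normalize-space(.), '%s')]/strong", $label);
        $node = $xpath->query($query)->item(0);
        $out[$label] = $node ? trim($node->textContent) : null; // null when not found
    }
    return $out;
}
```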

Background: I need this data for all my favorite plugins and want to store it in a database or a spreadsheet, so there are approximately 70 pages to scrape.
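For about 70 pages, a loop that collects one row per plugin and writes a CSV file (which opens in any spreadsheet) is usually enough. A minimal sketch under stated assumptions: collect_rows and rows_to_csv are hypothetical helper names, the fetch and field-extraction functions are passed in by the caller, and the plugin URL pattern is taken from the list above.

```php
<?php
// Hypothetical batch driver: $fetchHtml downloads a page (e.g. 'file_get_contents'),
// $extract turns its HTML into an associative array of fields.
function collect_rows(array $slugs, callable $fetchHtml, callable $extract): array
{
    $rows = [];
    foreach ($slugs as $slug) {
        $html = $fetchHtml("https://wordpress.org/plugins/{$slug}/");
        if ($html === false || $html === '') {
            continue; // skip pages that could not be downloaded
        }
        // Array union keeps 'slug' as the first column.
        $rows[] = ['slug' => $slug] + $extract($html);
    }
    return $rows;
}

// Hypothetical helper: serialize rows to CSV text, using the first row's keys as the header.
function rows_to_csv(array $rows): string
{
    if ($rows === []) {
        return '';
    }
    $fh = fopen('php://temp', 'r+');
    fputcsv($fh, array_keys($rows[0]), ',', '"', ''); // header row
    foreach ($rows as $row) {
        fputcsv($fh, array_values($row), ',', '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}
```

Usage, with a hypothetical extractor `my_extract_fields`: `file_put_contents('plugins.csv', rows_to_csv(collect_rows(['wp-job-manager', 'amazon-link'], 'file_get_contents', 'my_extract_fields')));`. Values such as 100,000+ are quoted automatically by fputcsv(). Passing the fetcher in keeps the loop testable without network access.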

Here is the full XPath for the example (wp-job-manager):

//*[@id="post-15991"]/div[4]/div[1]

And for job-board-manager:

//*[@id="post-519"]/div[4]/div[1]/ul/li[1]
//*[@id="post-519"]/div[4]/div[1]/ul/li[2]
//*[@id="post-519"]/div[4]/div[1]/ul/li[3]
//*[@id="post-519"]/div[4]/div[1]/ul/li[7]
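The two XPaths differ only in the post ID (post-15991 vs post-519), so a hard-coded ID will not carry over to the other ~70 pages. A minimal sketch, assuming every plugin page keeps the same div[4]/div[1]/ul layout seen in these two paths (not verified here); plugin_meta_items is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the text of every <li> under the metadata list,
// without hard-coding the per-plugin post ID.
function plugin_meta_items(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Any element whose id starts with "post-", then the layout from the copied XPath.
    $query = "//*[starts-with(@id, 'post-')]/div[4]/div[1]/ul/li";
    $items = [];
    foreach ($xpath->query($query) as $li) {
        // Collapse the whitespace that surrounds the label and the <strong> value.
        $items[] = trim(preg_replace('/\s+/', ' ', $li->textContent));
    }
    return $items;
}
```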

I used the method from "Is there a way to get the XPath in Google Chrome?":

Right-click the item whose XPath you want and choose "Inspect".
Right-click the highlighted element in the Elements panel.
Choose Copy > Copy XPath.

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

You are calling loadHTMLFile(), which reads the document from a file path (or URL), and the @ in front of it hides everything that goes wrong while loading. If you turn warnings on, you will see warnings like these:

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Attribute class redefined in https://wordpress.org/plugins/wp-job-manager/, line: 190 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag header invalid in https://wordpress.org/plugins/wp-job-manager/, line: 201 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag nav invalid in https://wordpress.org/plugins/wp-job-manager/, line: 205 -- at line 5

E_WARNING : type 2 -- DOMDocument::loadHTMLFile(): Tag main invalid in https://wordpress.org/plugins/wp-job-manager/, line: 224 -- at line 5

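Instead, download the page yourself and pass the markup to loadHTML(). Note that loadHTML() takes an HTML string, not a URL:

$url = 'https://wordpress.org/plugins/wp-job-manager/';
$html = file_get_contents($url);   // fetch the page (requires allow_url_fopen)
$dom = new DOMDocument();
@$dom->loadHTML($html);            // parse the downloaded markup, not the URL string
$xpath = new DOMXpath($dom);
$elements = $xpath->query('//*[@id="post-15991"]/div[4]/div[1]');
$link = $dom->saveHTML($elements->item(0));
echo $link;

If you pass $url to loadHTML() directly, the URL text itself is parsed as the document, the query matches nothing, and saveHTML() receives null and serializes the whole (tiny) document, so the only visible result is:

https://wordpress.org/plugins/wp-job-manager/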
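Because DOMXPath::query() returns an empty list when nothing matches, item(0) then yields null, and saveHTML(null) prints the entire document instead of failing. Checking for a match first makes that case visible. A minimal sketch, assuming the HTML has already been downloaded; first_match_html is a hypothetical helper name.

```php
<?php
// Hypothetical helper: outer HTML of the first node matching $query, or null if none.
function first_match_html(string $html, string $query): ?string
{
    $previous = libxml_use_internal_errors(true); // ignore HTML5-tag warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null; // saveHTML(null) would dump the whole document instead
    }
    $out = $dom->saveHTML($nodes->item(0));
    return $out === false ? null : $out;
}
```

Usage: after checking that `file_get_contents($url)` did not return false, call `echo first_match_html($html, '//*[@id="post-15991"]/div[4]/div[1]') ?? 'no match';`. A "no match" here means the copied XPath does not fit the HTML the server actually sent.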

Hello, and many thanks! Please see the original post and the **goal that I have**: I want the data `Last updated`, `Active installations` and `Tested up to`. See, for example, view-source:https://wordpress.org/plugins/wp-job-manager/: `Version: 1.29.3 Last updated: 5 days ago Active installations: 100,000+`. How can I retrieve those values? — zero, Feb 13 '18 at 00:09
