
I want to fetch content from a site and output some specific data from it. The data looks like this:

<a itemprop="email">office@xy.com</a>

From this markup I want to output only the email address.

This is my code so far:

<?php
$homepage = file_get_contents('https://www.xy.com/');
echo $homepage;
?>
LovinQuaQua
  • I would use [SimpleXML](http://us3.php.net/manual/en/book.simplexml.php), probably with [XPath](http://us3.php.net/manual/en/simplexmlelement.xpath.php) to extract this value. – Alex Howansky Jun 29 '18 at 17:50
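The SimpleXML + XPath route suggested in the comment could look roughly like the sketch below. SimpleXML's own loaders expect well-formed XML, which real HTML pages rarely are, so this sketch assumes the HTML is first parsed with `DOMDocument::loadHTML()` and then handed to SimpleXML via `simplexml_import_dom()`. The function name `emailsViaSimpleXml` is hypothetical.

```php
<?php
// Hypothetical helper: SimpleXML + XPath extraction of itemprop="email" links.
// The HTML is parsed by DOMDocument (tolerant of sloppy HTML) and then
// imported into SimpleXML so that SimpleXMLElement::xpath() can be used.
function emailsViaSimpleXml(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xml = simplexml_import_dom($dom);
    $emails = [];
    foreach ($xml->xpath('//a[@itemprop="email"]') as $a) {
        $emails[] = (string) $a; // text content of the <a> element
    }
    return $emails;
}

print_r(emailsViaSimpleXml('<a itemprop="email">office@xy.com</a>'));
```

With the question's code you would pass `$homepage` instead of the literal string.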

1 Answer


You should use an HTML parser. It is more reliable than a regex or plain string functions.

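$dom = new DOMDocument();
// Parse the HTML; with the question's code you would pass $homepage here
$dom->loadHTML('<a itemprop="email">office@xy.com</a>');
$xpath = new DOMXPath($dom);
// Select <a> elements whose itemprop attribute is "email" and print the first one's text
echo $xpath->query('//a[@itemprop="email"]')->item(0)->nodeValue;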

Demo: https://3v4l.org/BU7Q4

You can read more here:

  1. http://php.net/manual/en/class.domdocument.php
  2. https://en.wikipedia.org/wiki/XPath
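A real page usually contains several matches and is rarely valid HTML, so a sketch that collects every email into an array and silences libxml's parser warnings might look like this. The function name `extractEmails` is hypothetical, and the sample HTML (including the second address) is made up for illustration.

```php
<?php
// Hypothetical helper: collect every email from <a itemprop="email"> elements.
function extractEmails(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are often not valid HTML; keep libxml warnings
    // internal instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $emails = [];
    foreach ($xpath->query('//a[@itemprop="email"]') as $node) {
        $emails[] = trim($node->nodeValue);
    }
    return $emails;
}

// Made-up sample with an unclosed <li>; with the question's code pass $homepage instead.
$html = '<ul><li><a itemprop="email">office@xy.com</a></li>'
      . '<li><a itemprop="email">sales@xy.com</a><li></ul>';
print_r(extractEmails($html));
```

Returning an array instead of echoing keeps the parsing separate from the output, which also makes it easier to reuse per page if the site is paginated.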

An alternative to XPath is to select all links and then check each link's attribute.

$dom = new DOMDocument();
$dom->loadHTML('<a itemprop="email">office@xy.com</a>');
// Loop over every <a> element and keep only those marked itemprop="email"
$links = $dom->getElementsByTagName('a');
foreach ($links as $link) {
    if ($link->getAttribute('itemprop') === 'email') {
        echo $link->nodeValue, "\n";
    }
}
user3783243
  • The page where I get this data is paginated. Is it possible to add that to the code? – LovinQuaQua Jun 29 '18 at 18:08
  • Ummm maybe? I don't know how the `pagination` relates to an email address – user3783243 Jun 29 '18 at 18:12
  • I read out multiple email addresses with this code, and I want to go through all the pages of the pagination. – LovinQuaQua Jun 29 '18 at 18:15
  • @LovinQuaQua If the pagination is done with Javascript, you need to emulate that. See https://stackoverflow.com/questions/199045/is-there-a-php-equivalent-of-perls-wwwmechanize – Barmar Jun 29 '18 at 18:17
  • @LovinQuaQua If you are trying to get content from different pages the pagination needs to go on the `file_get_contents` call. `$homepage = file_get_contents('https://www.xy.com/?page=2');` then you will have the contents of page 2 and can again use this code (assuming the markup is the same) – user3783243 Jun 29 '18 at 18:19
  • @user3783243 The markup is the same. How can I modify the code so that `file_get_contents` reads all pages (for example, I have 60 pages to read out)? – LovinQuaQua Jun 29 '18 at 18:22
  • How do you know the value is `60`? Where is the pagination stored? Maybe search the page for `Next` that is a link to the same page with a `GET` parameter? – user3783243 Jun 29 '18 at 18:29
  • I could store the number of pages in a variable, for example, and then run through all the pages (in this case 60). I would enter the number of pages manually. The pagination looks like this: `&Page=34` – LovinQuaQua Jun 29 '18 at 18:32
  • `for($i = 1; $i <= 60; $i++) { $homepage = file_get_contents('https://www.xy.com/?page=' . $i); /* parse here... */ }` I'd recommend a dynamic approach though. – user3783243 Jun 29 '18 at 18:35
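One possible "dynamic approach" is to keep requesting page 1, 2, 3, ... until a page contains no email links, with a hard upper limit. This is only a sketch under several assumptions: the page number is appended directly to a base URL (the `?page=` form from the comment; the asker's site may use `&Page=` instead), an empty page means you have run past the last one, and every page uses the same markup. The function names `emailsOnPage` and `crawlEmails` are hypothetical. The fetcher defaults to `file_get_contents` but can be swapped out, which also makes the logic testable without network access.

```php
<?php
// Hypothetical helper: emails from one page's HTML.
function emailsOnPage(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // ignore warnings from invalid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $emails = [];
    foreach ((new DOMXPath($dom))->query('//a[@itemprop="email"]') as $node) {
        $emails[] = trim($node->nodeValue);
    }
    return $emails;
}

// Walk page=1, 2, 3, ... until a request fails, a page has no emails
// (assumed to mean we are past the last page), or $maxPages is reached.
function crawlEmails(string $baseUrl, int $maxPages = 100, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $all = [];
    for ($page = 1; $page <= $maxPages; $page++) {
        $html = $fetch($baseUrl . $page);
        if ($html === false || $html === '') {
            break; // request failed or empty response
        }
        $found = emailsOnPage($html);
        if ($found === []) {
            break; // no results: assume there are no more pages
        }
        $all = array_merge($all, $found);
    }
    return $all;
}

// Example call against the placeholder URL from the thread (not run here):
// $emails = crawlEmails('https://www.xy.com/?page=');
```

If the site returns the last page again for out-of-range page numbers, the "no results" check never fires and only `$maxPages` stops the loop, so following the `Next` link as suggested above may be more robust.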