I want to scrape this:

<a class="pdt_title"> 
  Japan Sun Apple - Fuji
  <span class="pdt_Tweight">2 per pack</span>
</a>

This is my code:

use Goutte\Client;
$client = new Client();
$crawler = $client->request('GET', 'https://www.fairprice.com.sg/searchterm/apple');
foreach ($crawler->filter('a.pdt_title') as $node) {
    print $node->nodeValue."\n";
}

I only want the text directly inside the "a" tag, without the text inside the "span" tag. How can I get only the anchor's own text?

Krisnadi

1 Answer

Looking at the HTML markup, the text you want is the first child of the anchor. Since each $node is an instance of DOMElement, you can use ->firstChild to target that text node and then read its ->nodeValue. The value keeps the whitespace around the text, so wrap it in trim() if you need it clean:

foreach ($crawler->filter('a.pdt_title') as $node) {
    echo $node->firstChild->nodeValue . "\n";
}
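To try the ->firstChild idea without hitting the site, here is a minimal sketch that swaps Goutte for PHP's built-in DOMDocument and DOMXPath. The $html string is just the snippet from the question and $titles is a name made up for illustration; with Goutte you would loop over $crawler->filter('a.pdt_title') instead.

```php
<?php
// Sample markup copied from the question (assumption: the live page has the same shape).
$html = <<<HTML
<a class="pdt_title">
  Japan Sun Apple - Fuji
  <span class="pdt_Tweight">2 per pack</span>
</a>
HTML;

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$titles = [];
foreach ($xpath->query('//a[@class="pdt_title"]') as $node) {
    // firstChild is the text node that precedes the <span>;
    // trim() strips the newlines and indentation around it.
    $titles[] = trim($node->firstChild->nodeValue);
}

print_r($titles);
```

This relies on the title being the very first child of the anchor. If the markup ever puts another element before the text, ->firstChild would point at that element instead.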

Another alternative is to use XPath via ->filterXPath(), which is covered in the docs:

foreach ($crawler->filterXPath('//a[@class="pdt_title"]/text()') as $text) {
    echo $text->nodeValue, "\n";
}

The XPath query selects each anchor whose class attribute is exactly "pdt_title" and then its direct text-node children, which leaves out the text inside the span.

Related docs:

https://symfony.com/doc/current/components/dom_crawler.html

Or as a one-liner that extracts the texts and returns them as an array:

$output = $crawler->filterXPath('//a[@class="pdt_title"]/text()')->extract(array('_text'));
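One thing to watch with /text(): it matches every direct text node of the anchor, so the whitespace after the closing span tag can come back as an extra, blank entry. A hedged sketch of one way around that, again using the built-in DOMDocument and DOMXPath rather than Goutte, with the question's snippet as sample input ($texts is a hypothetical name):

```php
<?php
// Sample markup copied from the question (assumption: the live page has the same shape).
$html = <<<HTML
<a class="pdt_title">
  Japan Sun Apple - Fuji
  <span class="pdt_Tweight">2 per pack</span>
</a>
HTML;

libxml_use_internal_errors(true); // keep parser notices out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The [normalize-space()] predicate keeps only text nodes that contain
// something other than whitespace, so blank nodes are skipped.
$query = '//a[@class="pdt_title"]/text()[normalize-space()]';

$texts = [];
foreach ($xpath->query($query) as $text) {
    $texts[] = trim($text->nodeValue);
}

print_r($texts);
```

The same predicate should work unchanged in the ->filterXPath() calls above, since it is plain XPath 1.0.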

Related DOM Docs:

http://php.net/manual/en/class.domelement.php
http://php.net/manual/en/class.domnode.php

Kevin