Questions tagged [goutte]

Goutte is a simple headless web browser / web scraper, written in PHP.

It can be used for writing automated testing scripts for websites.

It is a thin wrapper around a number of existing Symfony classes and components, including BrowserKit, DomCrawler, and others.

Full source code can be found here: https://github.com/fabpot/Goutte

308 questions
31 votes, 1 answer

Behat & Mink : Use the test environment

I'm currently using Behat with Mink & Goutte Driver. When I'm trying to use it with my dev environment, via the app_dev.php file, which is a typical app_dev.php file from a Symfony2 Standard Edition, my tests are working just fine (Gists). But, if I…
Talus
  • 754
  • 7
  • 18
12 votes, 2 answers

How to use Goutte

Issue: Cannot fully understand the Goutte web scraper. Request: Can someone please help me understand or provide code to help me better understand how to use Goutte the web scraper? I have read over the README.md. I am looking for more information…
scrfix
  • 1,188
  • 3
  • 11
  • 24
10 votes, 3 answers

How to use proxy authentication with Goutte?

I have the following code but it always returns a 407 HTTP status code. $url = 'http://whatismyip.org'; $client = new Client(); $options = array( 'proxy' => array( 'http' => 'tcp://@x.x.x.x:8010', ), 'auth' =>…
Abs
  • 56,052
  • 101
  • 275
  • 409
9 votes, 3 answers

How to crawl with php Goutte and Guzzle if data is loaded by Javascript?

Many times when crawling we run into problems where content that is rendered on the page is generated with JavaScript and therefore scrapy is unable to crawl for it (e.g. ajax requests, jQuery).
Batman
  • 91
  • 1
  • 1
  • 5
8 votes, 3 answers

Goutte - Get inner values from $crawler->filter()

I am using PHP 7.1.33 and "fabpot/goutte": "^3.2". My composer file looks like the following: { "name": "ubuntu/workspace", "require": { "fabpot/goutte": "^3.2" }, "authors": [ { "name": "admin", …
Carol.Kar
  • 4,581
  • 36
  • 131
  • 264
6 votes, 1 answer

How to get meta description content using Goutte

Can you please help me find a way to get the content of the meta description, meta keywords and robots tags using Goutte? Also, how can I target and