Questions tagged [zyte]

13 questions
2 votes • 2 answers

Why isn't Puppeteer page.click waiting (maybe Browserless?)

Goal: I have a page that I need to get HTML from after first clicking something on the page. Issue: the HTML that comes back does not reflect that element click. Here's one way that I've tried to do it: await page.setViewport({width: 1400, height:…
dizzy • 1,177 • 2 • 12 • 34
1 vote • 1 answer

Sending a request through a proxy: the request library works, axios does not

I am trying to update some old code to get rid of the request package since it is no longer maintained. I attempted to replace a proxy request with axios, but it doesn't work (I just get a timeout). Am I missing an axios config somewhere? The…
Rilcon42 • 9,584 • 18 • 83 • 167
1 vote • 2 answers

Requests fail with 504 Gateway Time-out when using scrapy-splash in Docker Compose with Zyte

I'm trying to scrape a site which partially renders its content using JS. I went ahead and found this project: https://github.com/scrapinghub/sample-projects/tree/master/splash_smart_proxy_manager_example, which quite neatly explains how to set things…
Odif Yltsaeb • 5,575 • 12 • 49 • 80
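
The excerpt cuts off before the actual configuration, but a frequent cause of 504s in this setup is Scrapy pointing at a SPLASH_URL the container cannot reach, or Splash timing out on heavy JS pages. A minimal scrapy-splash configuration sketch, assuming the Splash service is named `splash` in docker-compose (the timeout value is likewise an assumption):

```
# settings.py -- scrapy-splash sketch; the service name "splash" and the
# timeout are assumptions, not taken from the question.
SPLASH_URL = "http://splash:8050"  # hostname of the Splash service in docker-compose

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"

# Give slow JS pages more room before the gateway gives up.
DOWNLOAD_TIMEOUT = 120
```

If the settings look like this already, the next thing to check is whether `curl http://splash:8050` works from inside the Scrapy container, i.e. whether both services share a Compose network.
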
1 vote • 0 answers

I get the error "ImportError: libtk8.6.so: cannot open shared object file: No such file or directory" while deploying my Python app to Zyte

I searched for this question on the internet and most of the solutions suggest installing tkinter. Tkinter has been installed, but the error still persists. Could someone please guide me on this?
Rija Shaheed • 69 • 2 • 7
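
libtk8.6.so is a system library, so installing tkinter at the Python level cannot supply it; the error usually appears when something in the import chain (commonly matplotlib or PIL.ImageTk) tries to load the Tk GUI backend, which the Zyte stack does not ship. A hedged workaround sketch, assuming the dependency comes in through matplotlib: switch to the headless Agg backend before pyplot is imported.

```
# Sketch only: assumes the Tk dependency is pulled in via matplotlib.
# Selecting the headless Agg backend avoids loading libtk entirely.
import matplotlib
matplotlib.use("Agg")  # must run before the first `import matplotlib.pyplot`

import matplotlib.pyplot as plt  # now renders to files/buffers, no GUI needed
```

If the import actually comes from somewhere else, the same principle applies: avoid any code path that opens a Tk window on a headless container.
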
0 votes • 1 answer

Scrapy spider works locally but results in a 403 error when running on Zyte

The spider is set up so that it reads the links to scrape, makes a POST request, and the data is parsed. The spider is able to collect data locally, but when deployed to Zyte it results in the error shown below: ``` …
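
A 403 that only shows up on Zyte usually means the target site is blocking datacenter IPs or the default Scrapy User-Agent rather than anything in the spider logic. A first-step sketch with browser-like headers; all header values below are illustrative, not taken from the question:

```
# settings.py -- illustrative headers; the exact values are assumptions.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```

If realistic headers are not enough, routing requests through Zyte Smart Proxy Manager (the scrapy-zyte-smartproxy middleware) is the usual next step, since that swaps out the blocked IPs.
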
0 votes • 1 answer

I'm having an issue while deploying a scraper to Zyte (formerly Scrapinghub)

My spider has to read some data from an input.csv file. It runs fine locally, but when I deploy it to Zyte with shub deploy, input.csv is not included in the build. So when I try to run it on the server, it produces the following error: Traceback (most…
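
shub deploy packages the project as a Python egg built from setup.py, so files that are not declared as package data never reach the cloud. A sketch of the usual fix, with `myproject` as a placeholder package name and input.csv assumed to live inside that package directory:

```
# setup.py -- sketch; "myproject" and the file location are placeholders.
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    packages=find_packages(),
    package_data={"myproject": ["input.csv"]},  # bundle the CSV into the egg
    entry_points={"scrapy": ["settings = myproject.settings"]},
)
```

Reading the file with pkgutil.get_data("myproject", "input.csv") instead of a relative filesystem path keeps it working when the spider runs from the packaged egg.
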
0 votes • 1 answer

How to save Scrapy Broad Crawl Results?

Scrapy has a built-in way of persisting results to AWS S3 using the FEEDS setting, but for a broad crawl over different domains this would create a single file where the results from all domains are saved. How could I save the results of each…
NightOwl • 1,069 • 3 • 13 • 23
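
FEEDS URI placeholders such as %(name)s and %(time)s are expanded once per crawl, so a single broad-crawl spider still ends up with one file. One hedged way to split output by domain is a small item pipeline that writes a JSON-lines file per domain; the output directory and the assumption that each item carries its page URL in item["url"] are inventions for this sketch.

```
# pipelines.py -- per-domain output sketch; assumes item["url"] holds the page URL.
import json
from pathlib import Path
from urllib.parse import urlparse


class PerDomainJsonLinesPipeline:
    def open_spider(self, spider):
        self.files = {}                 # domain -> open file handle
        self.out_dir = Path("output")
        self.out_dir.mkdir(exist_ok=True)

    def process_item(self, item, spider):
        domain = urlparse(item["url"]).netloc
        if domain not in self.files:
            self.files[domain] = open(self.out_dir / f"{domain}.jl", "a", encoding="utf-8")
        self.files[domain].write(json.dumps(dict(item)) + "\n")
        return item

    def close_spider(self, spider):
        for handle in self.files.values():
            handle.close()


# settings.py (enable the pipeline):
# ITEM_PIPELINES = {"myproject.pipelines.PerDomainJsonLinesPipeline": 300}
```

The per-domain files can then be uploaded to S3 in close_spider, or the same routing idea can be expressed with multiple FEEDS entries if the set of domains is known up front.
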
0 votes • 1 answer

Why is there an error installing csv in Scrapinghub when it's part of the Python standard library?

I have 3 spiders defined. All the related requirements are listed in requirements.txt: scrapy, pandas, pytest, requests, google-auth, functions-framework, shub, msgpack-python. Also, scrapinghub.yml is defined to use Scrapy 2.5: project:…
Avirup Das • 189 • 1 • 3 • 15
0 votes • 1 answer

401 Client Error: Unauthorized for url: https://storage.scrapinghub.com/collections

When I run a spider in Scrapy Cloud Projects I get this error: 401 Client Error: Unauthorized for url: https://storage.scrapinghub.com/collections/569447/s/casti. Do you have any idea why?
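
A 401 from storage.scrapinghub.com generally means the Collections API is being called without a valid API key, or with a key that has no access to that project. With the python-scrapinghub client the key can be passed explicitly; the project id and collection name below come from the error URL, while reading the key from an SH_APIKEY environment variable is just one option.

```
# Sketch using the python-scrapinghub client; the environment variable name
# is an assumption -- any way of supplying a valid key works.
import os
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient(os.environ["SH_APIKEY"])   # key with access to the project
project = client.get_project(569447)                  # project id from the error URL
store = project.collections.get_store("casti")        # collection name from the error URL

store.set({"_key": "example", "value": "hello"})
print(store.get("example"))
```
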
0 votes • 2 answers

Scrapinghub scrapy: ModuleNotFoundError: No module named 'pandas'

I have tried deploying to Zyte via the command line and via GitHub, but I have been stuck with the above error. I have tried different Scrapy versions, from 1.5 to 2.5, but the error still persists. I have also tried setting my scrapinghub.yml to the…
chuky pedro • 756 • 1 • 8 • 26
0 votes • 1 answer

Scrapinghub/Zyte: Unhandled error in Deferred: No module named 'scrapy_user_agents'

I'm deploying my Scrapy spider from my local machine to Zyte Cloud (formerly Scrapinghub). This is successful. When I run the spider I get the output below. I already checked here. The Zyte team is not very responsive on their own site, it seems, but…
Adam • 6,041 • 36 • 120 • 208
-1 votes • 1 answer

Is it possible to create a proxy failover with Python Scrapy?

Is it possible to create a proxy failover within Scrapy, so that when one fails the other will take over scraping the rest of the requests? I would have thought that it would be done using the retry middleware, but I don't really have a clue how to…
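
One way to get failover is indeed to subclass Scrapy's stock RetryMiddleware so that every retryable failure moves the request to the next proxy in a list. A sketch; PROXY_LIST is a custom setting invented here, and the built-in HttpProxyMiddleware (enabled by default) is what actually applies request.meta["proxy"]:

```
# middlewares.py -- failover sketch built on Scrapy's stock RetryMiddleware.
from itertools import cycle

from scrapy.downloadermiddlewares.retry import RetryMiddleware


class ProxyFailoverRetryMiddleware(RetryMiddleware):
    """On every retryable failure, move the request to the next proxy."""

    def __init__(self, settings):
        super().__init__(settings)
        # PROXY_LIST is a custom setting invented for this sketch.
        self.proxies = cycle(settings.getlist("PROXY_LIST"))

    def process_response(self, request, response, spider):
        if response.status in self.retry_http_codes:
            request.meta["proxy"] = next(self.proxies)  # fail over before retrying
        return super().process_response(request, response, spider)

    def process_exception(self, request, exception, spider):
        request.meta["proxy"] = next(self.proxies)      # connection errors: rotate too
        return super().process_exception(request, exception, spider)


# settings.py
# DOWNLOADER_MIDDLEWARES = {
#     "scrapy.downloadermiddlewares.retry.RetryMiddleware": None,  # replace stock retry
#     "myproject.middlewares.ProxyFailoverRetryMiddleware": 550,
# }
# PROXY_LIST = ["http://user:pass@proxy1:8011", "http://user:pass@proxy2:8011"]
```

The stock middleware still decides when to give up (RETRY_TIMES, RETRY_HTTP_CODES); this subclass only changes which proxy the retried copy of the request uses.
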
-1 votes • 1 answer

How can I add a new spider arg to my own template in Scrapy/Zyte

I am working on a paid proxy spider template and would like the ability to pass in a new argument on the command line for a Scrapy crawler. How can I do that?
Gregory Williams • 453 • 5 • 18
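
Any keyword passed with -a on the command line is handed to the spider's __init__ and becomes an attribute, so a template mostly just needs to accept and store it. A sketch with an invented proxy_pool argument:

```
# Sketch; the spider name and the proxy_pool argument are examples, not from the question.
import scrapy


class PaidProxySpider(scrapy.Spider):
    name = "paid_proxy_example"

    def __init__(self, proxy_pool=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # `scrapy crawl paid_proxy_example -a proxy_pool=...` lands here
        self.proxy_pool = proxy_pool

    def start_requests(self):
        yield scrapy.Request(
            "https://example.com",
            meta={"proxy": self.proxy_pool} if self.proxy_pool else {},
        )

    def parse(self, response):
        yield {"url": response.url, "status": response.status}
```

Locally this is run as scrapy crawl paid_proxy_example -a proxy_pool=http://user:pass@host:8011; on Zyte the job's run dialog accepts the same spider arguments, so no change to the template engine itself should be needed.
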