Highest Voted 'scrapinghub' Questions

11

votes

1 answer

Unable to run/deploy a custom script with shub-image

I have a problem running/deploying a custom script with shub-image. setup.py: from setuptools import setup, find_packages setup( name = 'EU-Crawler', version = '1.0', packages = find_packages(), scripts = [ …

python scrapy scrapinghub

asked Dec 07 '17 at 16:24

parik

2,313
12
39
67

8

votes

4 answers

scrapy passing custom_settings to spider from script using CrawlerProcess.crawl()

I am trying to programmatically call a spider through a script. I am unable to override the settings through the constructor using CrawlerProcess. Let me illustrate this with the default spider for scraping quotes from the official scrapy site (last…

python-3.x web-scraping scrapy scrapinghub

asked Feb 28 '17 at 14:48

hAcKnRoCk

1,118
3
16
30

6

votes

1 answer

Scrapy hidden memory leak

Background - TLDR: I have a memory leak in my project. I spent a few days looking through the Scrapy memory leak docs and can't find the problem. I'm developing a medium-size Scrapy project, ~40k requests per day. I am hosting this using…

python memory scrapy scrapinghub

asked Sep 17 '20 at 11:08

Hector Haffenden

1,360
10
25

6

votes

0 answers

Pygsheets unable to find the server at www.googleapis.com

I'm trying to use pygsheets in a script on ScrapingHub. The pygsheets part of the script begins with: google_client = pygsheets.authorize(service_file=CREDENTIALS_FILENAME, no_cache=True) spreadsheet = google_client.open_by_key(SHEET_ID) Where…

python google-sheets google-sheets-api scrapinghub pygsheets

asked Jan 30 '18 at 18:03

osjerick

626
2
8
20

5

votes

1 answer

Scrapy does not fetch markup on response.css

I've built a simple scrapy spider running on scrapinghub: class ExtractionSpider(scrapy.Spider): name = "extraction" allowed_domains = ['domain'] start_urls = ['http://somedomainstart'] user_agent = "Mozilla/5.0 (Windows NT 10.0;…

python web-scraping scrapy scrapinghub splash-js-render

asked Aug 27 '19 at 15:37

qubits

1,227
3
20
50

4

votes

1 answer

scrapy how to load urls from file at scrapinghub

I know how to load data into a Scrapy spider from an external source when working locally. But I struggle to find any info on how to deploy this file to Scrapinghub and what path to use there. Now I use this approach from the SH documentation - enter link…

scrapy scrapinghub

asked Aug 09 '17 at 09:01

Billy Jhon

1,035
15
30

4

votes

2 answers

Download project's source-code from Scrapinghub

I have a project deployed on Scrapinghub, but I do not have any copy of that code at all. How can I download the whole project's code from Scrapinghub to my local machine?

python scrapy scrapinghub

asked Jul 27 '17 at 16:17

Umair Ayub

19,358
14
72
146

3

votes

1 answer

Splash - Scrapy - HAR data

In general I understand how to work with Scrapy and XPath to parse the HTML. However, I don't know how to grab the HAR data. import scrapy from scrapy_splash import SplashRequest class QuotesSpider(scrapy.Spider): name = 'quotes' …

python scrapy scrapy-splash scrapinghub splash-js-render

asked Jan 17 '20 at 13:23

Zach

421
1
5
11

3

votes

1 answer

Why is scrapy with crawlera running so slow?

I am using scrapy 1.7.3 with crawlera (C100 plan from scrapinghub) and python 3.6. When running the spider with crawlera enabled I get about 20 - 40 items per minute. Without crawlera I get 750 - 1000 (but I get banned quickly of course). Have I…

python scrapy scrapinghub crawlera

asked Aug 03 '19 at 17:29

Wramana

183
1
4
16

3

votes

1 answer

Use Splash from Scrapinghub locally

I got a subscription for splash on scrapinghub and I want to use this from a script that is running on my local machine. The instructions I have found so far are: Edit the settings file: #I got this one from my scraping hub account SPLASH_URL =…

python scrapy scrapy-splash scrapinghub splash-js-render

asked Jul 13 '19 at 22:57

Luis Ramon Ramirez Rodriguez

9,591
27
102
181

3

votes

1 answer

ScrapingHub Environment Variables Not Loaded

I'm deploying a bunch of spiders on ScrapingHub. The spider itself is working. I would like to change the feed output depending on whether the spider is running locally or on ScrapingHub (if it is running locally then output to a temp folder, if it…

python amazon-s3 scrapy scrapinghub

asked Jun 18 '19 at 03:41

Ze Xuan

56
6

3

votes

1 answer

scrapinghub starting job too slow

I am new to scraping and I am running different jobs on Scrapinghub. I run them via their API. The problem is that starting the spider and initializing it takes too much time, around 30 seconds. When I run it locally, it takes up to 5 seconds to finish…

scrapy scrapinghub

asked May 22 '19 at 05:30

Mara M

153
1
1
10

3

votes

2 answers

Scrapy and Splash times out for a specific site

I have an issue with Scrapy, Crawlera and Splash when trying to fetch responses from this site. I tried the following without luck: pure Scrapy shell - times out; Scrapy + Crawlera - times out; Scrapinghub Splash instance (small) - times…

web-scraping scrapy scrapy-splash scrapinghub splash-js-render

asked Jan 18 '18 at 13:11

Szabolcs

3,990
18
38

3

votes

2 answers

How to install xvfb on Scrapinghub for using Selenium?

I use Python Selenium in my Scrapy spider. To use Selenium, I need to install xvfb on Scrapinghub. When I use apt-get to install xvfb, I get this error message: E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied) …

selenium selenium-webdriver scrapy xvfb scrapinghub

asked Jun 09 '17 at 15:17

parik

2,313
12
39
67

2

votes

1 answer

Scrapy crawlera authentication issue

I've been trying to use scrapy-crawlera as a proxy for scraping some data with scrapy. I've added these lines to settings.py: DOWNLOADER_MIDDLEWARES = { 'scrapy_crawlera.CrawleraMiddleware': 610, } CRAWLERA_ENABLED = True CRAWLERA_APIKEY =…

python web-scraping scrapy scrapinghub crawlera

asked Mar 09 '21 at 09:12

memeister

53
5
