Scrapinghub, a web scraping development and services company, supplies cloud-based web crawling platforms.
Questions tagged [scrapinghub]
179 questions
11
votes
1 answer
Unable to run/deploy a custom script with shub-image
I have a problem running/deploying a custom script with shub-image.
setup.py
from setuptools import setup, find_packages
setup(
    name = 'EU-Crawler',
    version = '1.0',
    packages = find_packages(),
    scripts = [
    …

parik
- 2,313
- 12
- 39
- 67
8
votes
4 answers
scrapy passing custom_settings to spider from script using CrawlerProcess.crawl()
I am trying to programmatically call a spider through a script. I am unable to override the settings through the constructor using CrawlerProcess. Let me illustrate this with the default spider for scraping quotes from the official scrapy site (last…

hAcKnRoCk
- 1,118
- 3
- 16
- 30
6
votes
1 answer
Scrapy hidden memory leak
Background - TLDR: I have a memory leak in my project.
I spent a few days going through Scrapy's memory leak docs and can't find the problem.
I'm developing a medium-sized scrapy project, ~40k requests per day.
I am hosting this using…

Hector Haffenden
- 1,360
- 10
- 25
6
votes
0 answers
Pygsheets unable to find the server at www.googleapis.com
I'm trying to use pygsheets in a script on ScrapingHub. The pygsheets part of the script begins with:
google_client = pygsheets.authorize(service_file=CREDENTIALS_FILENAME, no_cache=True)
spreadsheet = google_client.open_by_key(SHEET_ID)
Where…

osjerick
- 626
- 2
- 8
- 20
5
votes
1 answer
Scrapy does not fetch markup on response.css
I've built a simple scrapy spider running on scrapinghub:
class ExtractionSpider(scrapy.Spider):
    name = "extraction"
    allowed_domains = ['domain']
    start_urls = ['http://somedomainstart']
    user_agent = "Mozilla/5.0 (Windows NT 10.0;…

qubits
- 1,227
- 3
- 20
- 50
4
votes
1 answer
scrapy how to load urls from file at scrapinghub
I know how to load data into a Scrapy spider from an external source when working locally. But I struggle to find any info on how to deploy this file to Scrapinghub and what path to use there. Now I use this approach from the SH documentation - enter link…

Billy Jhon
- 1,035
- 15
- 30
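
For the question above about loading URLs from a file on Scrapinghub, one commonly used approach is to ship the file inside the project package and read it with the standard library's `pkgutil.get_data`, instead of relying on a filesystem path. The following is only a minimal sketch under assumptions, not the asker's code: the package name `myproject` and the resource path `resources/urls.txt` are hypothetical, and the file is assumed to be declared as package data so that it is deployed with the project.

```python
import pkgutil
from typing import List


def parse_url_lines(data: bytes) -> List[str]:
    """Decode a bytes payload and return its non-empty, non-comment lines."""
    urls = []
    for line in data.decode("utf-8").splitlines():
        line = line.strip()
        # Skip blank lines and lines that start with '#'.
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def load_urls(package: str = "myproject",
              resource: str = "resources/urls.txt") -> List[str]:
    """Read a URL list bundled as package data.

    'myproject' and 'resources/urls.txt' are hypothetical names; replace
    them with the real package and resource path of the project.
    """
    data = pkgutil.get_data(package, resource)
    if data is None:
        # The package was not found or its loader cannot provide data.
        return []
    return parse_url_lines(data)
```

A spider could then assign the result of `load_urls()` to `start_urls`, assuming the resource is actually included in the deployed package.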
4
votes
2 answers
Download project's source-code from Scrapinghub
I have a project deployed on Scrapinghub, but I do not have any copy of that code at all.
How can I download the whole project's code from Scrapinghub to my local machine?

Umair Ayub
- 19,358
- 14
- 72
- 146
3
votes
1 answer
Splash - Scrapy - HAR data
In general I understand how to work with Scrapy and XPath to parse the HTML. However, I don't know how to grab the HAR data.
import scrapy
from scrapy_splash import SplashRequest
class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    …

Zach
- 421
- 1
- 5
- 11
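
For the HAR question above: HAR data is a JSON document whose top-level `log` object holds an `entries` list, each entry carrying a `request` and a `response`. Once Splash has returned that JSON (for example from its `render.har` endpoint), it can be walked like any other dictionary. The sketch below assumes the HAR payload has already been decoded into a Python dict; the function name `summarize_har` is hypothetical and this is not the asker's code.

```python
from typing import Any, Dict, List, Tuple


def summarize_har(har: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (method, url, status) for every entry in a decoded HAR dict."""
    entries = har.get("log", {}).get("entries", [])
    summary = []
    for entry in entries:
        request = entry.get("request", {})
        response = entry.get("response", {})
        summary.append((
            request.get("method", ""),
            request.get("url", ""),
            response.get("status", 0),
        ))
    return summary
```

Missing keys fall back to empty values rather than raising, which keeps the sketch tolerant of partial HAR output.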
3
votes
1 answer
Why is scrapy with crawlera running so slow?
I am using scrapy 1.7.3 with crawlera (C100 plan from scrapinghub) and python 3.6.
When running the spider with crawlera enabled I get about 20 - 40 items per minute. Without crawlera I get 750 - 1000 (but I get banned quickly of course).
Have I…

Wramana
- 183
- 1
- 4
- 16
3
votes
1 answer
Use Splash from Scrapinghub locally
I got a subscription for splash on scrapinghub and I want to use this from a script that is running on my local machine. The instructions I have found so far are:
Edit the settings file:
# I got this one from my Scrapinghub account
SPLASH_URL =…

Luis Ramon Ramirez Rodriguez
- 9,591
- 27
- 102
- 181
3
votes
1 answer
ScrapingHub Environment Variables Not Loaded
I'm deploying a bunch of spiders on ScrapingHub. The spider itself is working. I would like to change the feed output depending on whether the spider is running locally or on ScrapingHub (if it is running locally then output to a temp folder, if it…

Ze Xuan
- 56
- 6
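
For the question above about switching the feed output between local and hosted runs, one option is to branch on an environment variable that is only present on the platform. The sketch below is an assumption-laden illustration, not the asker's code: it assumes the hosted job environment defines `SHUB_JOBKEY`, and both the file name `items.jl` and the `cloud_uri` placeholder are hypothetical.

```python
import os
import tempfile
from typing import Mapping, Optional


def running_on_scrapy_cloud(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if SHUB_JOBKEY is set (assumed to exist only in hosted jobs)."""
    env = os.environ if environ is None else environ
    return "SHUB_JOBKEY" in env


def choose_feed_uri(environ: Optional[Mapping[str, str]] = None,
                    cloud_uri: str = "placeholder://hosted-output") -> str:
    """Pick a feed destination depending on where the spider runs.

    'cloud_uri' is a hypothetical placeholder for whatever destination
    is wanted on the platform; locally the feed goes to the temp folder.
    """
    if running_on_scrapy_cloud(environ):
        return cloud_uri
    return os.path.join(tempfile.gettempdir(), "items.jl")
```

Passing the environment in as a parameter keeps the decision easy to check without deploying; the returned value could then be used wherever the project configures its feed output.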
3
votes
1 answer
scrapinghub starting job too slow
I am new to scraping and I am running different jobs on Scrapinghub. I run them via their API. The problem is that starting and initializing the spider takes too much time, around 30 seconds. When I run it locally, it takes up to 5 seconds to finish…

Mara M
- 153
- 1
- 1
- 10
3
votes
2 answers
Scrapy and Splash times out for a specific site
I have an issue with Scrapy, Crawlera and Splash when trying to fetch responses from this site.
I tried the following without luck:
pure Scrapy shell - times out
Scrapy + Crawlera - times out
Scrapinghub Splash instance (small) - times…

Szabolcs
- 3,990
- 18
- 38
3
votes
2 answers
How to install xvfb on Scrapinghub for using Selenium?
I use Python-Selenium in my spider (Scrapy); to use Selenium I need to install xvfb on Scrapinghub.
When I use apt-get to install xvfb, I get this error message:
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied) …

parik
- 2,313
- 12
- 39
- 67
2
votes
1 answer
Scrapy crawlera authentication issue
I've been trying to use scrapy-crawlera as a proxy for scraping some data with scrapy. I've added these lines to settings.py:
DOWNLOADER_MIDDLEWARES = { 'scrapy_crawlera.CrawleraMiddleware': 610, }
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY =…

memeister
- 53
- 5