Questions tagged [scrapy-splash]

scrapy-splash is a Scrapy plugin that integrates the Scrapy framework with Splash, the JavaScript rendering service.

594 questions
27 votes, 3 answers

Scrapy Shell and Scrapy Splash

We've been using scrapy-splash middleware to pass the scraped HTML source through the Splash javascript engine running inside a docker container. If we want to use Splash in the spider, we configure several required project settings and yield a…
alecxe • 462,703
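The "several required project settings" this question refers to are documented in the scrapy-splash README. As a sketch, assuming Splash runs locally in Docker on its default port, the configuration and the request it enables look like this:

```python
# Typical scrapy-splash project settings (middleware names and priorities
# are from the scrapy-splash README). SPLASH_URL assumes a local Docker
# container on the default port.
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'

# With those in place, the spider yields Splash-rendered requests roughly as:
#   yield SplashRequest(url, self.parse, args={'wait': 0.5})
```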
16 votes, 3 answers

Adding a wait-for-element while performing a SplashRequest in python Scrapy

I am trying to scrape a few dynamic websites using Splash for Scrapy in Python. However, I see that Splash fails to wait for the complete page to load in certain cases. A brute-force way to tackle this problem was to add a large wait time (e.g. 5…
NightFury13 • 761
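A common alternative to a fixed wait is a Lua script, run through Splash's execute endpoint, that polls for the element instead. A sketch, where the '.content-loaded' selector and the 10-second budget are illustrative assumptions:

```python
# Lua script for Splash's /execute endpoint: poll until a CSS selector
# matches (or a time budget runs out) instead of a fixed splash:wait().
# The '.content-loaded' selector and the 10-second budget are illustrative.
wait_for_element = """
function main(splash, args)
  assert(splash:go(args.url))
  local budget = 10.0
  while budget > 0 do
    if splash:select('.content-loaded') then
      break
    end
    splash:wait(0.2)
    budget = budget - 0.2
  end
  return splash:html()
end
"""

# The spider would send it roughly as:
#   yield SplashRequest(url, self.parse, endpoint='execute',
#                       args={'lua_source': wait_for_element})
```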
14 votes, 1 answer

Does using scrapy-splash significantly affect scraping speed?

So far, I have been using just scrapy and writing custom classes to deal with websites using ajax. But if I were to use scrapy-splash, which, from what I understand, scrapes the rendered html after javascript, will the speed of my crawler be affected…
hsy • 165
10 votes, 1 answer

Scrapy Splash Screenshots?

I'm trying to scrape a site whilst taking a screenshot of every page. So far, I have managed to piece together the following code: import json import base64 import scrapy from scrapy_splash import SplashRequest class ExtractSpider(scrapy.Spider): …
Exam Orph • 365
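With the execute endpoint, one Lua script can return both the page HTML and a PNG screenshot in a single response; scrapy-splash exposes the returned table as response.data, with the PNG base64-encoded. A sketch (the save_screenshot helper is hypothetical):

```python
import base64

# Lua script for Splash's /execute endpoint: return the rendered HTML and
# a PNG screenshot together.
screenshot_script = """
function main(splash, args)
  assert(splash:go(args.url))
  splash:wait(0.5)
  return {html = splash:html(), png = splash:png()}
end
"""

# scrapy-splash exposes the returned table as response.data; the PNG
# arrives base64-encoded, so a callback would write it out like this:
def save_screenshot(data, path):
    """data: the decoded table from Splash; path: output PNG file."""
    with open(path, 'wb') as f:
        f.write(base64.b64decode(data['png']))
```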
10 votes, 1 answer

How to set splash timeout in scrapy-splash?

I use scrapy-splash to crawl web pages, and run the Splash service on Docker. Command: docker run -p 8050:8050 scrapinghub/splash --max-timeout 3600 But I got a 504 error. "error": {"info": {"timeout": 30}, "description": "Timeout exceeded rendering…
Jhon Smith • 181
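Raising --max-timeout on the Docker side only lifts the ceiling Splash will accept; each individual request still defaults to a 30-second budget, which is what the "timeout": 30 in the 504 error reports. The per-request timeout has to be passed explicitly. A sketch:

```python
# Per-request Splash arguments. The 'timeout' value must not exceed the
# --max-timeout Splash was started with (3600 in the question's command).
splash_args = {
    'wait': 1,
    'timeout': 3600,  # allowed here because Splash ran with --max-timeout 3600
}

# The spider then yields, roughly:
#   yield SplashRequest(url, self.parse, args=splash_args)
```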
9 votes, 3 answers

Scrapy CrawlSpider + Splash: how to follow links through linkextractor?

I have the following code that is partially working, class ThreadSpider(CrawlSpider): name = 'thread' allowed_domains = ['bbs.example.com'] start_urls = ['http://bbs.example.com/diy'] rules = ( Rule(LinkExtractor( …
eN_Joy • 853
8 votes, 2 answers

SplashRequest gives - TypeError: attrs() got an unexpected keyword argument 'eq'

I am using a cloud Splash instance from ScrapingHub. I am trying to do a simple request using the Scrapy-Splash library and I keep getting the error: @attr.s(hash=False, repr=False, eq=False) TypeError: attrs() got an unexpected keyword argument…
Ankur • 50,282
8 votes, 3 answers

how does scrapy-splash handle infinite scrolling?

I want to reverse-engineer the contents generated by scrolling down in the webpage. The problem is in the url https://www.crowdfunder.com/user/following_page/80159?user_id=80159&limit=0&per_page=20&screwrand=933. screwrand doesn't seem to follow…
Bowen Liu • 99
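Rather than reverse-engineering the scroll-triggered AJAX URLs, Splash can be asked to do the scrolling itself in a Lua script, so the page's own JavaScript fetches the extra content. The scroll count and waits below are illustrative:

```python
# Lua script: scroll to the bottom repeatedly so the page's own
# scroll-triggered AJAX runs inside Splash, then return the full HTML.
# The number of scrolls and the 1-second waits are illustrative.
scroll_script = """
function main(splash, args)
  assert(splash:go(args.url))
  splash:wait(1)
  for _ = 1, 10 do
    splash:runjs("window.scrollTo(0, document.body.scrollHeight)")
    splash:wait(1)
  end
  return splash:html()
end
"""

# Sent from a spider roughly as:
#   yield SplashRequest(url, self.parse, endpoint='execute',
#                       args={'lua_source': scroll_script})
```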
7 votes, 2 answers

How to load local HTML file in Scrapy Splash?

I want to load a local HTML file using Scrapy Splash, save it as PNG/JPEG and then delete the HTML file script = """ splash:go(args.url) return splash:png() """ resp = requests.post('http://localhost:8050/run', json={ 'lua_source':…
Umair Ayub • 19,358
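Because Splash runs inside Docker, it cannot see files on the host, so splash:go() on a local path fails. One workaround is to read the file in Python and hand its contents to splash:set_content(). A standard-library-only sketch; the endpoint URL and file path are assumptions:

```python
import json
import urllib.request

# Lua script: render HTML passed in via args instead of fetching a URL,
# then return a PNG of the result.
render_local = """
function main(splash, args)
  splash:set_content(args.html)
  splash:wait(0.5)
  return splash:png()
end
"""

def render_file(path, splash_url='http://localhost:8050/execute'):
    """Read a local HTML file and ask Splash to screenshot it.
    The Splash URL assumes a local Docker container."""
    with open(path, encoding='utf-8') as f:
        html = f.read()
    payload = json.dumps({'lua_source': render_local, 'html': html}).encode()
    req = urllib.request.Request(
        splash_url, data=payload,
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        return resp.read()  # raw PNG bytes
```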
7 votes, 1 answer

How can I use Scrapy-Splash without Docker?

Is there a way to use Scrapy-Splash without Docker? I mean, I have a server running Python 3 without Docker installed, and if possible I don't want to install Docker on it. Also, what exactly does SPLASH_URL do? Can I use only the IP of my server? I…
7 votes, 1 answer

scrapy-splash returns its own headers and not the original headers from the site

I use scrapy-splash to build my spider. Now what I need is to maintain the session, so I use the scrapy.downloadermiddlewares.cookies.CookiesMiddleware and it handles the set-cookie header. I know it handles the set-cookie header because I set…
Roman Smelyansky • 319
6 votes, 2 answers

Connection was refused by other side: 10061: No connection could be made because the target machine actively refused it

My steps: Build image docker build . -t scrapy Run a container docker run -it -p 8050:8050 --rm scrapy In container run scrapy project: scrapy crawl foobar -o allobjects.json This works locally, but on my production server I get…
Adam • 6,041
6 votes, 1 answer

Scrapy-Splash ERROR 400: "description": "Required argument is missing: url"

I'm using scrapy splash in my code to generate javascript-html codes. And splash is giving me back this render.html { "error": 400, "type": "BadOption", "description": "Incorrect HTTP API arguments", "info": { "type":…
6 votes, 1 answer

How to send custom headers in a Scrapy Splash request?

My spider.py file is as so: def start_requests(self): for url in self.start_urls: yield scrapy.Request( url, self.parse, headers={'My-Custom-Header':'Custom-Header-Content'}, meta={ …
Nadun Perera • 565
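Headers set on the Scrapy request travel to Splash itself rather than to the target site. One approach is to pass them through the 'headers' argument that Splash's render endpoints accept, so Splash uses them when fetching the page. A sketch (the header name is taken from the excerpt):

```python
# Headers for Splash to send to the target site; the render endpoints of
# the Splash HTTP API accept these via the 'headers' argument.
custom_headers = {'My-Custom-Header': 'Custom-Header-Content'}

# From the spider, roughly:
#   yield SplashRequest(url, self.parse, endpoint='render.html',
#                       args={'headers': custom_headers})
```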
6 votes, 1 answer

Form Request Using Scrapy + Splash

I am trying to login to a website using the following code (slightly modified for this post): import scrapy from scrapy_splash import SplashRequest from scrapy.crawler import CrawlerProcess class Login_me(scrapy.Spider): name = 'espn' …
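One way to log in under Splash is a Lua script that fills and submits the form before returning the rendered page; the form selector and field names below are illustrative and must match the real form. scrapy-splash also ships a SplashFormRequest class as an alternative.

```python
# Lua script that logs in before returning the page. 'form#login',
# 'username' and 'password' are illustrative; credentials come in via
# args so they are not hard-coded into the script.
login_script = """
function main(splash, args)
  assert(splash:go(args.url))
  splash:wait(1)
  local form = splash:select('form#login')
  assert(form:fill({username = args.user, password = args.pass}))
  assert(form:submit())
  splash:wait(2)
  return splash:html()
end
"""

# From the spider, roughly:
#   yield SplashRequest(url, self.after_login, endpoint='execute',
#                       args={'lua_source': login_script,
#                             'user': 'me', 'pass': 'secret'})
```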