Questions tagged [scrapy-middleware]

Scrapy middleware is a framework of hooks into Scrapy's request and response processing, where you can plug in custom functionality to process the responses that are sent to spiders and the requests and items that spiders generate.

Scrapy also provides a number of built-in middlewares out of the box for use with your spiders.

23 questions
5 votes · 2 answers

Scrapy FakeUserAgentError: Error occurred during getting browser

I use Scrapy FakeUserAgent and keep getting this error on my Linux server. Traceback (most recent call last): File "/usr/local/lib64/python2.7/site-packages/twisted/internet/defer.py", line 1299, in _inlineCallbacks result = g.send(result) …
Aminah Nuraini · 18,120
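A common workaround, sketched below on the assumption that the error comes from the scrapy-fake-useragent package: configure a fallback user agent so a failed fetch of the browser list degrades gracefully instead of raising FakeUserAgentError.

```python
# settings.py -- a minimal sketch, assuming the scrapy-fake-useragent package.
# FAKEUSERAGENT_FALLBACK makes the middleware fall back to a fixed UA string
# when fake-useragent cannot fetch its browser list.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
FAKEUSERAGENT_FALLBACK = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
```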
3 votes · 1 answer

Scrapy spider middleware

I have a function (check_duplicates()) in my spider that checks whether a URL is already in my database and, if it is absent, passes the URL on to the parse_product method: def check_duplicates(url): connection = mysql.connector.connect( …
m_sasha · 239
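A minimal sketch of the pattern the question describes, assuming mysql-connector-python; the connection details, table, and selectors are hypothetical placeholders.

```python
import mysql.connector
import scrapy


class ProductSpider(scrapy.Spider):
    name = 'products'
    start_urls = ['https://example.com/catalog']  # hypothetical

    def check_duplicates(self, url):
        # Hypothetical schema: a `urls` table with a `url` column.
        connection = mysql.connector.connect(
            host='localhost', user='user', password='secret', database='shop')
        try:
            cursor = connection.cursor()
            cursor.execute('SELECT 1 FROM urls WHERE url = %s LIMIT 1', (url,))
            return cursor.fetchone() is not None
        finally:
            connection.close()

    def parse(self, response):
        for href in response.css('a.product::attr(href)').getall():
            # Only follow URLs that are not already in the database.
            if not self.check_duplicates(response.urljoin(href)):
                yield response.follow(href, callback=self.parse_product)

    def parse_product(self, response):
        yield {'url': response.url, 'title': response.css('title::text').get()}
```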
2 votes · 1 answer

Scrapy appears to be deduplicating the first request when it is processed with DownloaderMiddleware

I've got a certain spider which inherits from SitemapSpider. As expected, the first request on startup is to sitemap.xml of my website. However, for it to work correctly I need to add a header to all the requests, including the initial ones which…
keddad · 1,398
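The usual cause here is that returning a *new* Request from process_request sends it back through the scheduler, where the duplicate filter drops it. A minimal sketch of the in-place alternative (the header name is hypothetical):

```python
class AddHeaderMiddleware:
    def process_request(self, request, spider):
        # Mutate the request in place and return None; returning a new
        # Request object would re-enter the scheduler and can be dropped
        # by the duplicate filter as a repeat of the original.
        request.headers['X-Custom-Header'] = 'value'  # hypothetical header
        return None
```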
1 vote · 0 answers

How to pause a scrapy spider and make the others keep on scraping?

I am facing a problem with my custom retry middleware in Scrapy. I have a project made of 6 spiders, launched by a small script containing a CrawlerProcess(), crawling 6 different websites. They should work simultaneously, and here is the problem: i…
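For context, a minimal sketch of such a launcher: all spiders share one Twisted reactor, which is why a blocking pause (e.g. time.sleep in a retry middleware) stalls every spider at once. A per-spider pause has to be non-blocking, e.g. via DOWNLOAD_DELAY or deferred retries.

```python
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Runs all spiders concurrently in a single reactor; spider names below
# are hypothetical stand-ins for the six spiders in the question.
process = CrawlerProcess(get_project_settings())
for spider_name in ['spider_a', 'spider_b']:
    process.crawl(spider_name)
process.start()  # blocks until every crawl finishes
```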
1 vote · 1 answer

Scrapy Middleware Selenium with meta

Basically, I have a working version of a middleware that passes all requests through Selenium and returns an HtmlResponse; the problem is that I also want some meta data attached to the request, which I can access in the parse method of the spider. For some…
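A minimal sketch of one way to keep meta reachable, assuming Selenium with Firefox: pass the original request to HtmlResponse, since response.meta is only a proxy for response.request.meta.

```python
from scrapy.http import HtmlResponse
from selenium import webdriver


class SeleniumMiddleware:
    def __init__(self):
        self.driver = webdriver.Firefox()

    def process_request(self, request, spider):
        self.driver.get(request.url)
        return HtmlResponse(
            url=self.driver.current_url,
            body=self.driver.page_source,
            encoding='utf-8',
            request=request,  # keeps request.meta reachable in parse()
        )
```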
1 vote · 2 answers

Trigger errback when process_exception() is called in Middleware

Using Scrapy, I'm implementing a CrawlSpider which will scrape all kinds of websites, and hence sometimes very slow ones, which will eventually produce a timeout. My problem is that if such a twisted.internet.error.TimeoutError occurs, I want to…
nichoio · 6,289
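One hedged approach: let the exception keep propagating. If process_exception returns None, Scrapy eventually fires the errback attached to the request; returning a Response or Request from the middleware would swallow the error instead.

```python
from twisted.internet.error import TimeoutError


class TimeoutPassthroughMiddleware:
    """Let timeouts reach the request's errback instead of handling them here."""

    def process_exception(self, request, exception, spider):
        if isinstance(exception, TimeoutError):
            # Returning None lets the failure keep propagating, so Scrapy
            # calls the errback supplied at request time, e.g.:
            #   Request(url, callback=self.parse, errback=self.on_timeout)
            spider.logger.info('Timeout on %s; deferring to errback', request.url)
            return None
```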
1 vote · 1 answer

Using regex in scrapy downloader middleware

I've been trying to write a custom middleware in Scrapy which flags URLs containing certain patterns using regex. In short, there is a list of exceptions, and each URL is checked against it. However, the middleware does not manage to properly…
T the shirt · 79
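A minimal sketch of such a middleware; the exception patterns are hypothetical. A frequent pitfall is re.match, which only anchors at the start of the URL, whereas re.search matches anywhere in the string.

```python
import re

from scrapy.exceptions import IgnoreRequest


class RegexFilterMiddleware:
    # Hypothetical exception list of URL patterns to drop.
    EXCEPTIONS = [r'/login', r'\.pdf$', r'[?&]sessionid=']

    def __init__(self):
        # Compile once, not on every request.
        self.patterns = [re.compile(p) for p in self.EXCEPTIONS]

    def process_request(self, request, spider):
        # re.search matches anywhere in the URL, unlike re.match.
        if any(p.search(request.url) for p in self.patterns):
            raise IgnoreRequest(f'URL matched exception list: {request.url}')
        return None
```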
1 vote · 0 answers

Scrapy error catching in scrapy/middleware.py file: TypeError: __init__() missing 1 required positional argument: 'uri'

I am getting this error when starting a crawl. I have searched for an answer in several forums and looked at the code in scrapy/middleware.py (it came standard with Scrapy and I have not altered it), and cannot figure out why I am getting the error.…
Steve S · 11
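Without the full traceback this is only a guess, but that TypeError usually means Scrapy instantiated a component whose __init__ requires an argument (here uri) and no from_crawler hook supplies it. A minimal sketch of the pattern, with hypothetical names:

```python
class StorageMiddleware:
    def __init__(self, uri):
        # Scrapy never passes constructor arguments itself; they must come
        # from the classmethod below.
        self.uri = uri

    @classmethod
    def from_crawler(cls, crawler):
        # Hypothetical setting supplying the required `uri` argument.
        return cls(uri=crawler.settings.get('STORAGE_URI'))
```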
1 vote · 0 answers

Scrapy doing retry after yield

I am new to Python and Scrapy, and I am building a simple Scrapy project for scraping posts from a forum. However, sometimes when crawling a post it gets a 200 but redirects to an empty page (maybe because of the unstable server of the forum or…
Joe Leung · 121
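A hedged sketch of one way to do this on Scrapy 2.5+, using get_retry_request so the retry is counted against RETRY_TIMES instead of being dropped by the duplicate filter; the URL and selectors are hypothetical.

```python
import scrapy
from scrapy.downloadermiddlewares.retry import get_retry_request


class ForumSpider(scrapy.Spider):
    name = 'forum'
    start_urls = ['https://example.com/forum']  # hypothetical

    def parse(self, response):
        # Hypothetical "empty page" check: a 200 with no post markup.
        if not response.css('div.post'):
            # Re-yielding response.request directly would be dropped by the
            # duplicate filter; get_retry_request returns a properly counted
            # retry, or None once RETRY_TIMES is exhausted.
            retry = get_retry_request(response.request, spider=self,
                                      reason='empty_page')
            if retry is not None:
                yield retry
            return
        for post in response.css('div.post'):
            yield {'text': post.css('::text').get()}
```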
1 vote · 0 answers

Scrapy - bug in custom DownloaderMiddleware

I have a list of thousands of URLs which I scrape using one spider. Some URLs share the same domain. I want to count the number of timeout errors per domain. If, for domain x, the number of timeouts is higher than LIMIT, I want to avoid scraping all URLs of…
Milano · 18,048
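A minimal sketch of the middleware the question describes; LIMIT is a placeholder value.

```python
from collections import defaultdict
from urllib.parse import urlparse

from scrapy.exceptions import IgnoreRequest
from twisted.internet.error import TimeoutError


class DomainTimeoutMiddleware:
    LIMIT = 5  # hypothetical per-domain timeout budget

    def __init__(self):
        self.timeouts = defaultdict(int)

    def process_request(self, request, spider):
        # Refuse further requests to domains that hit the limit.
        domain = urlparse(request.url).netloc
        if self.timeouts[domain] >= self.LIMIT:
            raise IgnoreRequest(f'Too many timeouts for {domain}')
        return None

    def process_exception(self, request, exception, spider):
        # Count timeouts per domain; return None so normal retry
        # handling still applies.
        if isinstance(exception, TimeoutError):
            self.timeouts[urlparse(request.url).netloc] += 1
        return None
```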
0 votes · 2 answers

Retrying Downloader Middleware For Failed Requests in Scrapy

In Scrapy I'm trying to write a downloader middleware which filters out responses with 401, 403, and 410 statuses and sends new requests for these URLs. The error says that process_response must return a Response or a Request. Because I yield 10 requests to make…
avakado0 · 101
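The constraint behind that error: process_response must return exactly one Response or Request object, never several. A minimal sketch that retries a single flagged URL; fanning out into many new requests belongs in the spider (e.g. an errback), not the middleware.

```python
class RetryStatusMiddleware:
    RETRY_STATUSES = {401, 403, 410}

    def process_response(self, request, response, spider):
        if response.status in self.RETRY_STATUSES:
            # Return one replacement Request; dont_filter=True keeps the
            # duplicate filter from discarding the repeat of the same URL.
            return request.replace(dont_filter=True)
        return response
```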
0 votes · 1 answer

How to build my own middleware in Scrapy?

I'm just starting to learn Scrapy and I have a question. For my spider I have to take a list of URLs (start_urls) from a Google Sheets table, and I have this code: import gspread from oauth2client.service_account import…
m_sasha · 239
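This case may not need a middleware at all: overriding start_requests() lets the spider read its URLs from the sheet directly. A minimal sketch, assuming gspread with oauth2client credentials; the sheet name, key file, and column are placeholders.

```python
import gspread
import scrapy
from oauth2client.service_account import ServiceAccountCredentials


class SheetSpider(scrapy.Spider):
    name = 'sheet'

    def start_requests(self):
        scope = ['https://spreadsheets.google.com/feeds',
                 'https://www.googleapis.com/auth/drive']
        creds = ServiceAccountCredentials.from_json_keyfile_name(
            'credentials.json', scope)  # hypothetical key file
        sheet = gspread.authorize(creds).open('urls').sheet1
        for url in sheet.col_values(1):  # first column holds the URLs
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        yield {'url': response.url, 'title': response.css('title::text').get()}
```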
0 votes · 1 answer

How can I read all logs in a middleware?

I have about 100 spiders on a server. Every morning all of the spiders start scraping and write everything to their log files. Sometimes a couple of them give me an error. When a spider gives me an error I have to go to the server and read its log file…
Murat Demir · 716
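One hedged alternative to reading log files: a small extension subscribed to the spider_error signal sees every spider failure in one place. A minimal sketch, with the notification left as a log line:

```python
from scrapy import signals


class ErrorReportExtension:
    # Enable via, e.g. (path hypothetical):
    #   EXTENSIONS = {'myproject.extensions.ErrorReportExtension': 500}

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        crawler.signals.connect(ext.spider_error, signal=signals.spider_error)
        return ext

    def spider_error(self, failure, response, spider):
        # Fires for every uncaught exception in any spider callback;
        # a real version might mail or POST this somewhere central.
        spider.logger.critical('Error in %s on %s: %s', spider.name,
                               response.url, failure.getErrorMessage())
```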
0 votes · 0 answers

Too many 429 errors when the cache extension and the proxy middleware are enabled at the same time in scrapy

I am using Scrapy to crawl data. The target website blocks an IP after it sends about 1000 requests. To deal with this I wrote a proxy middleware, and because the amount of data is relatively large, I also wrote a cache extension. When I enabled…
Sherwin · 11
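Without seeing the custom middleware this is only a guess, but a common interaction is the cache storing and replaying 429 responses, so retries keep "hitting" the cached error. A minimal sketch of settings that usually help; the numbers are illustrative, not tuned values.

```python
# settings.py -- keep 429s out of the cache and retry/throttle them instead.
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [429]  # never cache rate-limit responses
AUTOTHROTTLE_ENABLED = True
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 4
```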
0 votes · 1 answer

How can I use scrapy middlewares to call a mail function?

I have 15 spiders, and every spider has its own content to send by mail. The spiders also have their own spider_closed method which starts the mail sender, but all of them are the same. At some point the spider count will reach 100, and I don't want to use the same…
Murat Demir · 716
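A minimal sketch of how a single extension could replace the per-spider spider_closed mail code, using Scrapy's built-in MailSender; the recipient address and the per-spider mail_body attribute are hypothetical.

```python
from scrapy import signals
from scrapy.mail import MailSender


class MailOnCloseExtension:
    def __init__(self, mailer):
        self.mailer = mailer

    @classmethod
    def from_crawler(cls, crawler):
        # MailSender picks up MAIL_HOST, MAIL_FROM, etc. from settings.
        ext = cls(MailSender.from_settings(crawler.settings))
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider, reason):
        # Each spider supplies its own content via a hypothetical
        # `mail_body` attribute; the extension itself is shared.
        return self.mailer.send(
            to=['you@example.com'],
            subject=f'{spider.name} closed ({reason})',
            body=getattr(spider, 'mail_body', f'{spider.name} finished.'),
        )
```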