How to get the URLs that returned an error status

Question

I am writing a spider with Scrapy in Python 3, and I only started using Scrapy recently. I am scraping data from a website, and after a few minutes the site may respond with a 302 status and redirect me to another URL for verification. I want to save the originally requested URL to a file.

For example, https://www.test.com/article?id=123 is the URL I want to request, and the site responds with a 302 and redirects to https://www.test.com/vrcode.

I want to save https://www.test.com/article?id=123 to a file. How should I do that?

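import scrapy

# LocationItem is defined in the project's items module (not shown here).


class CatchData(scrapy.Spider):
    name = 'test'

    allowed_domains = ['test.com']

    # Scrapy requires a scheme on every start URL.
    start_urls = ['https://www.test.com/article?id=1',
                  'https://www.test.com/article?id=2',
                  # ...
                 ]

    def parse(self, response):
        item = LocationItem()
        item['article'] = response.xpath('...')
        yield item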

I found an answer in "How to get the scrapy failure URLs?"

but that answer is six years old. I want to know whether there is a simpler way to do this.

Is https://www.test.com/vrcode an error page in the HTTP sense? What’s the HTTP status code of that response? — Gallaecio, Jun 05 '19 at 07:53
A 302 response always redirects to another response. I’m asking about the status code of the final response, the target of the redirect, not the redirect itself. If you locate https://www.test.com/vrcode in the logs, you should be able to see the corresponding status code. I ask because I suspect it may be 200, which means that the linked answer would not work for you. In any case, it conditions what you need to do. — Gallaecio, Jun 05 '19 at 08:04
The log output in the terminal looks like this: `DEBUG: Crawled (302) (referer: None)`, and only that line. — afraid.jpg, Jun 05 '19 at 08:09
What do the 2 lines about `https://www.test.com/vrcode` look like? — Gallaecio, Jun 05 '19 at 08:12
The terminal has been closed, so I cannot take a screenshot of it, but it is sometimes `(302) ` and sometimes `(302) ` — afraid.jpg, Jun 05 '19 at 08:17

score 0 · Answer 1 · edited Feb 22 '20 at 00:21

with open(file_name, 'w', encoding="utf-8") as f:
    f.write(str(item))

answered Feb 21 '20 at 20:56 by ZBay · edited Feb 22 '20 at 00:21 by colidyre
