
I am writing a spider with Scrapy in Python 3, and I only started using Scrapy recently. I am scraping a website, and after a few minutes the site sometimes responds with a 302 status and redirects me to another URL to verify me. I want to save the original URL to a file.

For example, https://www.test.com/article?id=123 is the URL I want to request, and the site responds with a 302 and redirects to https://www.test.com/vrcode.

I want to save https://www.test.com/article?id=123 to a file. How can I do that?
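Assuming Scrapy's default RedirectMiddleware is enabled, it follows the 302 and stores every URL it was redirected away from in `response.meta['redirect_urls']`, so the first entry is the URL that was originally requested. The helper below is a minimal sketch of that check, written as plain Python so it can be called from `parse`; the function name `blocked_url` is hypothetical, and the `/vrcode` path is taken from the example above.

```python
from urllib.parse import urlparse

# Hypothetical constant: path of the verification page from the example.
VERIFY_PATH = "/vrcode"


def blocked_url(final_url, meta):
    """Return the originally requested URL if the request ended on the
    verification page, otherwise None.

    final_url is response.url and meta is response.meta in a Scrapy
    callback. RedirectMiddleware appends each URL it redirects away from
    to meta['redirect_urls'], so index 0 is the first URL requested.
    """
    if urlparse(final_url).path != VERIFY_PATH:
        return None
    redirect_urls = meta.get("redirect_urls") or []
    return redirect_urls[0] if redirect_urls else None
```

Inside `parse`, calling `blocked_url(response.url, response.meta)` returns the article URL when the page was redirected to the verification page, and `None` for normal article pages.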

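import scrapy


class LocationItem(scrapy.Item):
    # Assumed definition: only the 'article' field used below is needed.
    article = scrapy.Field()


class CatchData(scrapy.Spider):
    name = 'test'

    allowed_domains = ['test.com']

    # Scrapy requires absolute URLs, including the scheme.
    start_urls = ['https://www.test.com/article?id=1',
                  'https://www.test.com/article?id=2',
                  # ...
                 ]

    def parse(self, response):
        item = LocationItem()
        # .getall() extracts the matched strings instead of storing selectors.
        item['article'] = response.xpath('...').getall()
        yield item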
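Alternatively, the spider can tell Scrapy not to follow the 302 by setting the class attribute `handle_httpstatus_list = [302]`. RedirectMiddleware and HttpErrorMiddleware then pass the 302 response to `parse`, where `response.url` is still the article URL and the target is in the `Location` header. A minimal sketch of the check, assuming the verification page is always `/vrcode`; the function name `is_verification_redirect` is hypothetical:

```python
from urllib.parse import urljoin, urlparse

# Hypothetical constant: path of the verification page from the example.
VERIFY_PATH = "/vrcode"


def is_verification_redirect(request_url, status, location):
    """Return True if a redirect response points at the verification page.

    location is the decoded Location header, which may be relative, so it
    is resolved against the URL that was requested before comparing paths.
    """
    if status not in (301, 302, 303, 307, 308) or not location:
        return False
    target = urljoin(request_url, location)
    return urlparse(target).path == VERIFY_PATH
```

In `parse`, the header can be read with `(response.headers.get('Location') or b'').decode()`; when `is_verification_redirect(response.url, response.status, location)` is true, `response.url` is the URL to save.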

I found an answer in How to get the scrapy failure URLs?

However, that answer is six years old, and I would like to know whether there is a simpler way to do this now.

afraid.jpg
  • Is https://www.test.com/vrcode an error page in the HTTP sense? What’s the HTTP status code of that response? – Gallaecio Jun 05 '19 at 07:53
  • @Gallaecio status code is `302` – afraid.jpg Jun 05 '19 at 08:01
  • A 302 response always redirects to another response. I'm asking about the status code of the final response, the target of the redirect, not the redirect itself. If you locate https://www.test.com/vrcode in the logs, you should be able to see the corresponding status code. I ask because I suspect it may be 200, which means that the linked answer would not work for you. In any case, it determines what you need to do. – Gallaecio Jun 05 '19 at 08:04
  • The log output in the terminal looks like this: `DEBUG: Crawled (302) (referer: None)`, and that is the only line. – afraid.jpg Jun 05 '19 at 08:09
  • What do the two lines about `https://www.test.com/vrcode` look like? – Gallaecio Jun 05 '19 at 08:12
  • Also `DEBUG: Crawled (302) (referer: None)`. – afraid.jpg Jun 05 '19 at 08:14
  • The terminal has been closed, so I cannot take a screenshot of it, but it is sometimes `(302) ` and sometimes `(302) `. – afraid.jpg Jun 05 '19 at 08:17
  • See https://stackoverflow.com/a/35817478/939364 – Gallaecio Jun 05 '19 at 08:18

1 Answer

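# file_name is the output path; mode 'a' appends, so each call adds to
# the file instead of overwriting what earlier calls wrote.
with open(file_name, 'a', encoding="utf-8") as f:
    f.write(str(item) + '\n')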
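To save only the URL that hit the verification page rather than the whole item, the same append-mode pattern works with one URL per line. A minimal sketch; the function name `save_blocked_url` and the default file name `blocked_urls.txt` are assumptions:

```python
def save_blocked_url(url, file_name="blocked_urls.txt"):
    """Append a single URL as its own line.

    Mode 'a' keeps URLs written by earlier calls, and the explicit
    encoding avoids platform-dependent defaults.
    """
    with open(file_name, "a", encoding="utf-8") as f:
        f.write(url + "\n")
```

Opening the file per call is simple and safe for a small number of blocked URLs; for very large volumes, keeping one file handle open for the spider's lifetime would avoid repeated opens.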
colidyre
ZBay