
I'm using Scrapy and have set up:

  • Items
  • ItemLoader
  • ItemPipeline
  • start_requests using CSV to define multiple URLs (sketched below)
  • Custom Writer

I run the scraper simply:

scrapy crawl scraper

However, all output goes into one file.

I would like one output file per request/URL.

What am I missing?
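For context, a minimal sketch of the kind of setup described above, assuming the start URLs live one per row in a CSV; the file name `urls.csv` and the `source_url` meta key are illustrative, not details from the original post:

```python
import csv

import scrapy


class ScraperSpider(scrapy.Spider):
    name = "scraper"

    def start_requests(self):
        # One URL per CSV row; issue a request for each.
        with open("urls.csv", newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue
                url = row[0].strip()
                # Carry the originating URL so a pipeline can key on it.
                yield scrapy.Request(url, meta={"source_url": url})

    def parse(self, response):
        # The real spider would populate items via its ItemLoader; the point
        # here is only that source_url travels with every yielded item.
        yield {"source_url": response.meta["source_url"], "url": response.url}
```

Carrying the originating URL in `meta` is what lets a downstream pipeline decide which output file each item belongs to.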

Matt Sephton
  • You can set it up in your pipelines (sketched after these comments). Check out this answer: https://stackoverflow.com/a/47369516/3923463 – Ionut-Cezar Ciubotariu Jun 30 '19 at 10:58
  • Awesome! I should have thought to stray from the standard place to write the file and move it inside the parse_item function – Matt Sephton Jun 30 '19 at 12:54
  • @Ionut-CezarCiubotariu sadly this approach will not work for me because I am processing my content in the pipeline – Matt Sephton Jul 02 '19 at 17:57
  • Maybe I didn't understand your question, but in the answer I linked there is a pipeline class that saves the items from the same category in a single file. Isn't this what you were looking for? – Ionut-Cezar Ciubotariu Jul 03 '19 at 10:56
  • Perhaps I did not understand the example clearly enough. It scrapes pages to extract links that it tags by category, title, and date; each link's title and date is appended to a file in the correct category folder. I tried to modify this approach for my purpose but ran into some problems, so I created a small bash script that loops through URLs from a file and runs the spider on each one. Not as neat a solution, but it was quicker for me to get working. – Matt Sephton Jul 03 '19 at 12:06
  • Possible duplicate of [scrapy - seperate output file per starurl](https://stackoverflow.com/questions/47361396/scrapy-seperate-output-file-per-starurl) – Gallaecio Jul 05 '19 at 11:37
  • @Gallaecio not really a duplicate – Matt Sephton Jul 05 '19 at 16:20
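For reference, a minimal sketch of the pipeline approach from the linked answer, adapted to the one-file-per-URL case discussed in these comments; the class name, the per-netloc file naming, and the `source_url` item key are assumptions, not code from either post:

```python
from urllib.parse import urlparse

from scrapy.exporters import CsvItemExporter


class PerUrlWriterPipeline:
    """Route each item to its own output file, keyed on its source URL."""

    def open_spider(self, spider):
        self.exporters = {}  # file stem -> (file handle, exporter)

    def close_spider(self, spider):
        for f, exporter in self.exporters.values():
            exporter.finish_exporting()
            f.close()

    def process_item(self, item, spider):
        # Assumes the spider attached the originating URL to each item,
        # e.g. via response.meta as sketched in the question above.
        stem = urlparse(item["source_url"]).netloc or "unknown"
        if stem not in self.exporters:
            f = open(f"{stem}.csv", "wb")
            exporter = CsvItemExporter(f)
            exporter.start_exporting()
            self.exporters[stem] = (f, exporter)
        self.exporters[stem][1].export_item(item)
        return item
```

Keying on the netloc keeps filenames readable but would merge URLs that share a host, so a real version needs a naming scheme that distinguishes them. Matt's eventual workaround, per the comments above, sidesteps the pipeline entirely: a shell loop reads the URL file and runs `scrapy crawl scraper` once per URL, which Scrapy's `-a key=value` spider arguments and `-o <file>` feed option make straightforward, at the cost of one crawler process per URL.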

0 Answers