
The official tutorial describes how to run Scrapy from within a Python script.

By overriding the following settings:

    settings.overrides['FEED_URI'] = output_path
    settings.overrides['FEED_FORMAT'] = 'json'

I am able to store the scraped data in a JSON file.
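
In full, my wrapper looks roughly like this (a simplified sketch based on the tutorial's run-from-a-script snippet; MySpider and the module paths are placeholders):

    from twisted.internet import reactor
    from scrapy import log, signals
    from scrapy.crawler import Crawler
    from scrapy.utils.project import get_project_settings

    from myproject.spiders.my_spider import MySpider  # placeholder spider

    def scrape_to_json(output_path):
        settings = get_project_settings()
        settings.overrides['FEED_URI'] = output_path
        settings.overrides['FEED_FORMAT'] = 'json'

        crawler = Crawler(settings)
        crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
        crawler.configure()
        crawler.crawl(MySpider())
        crawler.start()
        log.start()
        reactor.run()  # blocks until the spider is closed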

However, I want to process and return the scraped data directly from the function I defined, so that other functions can call this wrapper function to scrape websites.

I figure there must be some setting related to FEED_URI that I can use, but I'm not sure. Any advice would be deeply appreciated!

    Create a pipeline? See [this answer](http://stackoverflow.com/a/27744766/771848) (unofficial tutorial). – alecxe Apr 01 '15 at 22:17

1 Answer


Feed exports are meant to serialize the data you've scraped (see the feed exports documentation). What you are trying to do doesn't involve serialization.

What you want to do instead is create an item pipeline. Scrapy passes every scraped item to the pipeline, and items behave like dictionaries, so you can do whatever you want with them there.
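
A minimal sketch of such a pipeline, assuming you collect the items into a class-level list that your wrapper function reads once the crawl has finished (the class and module names are placeholders):

    # myproject/pipelines.py -- sketch of a collector pipeline
    class ItemCollectorPipeline(object):
        """Keeps every scraped item in memory so the caller can use them later."""

        items = []  # shared list; reset it before starting a new crawl

        def process_item(self, item, spider):
            # scraped items behave like dicts, so a plain dict() copy is enough
            ItemCollectorPipeline.items.append(dict(item))
            return item

Enable it through the ITEM_PIPELINES setting (the same way you override FEED_URI above); once the crawl has finished, your wrapper function can simply return ItemCollectorPipeline.items instead of reading the JSON file back.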