
I'm pretty much a Scrapy (and Python) newbie and I'm trying to figure out how to run Scrapy and analyze its data all in one script. The program writes a JSON file, and I want to open that file after the scraping has finished.

What happens is that the program tries to open the JSON file before Scrapy has finished writing it, even though the os command has already been executed. I believe the answer may be to use threads and queues, but I'm having trouble figuring out how to implement them. Here is my code:

import json
import os

def crawl():
    # run the spider and export the scraped items to currency.json
    os.system("scrapy crawl average_cur -o currency.json -t json")

    # load the exported items once the command returns
    inFile = open('currency.json')
    data = json.load(inFile)
    print data


crawl()
  • that's weird, are you sure that it's trying to load the file before scrapy is finished? it doesn't make much sense, since `os.system` waits for it to finish. you can try using `subprocess.check_output` instead. – Elias Dorneles May 30 '15 at 16:43
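Following the comment's suggestion, here is a minimal sketch using the subprocess module instead of os.system, so that a failed crawl raises an error rather than silently leaving the JSON file missing or stale (the spider and file names are taken from the question):

import json
import os
import subprocess

def crawl():
    # remove stale output first: with -o, older Scrapy versions append to an
    # existing file, which can leave the JSON malformed
    if os.path.exists('currency.json'):
        os.remove('currency.json')

    # check_call blocks until the command exits and raises CalledProcessError
    # on a non-zero exit status, so a failed crawl won't go unnoticed
    subprocess.check_call(
        ['scrapy', 'crawl', 'average_cur', '-o', 'currency.json', '-t', 'json'])

    with open('currency.json') as in_file:
        data = json.load(in_file)
    print(data)

crawl()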

1 Answer


I think Scrapy signals will do the trick (this link might help), or you could write an extension and from there perform whatever operation you want. A sketch of the in-process route follows.
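For instance, if running the spider in-process is acceptable, something along these lines avoids the timing issue entirely, since CrawlerProcess.start() blocks until the crawl finishes (the spider class name and import path here are assumptions, not from the question; newer Scrapy versions configure feed export via the FEEDS setting instead):

import json

from scrapy.crawler import CrawlerProcess

from myproject.spiders.average_cur import AverageCurSpider  # hypothetical import path

process = CrawlerProcess({
    # configure the built-in JSON feed export
    'FEED_URI': 'currency.json',
    'FEED_FORMAT': 'json',
})
process.crawl(AverageCurSpider)
process.start()  # blocks here until the crawl is finished

with open('currency.json') as in_file:
    data = json.load(in_file)
print(data)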

In the extension, the spider_closed() handler will let you complete your task.

def spider_closed(self, spider):
    # perform your task here
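A fuller sketch of such an extension (the class name, module path, and priority below are illustrative; the signal registration via from_crawler() follows the standard Scrapy extension pattern):

# myproject/extensions.py
from scrapy import signals

class JsonAnalysisExtension(object):

    @classmethod
    def from_crawler(cls, crawler):
        ext = cls()
        # run spider_closed() when the spider finishes
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        # perform your task here; note that the feed exporter also flushes
        # its output on spider_closed, so the JSON file may not be fully
        # written yet when this handler runs
        spider.log('spider closed, running post-processing')

# myproject/settings.py -- enable the extension
EXTENSIONS = {
    'myproject.extensions.JsonAnalysisExtension': 500,
}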
– Jithin