
According to this question, How Scrapy filters the crawled urls?, there is a file called `requests.seen` in the directory defined by the `JOBDIR` variable.

Where can I find the `JOBDIR` variable?

– Marco Dinatsoli

1 Answer


According to the official documentation (Jobs: pausing and resuming crawls), `JOBDIR` can be set from the command line:

scrapy crawl somespider -s JOBDIR=crawls/somespider-1
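Since `JOBDIR` is an ordinary Scrapy setting, it can also be set in the project's `settings.py` instead of being passed with `-s` every time; a minimal sketch (the path `crawls/somespider-1` is just the example value from above):

```python
# settings.py -- JOBDIR is a regular Scrapy setting, so it can live here
# instead of being supplied with -s on the command line.
JOBDIR = 'crawls/somespider-1'
```

A per-spider alternative is the spider's `custom_settings` dict, e.g. `custom_settings = {'JOBDIR': 'crawls/somespider-1'}`.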
– ndpu
  • I ran my spider, and yes, the file was generated, but when I opened it I didn't find the scraped URLs. Instead, I found lines like this: `f6b696ffa8fbcd8fbd4eff777ba677091858a9c7`. Why? – Marco Dinatsoli Jan 27 '14 at 16:16
  • Is that the fingerprint of a scraped URL? – Marco Dinatsoli Jan 27 '14 at 16:17
  • @MarcoDinatsoli in this directory Scrapy stores all the data required to keep the state of a single job (i.e. a spider run): counters, offsets, etc., but not lists of scraped URLs. – ndpu Jan 27 '14 at 16:28
  • What I am looking for is the scraped list of URLs. Where can I find it? I have a sense that this file contains it. – Marco Dinatsoli Jan 27 '14 at 16:34
  • @MarcoDinatsoli look here http://stackoverflow.com/questions/3871613/scrapy-how-to-identify-already-scraped-urls or similar questions. – ndpu Jan 27 '14 at 18:17
  • How could I set the JOBDIR from a script? – William Kinaan Mar 02 '14 at 19:32
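On the lines seen in `requests.seen`: each line is a request fingerprint, a SHA1 hex digest, not a URL. The exact canonicalization is internal to Scrapy, but a simplified sketch (the function name `simplified_fingerprint` is made up for illustration; Scrapy's real fingerprint also canonicalizes the URL and can include headers and body) shows why the file looks like the 40-character hex string quoted above:

```python
import hashlib

def simplified_fingerprint(method, url):
    # Simplified sketch only: reduce a request to a SHA1 hex digest,
    # one such digest per line in requests.seen. Scrapy's actual
    # fingerprinting canonicalizes the URL first.
    h = hashlib.sha1()
    h.update(method.encode())
    h.update(url.encode())
    return h.hexdigest()

fp = simplified_fingerprint("GET", "http://example.com/page")
print(fp)       # a 40-character hex string, same shape as requests.seen lines
print(len(fp))  # 40
```

Because only digests are stored, the original URLs cannot be recovered from the file; it exists so the scheduler can skip requests it has already seen.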