
I am using a Scrapy script to load a URL with `yield`.

MyUrl = "http://www.example.com"
request = Request(MyUrl, callback=self.mydetail)
yield request

def mydetail(self, response):
    item = {}
    item['Description'] = response.xpath(".//table[@class='list']//text()").extract()
    return item

The URL seems to take a minimum of 5 seconds to load, so I want Scrapy to wait some time for the entire text to load into item['Description']. I tried DOWNLOAD_DELAY in settings.py, but it didn't help.

Prabhakar
  • Scrapy downloads the whole response before running your callback. The load time you notice in your browser may be due to additional things fetched/rendered via JavaScript, which Scrapy does not do on its own. Try `scrapy shell ` to see what Scrapy "sees" on the site. You need to check what else the page fetches and modify your code to match that, or use a headless browser to render the page's JavaScript (e.g. Splash, Selenium). – marven Feb 28 '15 at 02:47
  • I have used Splash for rendering JavaScript, but the output is empty. I am not sure whether Scrapy is rendering my JavaScript page. – Prabhakar Mar 14 '15 at 08:33
  • Regardless of whether you use Splash, what @marven said holds true: Scrapy will wait for the whole response before proceeding. If you use Splash, then Splash becomes the new "web server". From Scrapy's point of view, Splash is its endpoint, and Scrapy will wait until Splash returns the entirety of the response. – Rejected Aug 25 '15 at 18:38
  • As is, your callback is "self.mydetail", but the function is "jobdetail". Is this a typo? – Rejected Aug 25 '15 at 18:41
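
Following up on the Splash suggestion in the comments, here is a minimal sketch of what the spider could look like with the scrapy-splash plugin. It assumes a Splash instance is running and configured in settings.py; the URL, the 5-second wait, and the callback name are taken from the question, while everything else is an assumption rather than the asker's actual setup:

```python
# Sketch only: assumes scrapy-splash is installed and a Splash server is
# reachable at SPLASH_URL (e.g. http://localhost:8050), with the plugin's
# downloader middlewares enabled in settings.py.
from scrapy import Spider
from scrapy_splash import SplashRequest


class MySpider(Spider):
    name = "example"

    def start_requests(self):
        my_url = "http://www.example.com"  # placeholder URL from the question
        # Ask Splash to render the page and wait ~5 seconds for the JavaScript
        # that fills the table before returning the HTML to Scrapy.
        yield SplashRequest(my_url, callback=self.mydetail, args={"wait": 5})

    def mydetail(self, response):
        item = {}
        item['Description'] = response.xpath(
            ".//table[@class='list']//text()").extract()
        return item
```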

1 Answer


Take a quick look with Firebug or another tool that captures the responses to the AJAX requests made by the JavaScript code. You can then make a chain of requests to catch those AJAX calls that fire after the page loads. There are several related questions: parse ajax content, retrieve final page, parse dynamic content.
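
To illustrate the "chain of requests" idea, here is a rough sketch that assumes the browser's network panel shows the table being filled from a JSON endpoint; the /ajax/list path and the "description" field are hypothetical placeholders, not something taken from the actual site:

```python
import json

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["http://www.example.com"]  # placeholder from the question

    def parse(self, response):
        # Instead of scraping the HTML shell, request the endpoint that the
        # page's JavaScript calls (hypothetical path found via Firebug).
        yield scrapy.Request(
            "http://www.example.com/ajax/list",
            callback=self.parse_list,
        )

    def parse_list(self, response):
        # The AJAX endpoint is assumed to return JSON directly.
        data = json.loads(response.text)
        yield {"Description": data.get("description")}
```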

yavalvas