I am trying to move the scraping of a dynamic website from Selenium with PhantomJS to scrapyjs (Splash). The problem is that a click event written in Splash only takes effect through a yielded request, and yielding a request renders the page from the start again, so the changes made by the click don't show up in the page source. In Selenium I can click an element without re-rendering the page. Is there an equivalent feature in Splash?
I found a solution using a Lua variable: you can pass a value to the Lua script through the Splash `args` in the request meta. Example:
v = 1
yield scrapy.Request(url, callback=self.parseVariationDetailPage, dont_filter=True,
                     meta={'splash': {'endpoint': 'execute',
                                      'args': {'lua_source': script, 'indx': v}},
                           'v': v})
Inside the Lua script, the value passed as `indx` is available as `splash.args.indx`.
The following script clicks the element at that index:
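Every key inside `args` is sent to Splash as JSON and shows up in the script as `splash.args.<key>`, so the value only has to be JSON-serializable. A minimal sketch of building that meta dict, using a hypothetical helper name `splash_meta` and plain dicts so it runs without Scrapy installed:

```python
import json


def splash_meta(lua_source, index, **extra_meta):
    """Build request meta for the Splash 'execute' endpoint (hypothetical helper).

    Each key in 'args' is visible to the Lua script as splash.args.<key>,
    so 'indx' here becomes splash.args.indx.
    """
    meta = {
        'splash': {
            'endpoint': 'execute',
            'args': {'lua_source': lua_source, 'indx': index},
        },
        # Also kept at the top level so the callback can read response.meta['v'].
        'v': index,
    }
    meta.update(extra_meta)
    return meta


if __name__ == '__main__':
    meta = splash_meta('function main(splash) return splash.args.indx end', 2)
    # The args must survive JSON serialization because Splash receives them as JSON.
    print(json.dumps(meta['splash']['args']))
```

In a spider callback the returned dict would be passed as `meta=` to `scrapy.Request`, as in the request above.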
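script = """
function main(splash)
    splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js")
    -- index passed from Python through meta['splash']['args']['indx']
    local z = splash.args.indx
    assert(splash:go(splash.args.url))
    assert(splash:wait(1))
    -- concatenate the Lua value into the JS source; a literal "li[z]" would be
    -- an attribute selector for an attribute named z, not the z-th element
    assert(splash:runjs("$('#listChipColor li').eq(" .. z .. ").click()"))
    assert(splash:wait(1))
    return splash:html()
end """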
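Instead of concatenating the index into JavaScript source, Splash's `splash:jsfunc` can wrap a JavaScript function so the Lua value is passed to it as a real argument. A sketch, assuming the same `#listChipColor li` markup and a 0-based `indx`; it drops the jQuery autoload because plain `querySelectorAll` is enough here:

```python
# Lua script stored as a Python string, as in the answer above.
# splash:jsfunc converts Lua arguments to JavaScript values, so the index
# never has to be spliced into the JavaScript source text.
CLICK_SCRIPT = """
function main(splash)
    local click_nth = splash:jsfunc([[
        function (selector, n) {
            var items = document.querySelectorAll(selector);
            if (n < items.length) { items[n].click(); return true; }
            return false;
        }
    ]])
    assert(splash:go(splash.args.url))
    assert(splash:wait(1))
    -- fails the script if there is no element at that index
    assert(click_nth("#listChipColor li", splash.args.indx))
    assert(splash:wait(1))
    return splash:html()
end
"""
```

It is used exactly like `script` above: pass it as `lua_source` and the index as `indx` in the Splash args.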
===================== Old answer below =======================
I couldn't find a way to trigger a click event with scrapyjs without rendering the page again.
The following sample code works. I couldn't find a way to use a Lua variable inside the JavaScript, so as a workaround the index of the element to click is read from the last character of the page URL.
scrapyjs click
script = """
function main(splash)
    splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js")
    assert(splash:go(splash.args.url))
    -- the click index is encoded at the end of the URL (...vl=<v>)
    assert(splash:runjs("k = window.location.href"))
    assert(splash:runjs("l = k.length"))
    assert(splash:wait(1))
    -- only the last character is used, so this works for indices 0-9 only
    assert(splash:runjs("k = k.charAt(l - 1)"))
    assert(splash:runjs('document.querySelectorAll("ul.colour-swatches-list > li")[k].click();'))
    assert(splash:wait(1))
    return splash:html()
end """
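Reading only the last character of the URL breaks once the index reaches 10. A small Python sketch of the difference, with hypothetical function names, comparing the last-character trick with parsing the whole `vl` query parameter (the same parsing could be done in the page's JavaScript instead):

```python
from urllib.parse import parse_qs, urlparse


def last_char_index(url):
    # Mirrors the JavaScript above: only the final character of the URL is used.
    return int(url[-1])


def vl_index(url):
    # Reads the whole 'vl' query parameter, so multi-digit indices work too.
    values = parse_qs(urlparse(url).query).get('vl')
    return int(values[0]) if values else None


url = 'http://example.com/product?vl=12'
print(last_char_index(url))  # 2 (wrong for index 12)
print(vl_index(url))         # 12
```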
Request
url = url + "vl=" + str(v)
yield scrapy.Request(url, self.parseVariationPage, meta={
    'splash': {
        'args': {'lua_source': script},
        'endpoint': 'execute',
    },
    'url': url,
    'type': response.meta['type'],
    'category': response.meta['category'],
    'fit': response.meta['fit'],
    'v': v,
})
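`url + "vl=" + str(v)` assumes the URL already ends in `?` or `&`. A sketch of a hypothetical `with_vl` helper that adds or replaces the `vl` parameter with `urllib.parse` regardless of the existing query string; it appends `vl` last so the last-character trick in the Lua script still sees the index for single-digit values:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_vl(url, v):
    """Return url with the query parameter vl=<v> added or replaced."""
    parts = urlparse(url)
    # Keep every existing parameter except an old 'vl'.
    query = [(k, val) for k, val in parse_qsl(parts.query) if k != 'vl']
    query.append(('vl', str(v)))
    return urlunparse(parts._replace(query=urlencode(query)))


print(with_vl('http://example.com/p', 3))      # http://example.com/p?vl=3
print(with_vl('http://example.com/p?a=1', 3))  # http://example.com/p?a=1&vl=3
```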

Anoop Ambujan
Hi Paul, I have corrected the code indentation. – Anoop Ambujan Apr 19 '16 at 12:36