
I am trying to switch the scraping of a dynamic website from Selenium with PhantomJS to scrapyjs (Splash). The problem is that if we write a click event in Splash, it needs a yield request to run, and that yield request re-renders the first page, so the changes from the click event never show up in the source code. In other words, I want the click to happen without re-rendering the web page, which is possible in Selenium. Is there a similar feature available in Splash?

1 Answer


Got a solution: use a Lua variable. We can pass the variable through the splash meta args. Example:

    v = 1
    yield scrapy.Request(
        url,
        meta={'splash': {'endpoint': 'execute',
                         'args': {'lua_source': script, 'indx': v}},
              'v': v},
        callback=self.parseVariationDetailPage,
        dont_filter=True)

We can read the value of indx that we passed through args with "splash.args.indx".

The following function shows the element click.

script = """
function main(splash)
    splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js")
    -- index passed in from the Scrapy request
    local z = splash.args.indx
    assert(splash:go(splash.args.url))
    assert(splash:wait(1))
    -- click the z-th list item; concatenate the Lua variable into the JS,
    -- otherwise "li[z]" is a literal attribute selector, not the index
    assert(splash:runjs("$('#listChipColor li').eq(" .. z .. ").click()"))
    assert(splash:wait(1))
    return splash:html()
end """
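To drive this from the spider, the meta dict can be built in one place. A minimal sketch (the helper name and the commented usage loop are my own, not from the original code):

```python
def build_splash_meta(script, v):
    # Pack the Lua source and the click index into the splash meta args;
    # the Lua script reads the index back as splash.args.indx.
    return {
        'splash': {
            'endpoint': 'execute',
            'args': {'lua_source': script, 'indx': v},
        },
        'v': v,
    }

# Hypothetical usage inside a spider callback:
# for v in range(num_variations):
#     yield scrapy.Request(url, callback=self.parseVariationDetailPage,
#                          meta=build_splash_meta(script, v), dont_filter=True)
```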

===================== Old answer below =======================

I can't see a solution without re-rendering the page for the scrapyjs click event.

Following is the sample code, and it is working. I couldn't find a way to write a Lua variable into the JS, so here we use a simple trick to get the element to click: the variation index is appended to the URL and read back from window.location.href inside the page.

scrapyjs click

script = """
    function main(splash)
        splash:autoload("https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js")
        assert(splash:go(splash.args.url))
        -- read the variation index from the last character of the URL
        assert(splash:runjs("k = window.location.href"))
        assert(splash:runjs("l = k.length"))
        assert(splash:wait(1))
        assert(splash:runjs("k = k.charAt(l - 1)"))
        -- click the k-th colour swatch
        assert(splash:runjs('document.querySelectorAll("ul.colour-swatches-list > li")[k].click();'))
        assert(splash:wait(1))
        return splash:html()
    end """

Request

url = url + "vl=" + '%s' % v
yield scrapy.Request(url, self.parseVariationPage, meta={
    'splash': {
        'args': {'lua_source': script},
        'endpoint': 'execute'},
    'url': url,
    'type': response.meta['type'],
    'category': response.meta['category'],
    'fit': response.meta['fit'],
    'v': v,
})