
How can I scrape all tools from a site with dynamic routing?

http://growthtools.io/social-media-automation-tools

When I ran

scrapy shell 'http://growthtools.io/social-media-automation-tools' 

I received the following result:

2017-01-07 22:43:06 [root] DEBUG: Using default logger
2017-01-07 22:43:06 [root] DEBUG: Using default logger

In [1]: view(response)


The response object didn't contain the tools elements:

In [3]: response.css('.toolsList')
Out[3]: []
In [5]: 'toolsList' in response.body
Out[5]: False

Can someone explain how I can parse http://growthtools.io/social-media-automation-tools and why the response object didn't contain all of the page content?

Danil

1 Answer


The page content is built by JavaScript that runs in the browser, and Scrapy does not execute JavaScript, so the raw response lacks the tool elements. You can solve this with scrapy-splash, which provides a middleware to use in your Scrapy project. The middleware uses the Splash JavaScript rendering service, which you can run through Docker.
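For reference, a minimal sketch of the project settings that wire scrapy-splash in, based on the values its README suggests. It assumes Splash is listening on localhost:8050 (for example, started with `docker run -p 8050:8050 scrapinghub/splash`); adjust the URL and priorities to your own project.

```python
# settings.py (sketch): enable scrapy-splash in a Scrapy project.
# Assumption: a Splash instance is reachable at localhost:8050.
SPLASH_URL = 'http://localhost:8050'

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```

With these in place, a spider would typically yield `scrapy_splash.SplashRequest` objects instead of plain `scrapy.Request` objects for the pages that need rendering.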

As for testing it in the Scrapy shell, you can request the page through Splash's render.html endpoint directly from the shell.

Works for me:

$ scrapy shell 'http://localhost:8050/render.html?url=http://growthtools.io/social-media-automation-tools' 
In [1]: response.css('.toolsList')
Out[1]: 
[<Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>,
 <Selector xpath=u"descendant-or-self::*[@class and contains(concat(' ', normalize-space(@class), ' '), ' toolsList ')]" data=u'<div class="col-md-10 col-xs-12 toolsLis'>]
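If the target URL carries its own query string, passing it unescaped to render.html can mangle it. A small sketch of building the Splash URL with proper encoding follows; the helper name `splash_render_url` and its defaults are hypothetical, not part of Scrapy or Splash, and it assumes Splash runs on localhost:8050.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=None):
    """Build a Splash render.html URL for target_url (hypothetical helper).

    `url` and `wait` are query arguments of Splash's render.html endpoint;
    urlencode percent-escapes the target so its own '?' and '&' survive.
    """
    params = {"url": target_url}
    if wait is not None:
        # Seconds Splash should wait after page load before returning HTML.
        params["wait"] = wait
    return splash_base.rstrip("/") + "/render.html?" + urlencode(params)


if __name__ == "__main__":
    print(splash_render_url("http://growthtools.io/social-media-automation-tools"))
```

The printed URL can be passed to `scrapy shell` in place of the hand-written one above.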
alecxe