I need to scrape the product review data from a shopping website, but the reviews are paginated. Each page shows 10 comments, and there are about 100 pages. How can I crawl all of them?

My plan is to yield a Scrapy Request for the "Next Page" link and then extract the data with XPath. However, I can't get to the next page to extract its data.

Here is the HTML for the "Next Page" link:

<div class="xs-pagebar clearfix">
     <div class="Pagecon">
          <div class="Pagenum">
               <a class="pre-page pre-disable">
               <a class="pre-page pre-disable">
               <span class="curpage">1</span>
               <a href="#" onclick="tosubmits(2);return false;">2</a>
               <a href="#" onclick="tosubmits(3);return false;">3</a>
               <span class="elli">...</span>
               <a href="#" class="next-page" onclick="tosubmits('2');return false;">Next Page</a>
               <a href="#" onclick="tosubmits('94');return false;">Final Page</a>
           </div>
     </div>
</div>

What exactly does href="#" mean?

reVerse
samlong

1 Answer

Unfortunately, you will not be able to do this with Scrapy alone, because Scrapy does not execute JavaScript. href="#" is a placeholder link: it points to the top of the current page rather than to another URL, and is only there to make the element behave like a link. The actual navigation happens in the JavaScript onclick handler, tosubmits(...). For your use case you need a way to execute that JavaScript. You might want to look into Splinter, which drives a real browser, to do this.
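
As a rough illustration of the Splinter approach, the loop below collects text from each page and clicks "Next Page" until the link disappears. It is only a sketch under assumptions: collect_reviews, review_xpath and max_pages are hypothetical names, the a.next-page selector is taken from the HTML in the question, and it assumes that the last page no longer renders that link (if the site only disables it, the max_pages cap ends the loop).

```python
def collect_reviews(browser, review_xpath, max_pages=200):
    """Collect review text from every page by clicking "Next Page".

    `browser` is expected to be a splinter Browser that has already
    visited the first review page. `review_xpath` is a hypothetical
    XPath matching one review node; adjust it to the real markup.
    """
    reviews = []
    for _ in range(max_pages):  # hard cap so a broken pager cannot loop forever
        # Grab the text of every review currently rendered.
        reviews.extend(el.text for el in browser.find_by_xpath(review_xpath))
        # Assumption: the last page no longer has an a.next-page link.
        if not browser.is_element_present_by_css("a.next-page"):
            break
        # Clicking runs the onclick handler (tosubmits) in the real browser.
        browser.find_by_css("a.next-page").first.click()
    return reviews


# Usage (needs `pip install splinter` plus a browser driver):
#   from splinter import Browser
#   browser = Browser("firefox")
#   browser.visit("http://example.com/reviews")  # placeholder URL
#   print(len(collect_reviews(browser, '//div[@class="review"]')))
#   browser.quit()
```

Passing the browser in as an argument keeps the loop independent of which driver is used. Depending on how the site reloads the list, a short wait after each click may also be needed.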

k-nut
  • Thank you for your explanation. Do you know of any other way to get this done? I have been stuck on this for several days. – samlong Nov 06 '14 at 14:29
  • As I said, you can either use Splinter or look into the Chrome dev tools to see what the JavaScript is calling: http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax – k-nut Nov 06 '14 at 14:44
  • Thank you very much! I've solved the problem by using Splinter. It is a powerful tool for handling dynamic web pages, and I like it very much! – samlong Nov 09 '14 at 12:37
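
For the dev tools route mentioned in the comments, a first step could be to read the page numbers straight out of the onclick attributes, so the crawler knows how many pages exist. The sketch below uses only the standard library; page_numbers and last_page are hypothetical helper names, and the sample markup is copied from the question. What tosubmits actually requests is not shown in the question, so that call would still have to be looked up in the browser's Network tab and replayed for each page number.

```python
import re

# Matches tosubmits(2) as well as tosubmits('94').
ONCLICK_PAGE = re.compile(r"tosubmits\(\s*'?(\d+)'?\s*\)")


def page_numbers(html):
    """Return every page number passed to tosubmits(...) in the markup."""
    return [int(n) for n in ONCLICK_PAGE.findall(html)]


def last_page(html):
    """Return the highest page number found, or 1 if there is no pager."""
    numbers = page_numbers(html)
    return max(numbers) if numbers else 1


sample = """
<a href="#" onclick="tosubmits(2);return false;">2</a>
<a href="#" onclick="tosubmits(3);return false;">3</a>
<a href="#" class="next-page" onclick="tosubmits('2');return false;">Next Page</a>
<a href="#" onclick="tosubmits('94');return false;">Final Page</a>
"""

print(last_page(sample))  # prints 94
```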