
I am going to scrape the HTML content of http://ntry.com/#/scores/named_ladder/main.php with Scrapy.

But because the site uses JavaScript and a # fragment in its URLs, I guess I also have to use Selenium (Python).

I'd like to write my own code, but I am new to programming, so I need help.

I want to enter ntry.com first, then move to http://ntry.com/#/scores/named_ladder/main.php by clicking an anchor labeled 사다리 ("ladder"):

<body>
    <div id="wrap">
        <div id="container">
            <div id="content">
                <a href="/scores/named_ladder/main.php">사다리</a>
            </div>
        </div>
    </div>
</body>

and then I want to scrape the HTML of the changed page using Scrapy.

How can I make a Selenium-blended Scrapy spider?

– Andersson
– heyzude
  • Possible duplicate of [selenium with scrapy for dynamic page](http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page) – dashdashzako Nov 26 '16 at 10:49

1 Answer


I installed Selenium, loaded the PhantomJS driver, and it worked perfectly.

Here is what you can try:

from scrapy import Spider
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

class FormSpider(Spider):
    name = "form"

    def __init__(self):
        # Spoof a desktop Chrome user agent so the site serves its normal page
        dcap = dict(DesiredCapabilities.PHANTOMJS)
        dcap["phantomjs.page.settings.userAgent"] = (
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36")

        self.driver = webdriver.PhantomJS(
            desired_capabilities=dcap,
            service_args=['--ignore-ssl-errors=true', '--ssl-protocol=any', '--web-security=false'])
        self.driver.set_window_size(1366, 768)

    def parse_page(self, response):
        # Render the page in PhantomJS so its JavaScript runs,
        # then grab the session cookies for later requests
        self.driver.get(response.url)
        cookies_list = self.driver.get_cookies()
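To follow the link from the question, you can have the driver click the anchor by its text (e.g. `self.driver.find_element_by_link_text("사다리").click()`) and then hand `self.driver.page_source` to your parsing code. The parsing step itself needs no browser; below is a minimal, dependency-free sketch using only Python's standard library, where the `AnchorCollector` class is a hypothetical stand-in for Scrapy's selectors:

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""
    def __init__(self):
        super().__init__()
        self._href = None
        self._text = []
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an open <a href=...> tag
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None

# The HTML fragment from the question, as the browser would render it
rendered = """
<body><div id="wrap"><div id="container"><div id="content">
<a href="/scores/named_ladder/main.php">사다리</a>
</div></div></div></body>
"""

collector = AnchorCollector()
collector.feed(rendered)
target = next(href for href, text in collector.anchors if text == "사다리")
print(target)  # -> /scores/named_ladder/main.php
```

In the real spider you would run the same lookup against `self.driver.page_source` after the click, or pass that string to Scrapy's `Selector(text=...)` if you prefer CSS/XPath queries.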
– Umair Ayub