I am trying to crawl websites with Scrapy (Python). Most sites are crawled successfully, but some are giving me a tough time: they run on Node.js, AngularJS, or other Java frameworks, and the Scrapy crawler is unable to get the details from their pages. I would appreciate any help with this.

Here is the code I am using as an initial test.

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ['https://en-ae.wadi.com/home_entertainment-televisions/?ref=navigation']

    def parse(self, response):
        self.log('i have just visited the ' + response.url)
        # The product list is filled in by AngularJS in the browser, so this
        # selector finds nothing in the raw HTML that Scrapy downloads.
        yield {
            'product_name': response.css('p.description.ng-binding > span::text').extract_first(),
        }

Thanks in advance.

Zia
  • You mean JavaScript frameworks. Since these sites have dynamic content, you will have to use dynamic web-scraping techniques, *e.g.* [Selenium, (and why not) with scrapy](http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page) – keepAlive Apr 04 '17 at 13:27
  • You are only grabbing the HTML markup of the page, not actually executing JavaScript. There are extensions for Scrapy, or pick a tool which can run the JavaScript too. – samiles Apr 04 '17 at 13:37
  • I just tried, but I am not getting anywhere. Could you please give me some snippets or links for better help? Thanks – Zia Apr 04 '17 at 14:25

1 Answer

Check out Splash: it renders pages with JavaScript enabled, which will allow you to crawl JavaScript-based websites.
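As a rough sketch of the Splash route: Splash exposes an HTTP endpoint, `render.html`, that returns the page's HTML after its scripts have run. The helper below only builds that URL. The name `splash_render_url`, the two-second wait, and the Splash address `http://localhost:8050` (the default port when Splash runs locally, for example in Docker) are assumptions for illustration, not something from the question.

```python
from urllib.parse import urlencode

# Assumption: a Splash instance is running locally on its default port.
SPLASH_BASE = "http://localhost:8050"


def splash_render_url(target_url, wait=2.0, splash_base=SPLASH_BASE):
    """Return a Splash render.html URL that serves target_url after rendering.

    `wait` is the number of seconds Splash waits after the page loads,
    giving client-side frameworks such as AngularJS time to fill the page.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


if __name__ == "__main__":
    # Hypothetical usage with the URL from the question.
    print(splash_render_url(
        "https://en-ae.wadi.com/home_entertainment-televisions/?ref=navigation"
    ))
```

In the spider from the question, `start_urls` could then be set to `[splash_render_url(...)]`, so that `parse` receives the rendered HTML. Note that `response.url` will be the Splash URL rather than the original one. The scrapy-splash plugin wraps this more conveniently if you end up using Splash for many requests.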

You can also create your own downloader middleware and use Selenium; see the question "How to write customize Downloader Middleware for selenium and Scrapy?"
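A minimal sketch of what such a middleware might look like, assuming Chrome and a matching driver are installed. The class name `SeleniumMiddleware` and the `render` helper are made up for this example. Scrapy and Selenium are imported inside the methods only so that the class can be loaded and tested without a browser; top-level imports would work just as well in a real project.

```python
class SeleniumMiddleware:
    """Downloader middleware that lets a real browser render each page."""

    def __init__(self, driver=None):
        # A driver can be injected (handy for tests); otherwise one is
        # created lazily the first time a page is requested.
        self._driver = driver

    @classmethod
    def from_crawler(cls, crawler):
        from scrapy import signals

        middleware = cls()
        # Close the browser when the spider finishes.
        crawler.signals.connect(middleware.spider_closed,
                                signal=signals.spider_closed)
        return middleware

    @property
    def driver(self):
        if self._driver is None:
            from selenium import webdriver

            # Assumption: Chrome and its driver are available on this machine.
            self._driver = webdriver.Chrome()
        return self._driver

    def render(self, url):
        """Load url in the browser and return the HTML as the browser sees it."""
        self.driver.get(url)
        return self.driver.page_source

    def process_request(self, request, spider):
        from scrapy.http import HtmlResponse

        html = self.render(request.url)
        # Returning a response here makes Scrapy skip its own download.
        return HtmlResponse(url=request.url, body=html,
                            encoding="utf-8", request=request)

    def spider_closed(self, spider):
        if self._driver is not None:
            self._driver.quit()
```

To enable it, add the class to `DOWNLOADER_MIDDLEWARES` in `settings.py`, for example `{"myproject.middlewares.SeleniumMiddleware": 543}` (the module path is hypothetical). For AngularJS pages you will probably also need an explicit wait (Selenium's `WebDriverWait`) inside `render`, because `page_source` can be read before the framework has finished filling in the page.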

Hope this helps.

Adrien Blanquer