I am new to Scrapy and need to scrape a dataset for a data mining project. I need to crawl "http://www.moneycontrol.com/india/stockpricequote/", follow each link, and extract data. I have written a working Scrapy crawler that gets data using XPath and CSS selectors. However, I came across an element on the page that uses JavaScript to populate a tabbed table. The XPath is the same for every tab, so I cannot extract data for an individual tab and get the stock gain percentage from each one. In this tabbed element, the gain percentage is in the 5th row, last column.
I can scrape data with XPath and CSS, but one part of the page gets its data from JavaScript. How can I scrape such data? I also need the data from each tab. Please suggest a way to do this; other answers use JSON, and I am not familiar with it.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class NewsItem(scrapy.Item):
    # Every key assigned in parse_news must be declared as a Field,
    # otherwise Scrapy raises KeyError on assignment.
    name = scrapy.Field()
    time1 = scrapy.Field()
    news1 = scrapy.Field()


class StationDetailSpider(CrawlSpider):
    name = 'test2'
    start_urls = ["http://www.moneycontrol.com/india/stockpricequote/"]
    rules = (
        Rule(LinkExtractor(restrict_xpaths="//a[@class='bl_12']"), follow=False, callback='parse_news'),
        Rule(LinkExtractor(allow=r"/diversified/.*$"), callback='parse_news'),
    )

    def parse_news(self, response):
        item = NewsItem()
        # All three selectors currently point at the same cell:
        # 5th row, 4th column of the table inside div#disp_nse_hist.
        NEWS1_SELECTOR = 'div#disp_nse_hist tr:nth-child(5) > td:nth-child(4)::text'
        TIME1_SELECTOR = 'div#disp_nse_hist tr:nth-child(5) > td:nth-child(4)::text'
        NAME_SELECTOR = 'div#disp_nse_hist tr:nth-child(5) > td:nth-child(4)::text'
        print("------------------------------------starting extraction------------")
        item['name'] = response.css(NAME_SELECTOR).extract_first()
        item['time1'] = response.css(TIME1_SELECTOR).extract_first()
        item['news1'] = response.css(NEWS1_SELECTOR).extract()
        return item
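
For reference, here is a minimal sketch of what the JSON approach from those other answers might look like. It assumes the tab data arrives as a JSON response from a background request. The payload shape, the key names (`tabs`, `period`, `gain_pct`), and the sample values are all hypothetical and are not taken from moneycontrol.com; the real structure would have to be read from the browser's network tab.

```python
import json

# Hypothetical payload: the keys and values below are made up for
# illustration and would need to match what the site actually returns.
SAMPLE_BODY = '''
{"tabs": [
  {"period": "1 week", "gain_pct": "1.25"},
  {"period": "1 month", "gain_pct": "-3.40"}
]}
'''


def gains_by_tab(body):
    """Map each tab's label to its gain percentage as a float."""
    data = json.loads(body)  # JSON text -> Python dicts and lists
    return {tab["period"]: float(tab["gain_pct"]) for tab in data["tabs"]}


gains = gains_by_tab(SAMPLE_BODY)
print(gains)
```

Inside a Scrapy callback, the same helper could presumably be called as `gains_by_tab(response.text)`, provided the request went to the URL that serves the JSON and not to the HTML page.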