
I know this isn't an error message, but I don't understand how to scrape the Startup India website. I'm trying to open the profile links listed in the search results, but I can't, because Scrapy can't click on links, and the information I need can only be reached through those links.

import scrapy
from selenium import webdriver

class ProductSpider(scrapy.Spider):
    name = "product_spider"
    allowed_domains = ['www.startupindia.gov.in']  # domain only, not a full URL
    start_urls = ['https://www.startupindia.gov.in/content/sih/en/search.html?industries=sih:industry/advertising&states=sih:location/india/andhra-pradesh&stages=Prototype&roles=Startup&page=0']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome("C:/Users/RAJ/PycharmProjects/WebCrawler/WebCrawler/WebCrawler/spiders/chromedriver.exe")
        self.profile = []

    def parse(self, response):
        self.driver.get(response.url)

        while True:
            try:
                # Locate the first result card and try to click it.
                # Finding the element inside the try block keeps a
                # NoSuchElementException from escaping the loop.
                card = self.driver.find_element_by_xpath('//*[@id="persona-results"]/div[1]/div/a/div[1]')
                card.click()
                # get the data and write it to scrapy items
            except Exception:
                break

        self.driver.close()

By the way, my end goal is to get all the profile details, but I don't know how. (PS: this is my first time doing web scraping.)
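From reading around, I wonder if the fix is to stop clicking and instead collect each card's href with Selenium, then hand those URLs to Scrapy. Something like this untested sketch inside parse; the XPath is loosened from the one in my spider above, and parse_profile is a callback I'd still have to write:

        # Untested idea: grab the profile URLs from the result cards instead of clicking them.
        # //*[@id="persona-results"]//a is a guess based on the XPath in my spider above.
        for card in self.driver.find_elements_by_xpath('//*[@id="persona-results"]//a'):
            link = card.get_attribute('href')
            if link:
                yield scrapy.Request(link, callback=self.parse_profile)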

1 Answer


This sounds similar to the tutorial in the Scrapy documentation, linked below. In general, you don't click a link with Scrapy: right-click the element you would normally click on, inspect it to get its CSS/XPath selector, extract the link's href, and let Scrapy follow that URL (see the "follow links to author pages" comment in the code below).

https://docs.scrapy.org/en/latest/intro/tutorial.html

Alternatively, feel free to share what you have tried so far. Hope this helps!

import scrapy


class AuthorSpider(scrapy.Spider):
    name = 'author'

    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        # follow links to author pages
        for href in response.css('.author + a::attr(href)'):
            yield response.follow(href, self.parse_author)

        # follow pagination links
        for href in response.css('li.next a::attr(href)'):
            yield response.follow(href, self.parse)

    def parse_author(self, response):
        def extract_with_css(query):
            return response.css(query).get(default='').strip()

        yield {
            'name': extract_with_css('h3.author-title::text'),
            'birthdate': extract_with_css('.author-born-date::text'),
            'bio': extract_with_css('.author-description::text'),
        }
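To adapt this to your case: the Startup India search page builds its results with JavaScript, so a plain Scrapy response may not contain the links at all. A common pattern is to keep Selenium only to render the page, collect the profile URLs, and hand them to Scrapy to crawl. Below is a minimal, untested sketch along those lines; the #persona-results XPath comes from your question, ProfileSpider and parse_profile are just illustrative names, and the crude time.sleep would be better replaced with an explicit WebDriverWait. If the profile pages themselves are JavaScript-rendered too, you would reuse the driver for those as well.

import time

import scrapy
from selenium import webdriver


class ProfileSpider(scrapy.Spider):
    name = 'profile'
    start_urls = ['https://www.startupindia.gov.in/content/sih/en/search.html?industries=sih:industry/advertising&states=sih:location/india/andhra-pradesh&stages=Prototype&roles=Startup&page=0']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome()  # point this at your chromedriver if it isn't on PATH

    def parse(self, response):
        # Render the JavaScript-built result list with Selenium.
        self.driver.get(response.url)
        time.sleep(5)  # crude wait for the results to load; WebDriverWait is cleaner

        # Collect the profile URLs instead of clicking the cards,
        # then let Scrapy fetch each profile page normally.
        for card in self.driver.find_elements_by_xpath('//*[@id="persona-results"]//a'):
            href = card.get_attribute('href')
            if href:
                yield scrapy.Request(href, callback=self.parse_profile)

        self.driver.quit()

    def parse_profile(self, response):
        # Inspect a profile page and pull out the fields you need here.
        yield {'url': response.url}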