I am trying to make a tool that gets every link from a website. For example, I need to get all question pages from Stack Overflow. I tried using Scrapy:
from scrapy.spiders import CrawlSpider
from scrapy.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = 'myspider'
    start_urls = ['https://stackoverflow.com/questions/']

    def parse(self, response):
        # extract and print every link found on the response page
        le = LinkExtractor()
        for link in le.extract_links(response):
            url_lnk = link.url
            print(url_lnk)
This only gives me the questions from the start page. What do I need to do to get all 'question' links? Time doesn't matter, I just need to understand what to do.
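From what I read in the Scrapy docs, with CrawlSpider I'm not supposed to override parse and should define rules instead. Is something like the sketch below the right direction? (The allow patterns are just my guesses at how the pagination and question URLs look, I haven't verified them.)

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class QuestionsSpider(CrawlSpider):
    name = 'questions_crawl'
    start_urls = ['https://stackoverflow.com/questions/']

    rules = (
        # follow listing/pagination pages (assuming they look like /questions?page=N)
        Rule(LinkExtractor(allow=r'/questions\?page=\d+')),
        # treat individual question pages as results (assuming /questions/<id>/ URLs)
        Rule(LinkExtractor(allow=r'/questions/\d+/'), callback='parse_question'),
    )

    def parse_question(self, response):
        # yield the URL of each question page the crawler reaches
        yield {'url': response.url}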
UPD
The site I actually want to crawl is https://sevastopol.su/ - a local city news website.
The list of all news should be contained here: https://sevastopol.su/all-news
At the bottom of this page you can see page numbers, but if you go to the last page you will see that it is number 765 (right now, 19.06.2019) and its oldest news item is dated 19 June 2018. So the pagination only covers roughly one year of news. But there are plenty of older news links that are still live (probably going back to 2010) and can even be found through the site's search page. That is why I want to know whether there is some way to access a global link store for this site.
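One idea I had: maybe the site exposes a sitemap that lists the older articles as well. I haven't checked whether https://sevastopol.su/sitemap.xml actually exists (robots.txt might point to the real location), but if it does, would something like Scrapy's SitemapSpider be the way to reach that "global link store"? A rough sketch of what I mean:

from scrapy.spiders import SitemapSpider

class SevastopolSpider(SitemapSpider):
    name = 'sevastopol'
    # assuming the site publishes a standard sitemap at this location;
    # https://sevastopol.su/robots.txt may point to the real one
    sitemap_urls = ['https://sevastopol.su/sitemap.xml']

    def parse(self, response):
        # each response here is one article page listed in the sitemap
        yield {'url': response.url}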