I found some answers on the topic of how to extract all available links from a website, and all of them used the scrapy module. I also copied one of the code examples:
    from scrapy import Spider
    from scrapy.linkextractors import LinkExtractor

    class MySpider(Spider):
        name = 'myspider'
        start_urls = ['http://webpage.com']

        def parse(self, response):
            le = LinkExtractor()
            for link in le.extract_links(response):
                print(link)
But I need to launch it from a script and get back a simple Python list of all the HTML pages, so that I can pull some information from them using urllib2 and bs4.

How do I run this spider correctly to get that list?
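A minimal sketch of what I have in mind, assuming Scrapy's CrawlerProcess is the right way to run a spider from a plain script; the collected_links list and the use of link.url are just names I'm introducing for illustration:

    from scrapy import Spider
    from scrapy.crawler import CrawlerProcess
    from scrapy.linkextractors import LinkExtractor

    # Hypothetical module-level list the spider fills in while it crawls.
    collected_links = []

    class MySpider(Spider):
        name = 'myspider'
        start_urls = ['http://webpage.com']

        def parse(self, response):
            le = LinkExtractor()
            for link in le.extract_links(response):
                # Link objects expose the URL via their .url attribute.
                collected_links.append(link.url)

    process = CrawlerProcess()
    process.crawl(MySpider)
    process.start()  # blocks until the crawl finishes

    # collected_links should now be a plain Python list of URL strings
    # that I could then fetch with urllib2 and parse with bs4.
    print(collected_links)

Is something like this the intended approach, or is there a cleaner way?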