I'm interested in extracting just the XHR request URLs, not every URL in a webpage.
This is my code to extract every URL in a page:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class test(CrawlSpider):
    name = 'test'
    start_urls = ['SomeURL']
    filename = 'test.txt'

    rules = (
        Rule(LinkExtractor(allow=('',)), callback='parse_item'),
    )

    def parse_item(self, response):
        # append the URL of every crawled page to the output file
        with open(self.filename, 'a') as f:
            f.write(response.url + '\n')
Thanks,
Edited: Hi, thanks for the comments. After more research I came across this: Scraping ajax pages using python. What I want is to do what that answer describes, but automatically. I need to do this for a large number of web pages, and manually inserting the URLs isn't an option. Is there a way to listen for the XHR requests a site makes and save their URLs?
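To make the question concrete, here is a minimal sketch of the kind of thing I have in mind, using Selenium with Chrome's performance log to capture DevTools network events (this assumes Chrome and chromedriver are available; the Network.requestWillBeSent parsing and the 'XHR'/'Fetch' type labels come from the Chrome DevTools Protocol, and I haven't verified it catches requests fired long after page load):

import json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def collect_xhr_urls(page_url):
    # Ask Chrome to record its performance log, which includes
    # DevTools Network.* events for every request the page makes.
    options = Options()
    options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        # Note: AJAX calls fired well after page load may need an
        # explicit wait before the log is read.
        xhr_urls = []
        for entry in driver.get_log('performance'):
            message = json.loads(entry['message'])['message']
            # Network.requestWillBeSent fires once per outgoing request;
            # its 'type' field labels the resource (Document, XHR, Fetch, ...).
            if message.get('method') == 'Network.requestWillBeSent':
                params = message.get('params', {})
                if params.get('type') in ('XHR', 'Fetch'):
                    xhr_urls.append(params['request']['url'])
        return xhr_urls
    finally:
        driver.quit()

# Example usage (placeholder URL):
# for url in collect_xhr_urls('SomeURL'):
#     print(url)

The idea would be to feed each page from the crawl above through something like this and append the collected XHR URLs to the file, the same way the spider saves page URLs. If there is a way to do this inside Scrapy itself, that would be even better.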