I'm aiming to scrape this URL.
Each item in the list links to a page with more information about it, and I want to scrape all of the roughly 17,000 linked pages. Only 10 results are shown at first; the view more button makes a request that returns 10 more results as JSON and appends them to the list. I tried changing batchsize, the request parameter that sets the number of results per batch, but that didn't work. I also tried the following code (from a tutorial), but couldn't adapt it to my task:
import json

import scrapy


class SpidyQuotesSpider(scrapy.Spider):
    name = 'spidyquotes'
    quotes_base_url = 'http://spidyquotes.herokuapp.com/api/quotes?page=%s'
    start_urls = [quotes_base_url % 1]
    download_delay = 1.5

    def parse(self, response):
        data = json.loads(response.body)
        for item in data.get('quotes', []):
            yield {
                'text': item.get('text'),
                'author': item.get('author', {}).get('name'),
                'tags': item.get('tags'),
            }
        if data['has_next']:
            next_page = data['page'] + 1
            yield scrapy.Request(self.quotes_base_url % next_page)
I've looked at examples here, here and here. However, after 2 days of trying, I still cannot figure out how to solve this because the URL request on the site I wish to scrape differs from all the examples, and it seems they've made it more difficult to scrape...
The request made by hitting view more is the following:
The p= parameter increases by one each time View more is clicked:
The returned JSON has the following format:
{"Heading":"17952 träffar på Alla mottagningar","Query":"","Region":null,"NextPage":3,"Page":2,"BatchSize":10,"BatchText":"Visa 10 till","TotalHits":17952,"SortOrder":"name","Latitude":0.0,"Longitude":0.0,"Bounds":null,"SearchHits":[{"HsaId":"SE162321000255-O23228","FriendlyUrl":"/hitta-vard/kontaktkort/A5-Psykoterapi-Katia-Karlsson-Carli-AB-Lund/","DisplayName":"A5 Psykoterapi Katia Karlsson Carli AB, Lund","Address":"Stortorget 1, Lund","PhoneNumber":"073-046 26 68","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":55.703161529482479,"Longitude":13.193039057187006},{"HsaId":"SE162321000255-O22542","FriendlyUrl":"/hitta-vard/kontaktkort/A5Psykoterapi-Gunilla-Lundqvist-Lund/","DisplayName":"A5Psykoterapi - Gunilla Lundqvist, Lund","Address":"Stortorget 1 5:e vån, Lund","PhoneNumber":"070-624 13 97","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":55.703161529482479,"Longitude":13.193039057187006},{"HsaId":"SE2321000057-6SV4","FriendlyUrl":"/hitta-vard/kontaktkort/A6-Ogonklinik-AB/","DisplayName":"A6 Ögonklinik AB","Address":"Batterigatan 9 NB, Jönköping","PhoneNumber":"036-860 20 30","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":57.768032303027383,"Longitude":14.202798620555548},{"HsaId":"SE162321000024-0059892","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Evelina-Linder-KBT/","DisplayName":"AB Evelina Linder KBT","Address":"Drottninggatan 1A, Uppsala","PhoneNumber":"073-593 00 73","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.858328320441558,"Longitude":17.638292776307694},{"HsaId":"SE162321000024-0052597","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Forsberg-KBT-konsult/","DisplayName":"AB Forsberg KBT-konsult","Address":"Trädgårdsgatan 5A, Uppsala","PhoneNumber":"070-818 17 11","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.856845411620185,"Longitude":17.635819529969204},{"HsaId":"SE2321000016-C7H4","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Lyhord-Ostermalmstorg/","DisplayName":"AB Lyhörd - Östermalmstorg","Address":"Östermalmstorg 1,STOCKHOLM","PhoneNumber":"08-425 004 00","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.336237708592563,"Longitude":18.079317099784653},{"HsaId":"SE2321000016-BH0B","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Suavis-horsel-Solna-Business-park/","DisplayName":"AB Suavis hörsel, Solna Business park","Address":"Svetsarvägen 15,2 tr,SOLNA","PhoneNumber":"010-207 11 77","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.35928477168008,"Longitude":17.980058512140353},{"HsaId":"SE2321000016-56DM","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Vackra-Tander-Annette-Goransson/","DisplayName":"AB Vackra Tänder Annette Göransson","Address":"Drottninggatan 71A,STOCKHOLM","PhoneNumber":"08-21 52 62","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33592153903674,"Longitude":18.059258535271329},{"HsaId":"SE5564844115-106Q","FriendlyUrl":"/hitta-vard/kontaktkort/AB-Vackra-Tander-Norrmalm/","DisplayName":"AB Vackra Tänder, Norrmalm","Address":"Drottninggatan 71 A, 3 tr,","PhoneNumber":"08-21 52 62","HasMvkServices":false,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33592396728109,"Longitude":18.059118082991937},{"HsaId":"SE2321000016-97P2","FriendlyUrl":"/hitta-vard/kontaktkort/ABA-Ogonklinik-i-Alvik/","DisplayName":"ABA Ögonklinik i Alvik","Address":"Tranebergsplan 3,,BROMMA","PhoneNumber":"08-124 440 10","HasMvkServices":true,"VaccinatesForFlu":false,"VaccinatesForHpv":false,"Distance":0.0,"Latitude":59.33516807973394,"Longitude":17.978288641135208}],"HasZeroHits":false}
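Given that format, a minimal sketch of how one batch might be consumed could look like the following. It only uses keys visible in the JSON above (SearchHits, FriendlyUrl, Page, BatchSize, TotalHits). The base URL is a hypothetical placeholder, since the real request URL is not shown here, and the helper name parse_search_page is made up for illustration. It also assumes that pages are 1-based and that the last page can be derived from TotalHits and BatchSize, which has not been verified against the site.

```python
import json
import math
from urllib.parse import urljoin

# Hypothetical placeholder: substitute the real site root.
BASE_URL = "https://example.invalid"


def parse_search_page(body, base_url=BASE_URL):
    """Return (detail_urls, next_page) for one JSON batch.

    next_page is None once the last batch appears to have been reached.
    """
    data = json.loads(body)
    hits = data.get("SearchHits", [])

    # Each hit carries a site-relative link to its detail page.
    detail_urls = [urljoin(base_url, hit["FriendlyUrl"]) for hit in hits]

    # Assumption: pages are 1-based and hold BatchSize hits each.
    total_pages = math.ceil(data["TotalHits"] / data["BatchSize"])
    has_more = bool(hits) and data["Page"] < total_pages
    next_page = data["Page"] + 1 if has_more else None
    return detail_urls, next_page


# Trimmed-down sample in the same shape as the response above.
sample = json.dumps({
    "Page": 2,
    "BatchSize": 10,
    "TotalHits": 17952,
    "SearchHits": [
        {"FriendlyUrl": "/hitta-vard/kontaktkort/A6-Ogonklinik-AB/"},
        {"FriendlyUrl": "/hitta-vard/kontaktkort/AB-Evelina-Linder-KBT/"},
    ],
})
urls, next_page = parse_search_page(sample)
print(urls)
print(next_page)
```

In a Scrapy spider like the tutorial one, the same logic would presumably live in parse: yield a scrapy.Request for each detail URL with a separate callback, then, if next_page is not None, yield a request for the search URL with p set to next_page.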
I'd be grateful for some initial lines of code that would get me going.