How to Scrape JSON Data Using Scrapy

Question

I'm using Scrapy and trying to test my selector in the Scrapy shell, but nothing is working. I'm trying to scrape the JSON data on this website:

https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true

I've tried to scrape the data using the selector

    response.css("body > pre::text").extract()

However, this doesn't seem to be working. Not sure what's wrong...

Ideally, I just want to get all the "Name: XXX" elements from the JSON data. If you know how to select those specifically, that would be very helpful as well!

Currently my code looks like this:

    # -*- coding: utf-8 -*-
    import scrapy  # needed to scrape
    import sys     # needed to extend the import path for xlrd
    # append, not extend: extend() would add the string one character at a time
    sys.path.append("/Users/YoungFreeesh/anaconda3/lib/python3.6/site-packages/")
    import xlrd    # used to easily import xlsx file

    class AmazonbotSpider(scrapy.Spider):
        name = 'ArchiveSpider'

        allowed_domains = ['web.archive.org']
        start_urls = ['https://web.archive.org/web/20180604230058/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true']

        def parse(self, response):
            print(response.body)

Re: "this doesn't seem to be working" — not sure anyone is a mind reader here. I could be wrong though... — l'L'l, Jun 11 '18 at 20:16
I checked the network log, and the page loads the JSON file from this URL: https://web.archive.org/web/20180604230058if_/https://api.simon.com/v1.2/tenant?mallId=231&key=40A6F8C3-3678-410D-86A5-BAEE2804C8F2&lw=true . The only difference between the two URLs is the 'if_'. See if this pattern matches the other links you have; you can use this hack to get your data. — sP_, Jun 11 '18 at 20:19
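
The 'if_' pattern described in the comment above can be applied mechanically. Here is a minimal sketch, assuming the capture timestamp is always 14 digits and directly follows `/web/`; `to_raw_capture_url` is a hypothetical helper name, not part of Scrapy or any Wayback Machine library:

```python
import re

# Assumption: the capture timestamp is 14 digits (YYYYMMDDhhmmss) right after /web/.
_WAYBACK_TS = re.compile(r"(/web/\d{14})(/)")


def to_raw_capture_url(url):
    """Insert 'if_' after the timestamp; URLs that already have it are left alone."""
    return _WAYBACK_TS.sub(r"\1if_\2", url, count=1)
```

The rewritten URL could then be placed in `start_urls` so the spider requests the JSON document directly rather than the wrapper page.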

score 1 · Accepted Answer · answered Jun 11 '18 at 20:18

Since the content is inside an iframe, it is a separate page, so you have to request the iframe's URL first, just as you would follow a link. Something like this:

    urls = response.css('iframe::attr(src)').extract()
    for url in urls:
        # src may be relative, so resolve it against the page URL
        yield scrapy.Request(response.urljoin(url), callback=self.parse_iframe)

Then define a new parse_iframe method where you parse the iframe's response.
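
As a rough sketch of what parse_iframe might do with that response, the snippet below decodes the JSON text and collects every value stored under a "Name" key. The layout of the API's JSON is not shown in the question, so the recursive walk is an assumption that works for any nesting; `collect_names` and `names_from_json` are hypothetical helper names.

```python
import json


def collect_names(node, key="Name"):
    """Recursively gather every value stored under `key` in decoded JSON."""
    names = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                names.append(v)
            # Keep walking: nested objects may carry their own "Name" keys.
            names.extend(collect_names(v, key))
    elif isinstance(node, list):
        for item in node:
            names.extend(collect_names(item, key))
    return names


def names_from_json(text):
    """Decode a JSON document and return all "Name" values found in it."""
    return collect_names(json.loads(text))
```

Inside parse_iframe this could be called as `names_from_json(response.text)`, yielding or logging each name as needed.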

nosklo


Here is a similar question: https://stackoverflow.com/questions/52779161/python-scrapy-json-xpath-how-to-scrape-json-data-with-scrapy/52779299#52779299 Could you please answer? – Debbie Oct 12 '18 at 13:06
