
I tried to fetch the contents of this URL with Scrapy: https://www.zillow.com/homedetails/131-Avenida-Dr-Berkeley-CA-94708/24844204_zpid/ Here is my code:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'https://www.zillow.com/homedetails/131-Avenida-Dr-Berkeley-CA-94708/24844204_zpid/',
    ]

    def parse(self, response):
        filename = 'test.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

I opened the scraped file (test.html), but instead of the page it contained an error page (screenshot not reproduced here). I searched for solutions and tried the suggestions from "ERROR for site owner: Invalid domain for site key", but that didn't solve my issue.

1 Answer


First of all, try this approach and see if this works:

import scrapy

Headerz = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9",
    "cache-control": "no-cache",
    "content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "pragma": "no-cache",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
}

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'https://www.zillow.com/homedetails/131-Avenida-Dr-Berkeley-CA-94708/24844204_zpid/',
    ]

    def start_requests(self):
        yield scrapy.Request(self.start_urls[0], callback=self.parse, headers=Headerz)

    def parse(self, response):
        filename = 'test.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

The reason we don't see the output we would see in a normal browser is that the request is missing the headers a browser always sends.

You need to add the headers either per request, as in the code above, or globally in settings.py.
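For the settings.py route, a minimal sketch using Scrapy's standard DEFAULT_REQUEST_HEADERS and USER_AGENT settings (header values copied from the code above; trim or extend as needed):

```python
# settings.py -- sketch: apply browser-like headers to every request.
# DEFAULT_REQUEST_HEADERS and USER_AGENT are standard Scrapy settings;
# Scrapy merges these into each outgoing request.

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
)

DEFAULT_REQUEST_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "upgrade-insecure-requests": "1",
}
```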

A better approach would be to use a 'rotating-proxies' repository along with a 'rotating-user-agent' repository.
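A sketch of what that might look like in settings.py, assuming the scrapy-rotating-proxies and scrapy-user-agents packages from PyPI (the middleware paths and priorities below follow those packages' documented defaults; the proxy addresses are placeholders):

```python
# settings.py -- sketch assuming `pip install scrapy-rotating-proxies scrapy-user-agents`.

# Proxies to rotate through (placeholder addresses -- substitute real ones).
ROTATING_PROXY_LIST = [
    "proxy1.example.com:8000",
    "proxy2.example.com:8031",
]

DOWNLOADER_MIDDLEWARES = {
    # From scrapy-rotating-proxies: rotates proxies and retires banned ones.
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
    # From scrapy-user-agents: picks a random browser User-Agent per request.
    "scrapy_user_agents.middlewares.RandomUserAgentMiddleware": 400,
}
```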

Janib Soomro