
I am learning Scrapy and am trying to scrape a realtor site in Quebec. I am using their API to collect homes and print the URLs to the screen, but my last function, print_urls(), won't run. I am really stuck here; I tried debugging it and it just skips right over the whole function block.

# imports required by the code below
import json
from ast import literal_eval

import scrapy
from colorama import Fore, Style
from scrapy.selector import Selector


class CentrishomesSpider(scrapy.Spider):
    name = 'centrisHomes'
    # allowed_domains = ['www.centris.ca']
    # start_urls = ['http://www.centris.ca/']

    def start_requests(self):
        query = {
            ...  # search filters omitted in the question
        }

        yield scrapy.Request(
            url='https://www.centris.ca/property/UpdateQuery',
            method='POST',
            body=json.dumps(query),
            headers={
                'Content-Type': 'application/json',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
            },
            callback=self.get_inscriptions
        )
        ...

    def get_inscriptions(self, response):
        resp, success = self.success(response)
        if success == True:
            print(Fore.GREEN + 'Query Updated' + Style.RESET_ALL)
        else:
            print(Fore.RED + 'Query Not Updated' + Style.RESET_ALL)

        yield scrapy.Request(
            url='https://www.centris.ca/Property/GetInscriptions',
            method='POST',
            body=json.dumps({"startPosition": 0}),
            headers={
                'Content-Type': 'application/json',
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36'
            },
            callback=self.handle_inscriptions
        )

    def handle_inscriptions(self, response):
        homes, success = self.success(response)
        if success == True:
            print(Fore.GREEN + 'Count ' + str(homes['d']['Result']['count']) + Style.RESET_ALL)
        # self.test()
        self.html = Selector(text=homes['d']['Result']['html'])
        self.print_urls()
        # print(response.body)
        ...

    def success(self, response):
        my_dict = literal_eval(response.body.decode(
            'utf-8').replace(':true}', ':True}'))
        if my_dict['d']['Succeeded'] == True:
            return my_dict, True
        else:
            return False

    def print_urls(self):
        print('try')
        # page_html = Selector(resp['d']['Result']['html'])
        page_html = self.html
        homes = page_html.xpath('//div[contains(@class, "property-thumbnail-item")]')
        for home in homes:
            yield {
                'home_url': home.xpath('.//a[@class="property-thumbnail-summary-link"]/@href').get()
            }
        ...
  • In your own words, what part of the code should cause it to run? Why? [Did you try](https://meta.stackoverflow.com/questions/261592/) to [trace the execution](https://stackoverflow.com/questions/25385173) of the code and [understand exactly what is happening](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/)? "I really am stuck here i tried debugging it and it just skips right over my whole function block." **What exactly does this mean**? Concretely and exactly, what do you think should happen differently, when and why? – Karl Knechtel Nov 10 '22 at 23:52
  • Did you check whether any exceptions are getting swallowed somewhere else? Did you consider what will happen if `success` returns a `False` value, and this is blindly unpacked into two variables? – Karl Knechtel Nov 10 '22 at 23:55 (a short sketch of this unpacking pitfall follows the comments below)
  • It should yield the URL for each individual home to the console – Carter James Nov 10 '22 at 23:57
  • It failed to run the code in that function for some reason because of my yield statement; when I replaced it with a print statement it worked and the function ran, but why is that? @KarlKnechtel – Carter James Nov 11 '22 at 00:00
  • It's just a Scrapy spider, so this is reproducible; just paste it into a spider file. @Alexander – Carter James Nov 11 '22 at 03:20
  • Does this answer your question? [Python Scrapy - Yield statement not working as expected](https://stackoverflow.com/questions/34605063/python-scrapy-yield-statement-not-working-as-expected) – Alexander Nov 11 '22 at 03:29
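On the unpacking question raised in the comments: below is a short sketch (plain Python, using a hypothetical stand-in for the spider's success() method) of why returning a 2-tuple on one path and a bare False on the other is fragile.

def success_like(succeeded):
    # same shape as the spider's success(): a 2-tuple on success, a bare False otherwise
    if succeeded:
        return {'d': {'Succeeded': True}}, True
    return False

resp, ok = success_like(True)     # fine: unpacks the 2-tuple
resp, ok = success_like(False)    # TypeError: cannot unpack non-iterable bool object

Returning something like (None, False) on the failure path keeps the unpacking uniform.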

1 Answer


I figured out my own problem: it was because I turned my print_urls function into a generator, and simply calling self.print_urls() doesn't make the generator do anything. Shout-out to @AbdealiJK; I figured it out because of his answer:

https://stackoverflow.com/a/34609397/19966841
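For anyone who hits the same thing, here is a minimal sketch of the behaviour in plain Python, plus (as comments, since it depends on what you want to do with the items) one common way to drive the generator from the Scrapy callback:

# Calling a generator function only builds a generator object;
# the body does not execute until something iterates over it.
def print_urls():
    print('try')                            # not printed by a bare print_urls() call
    yield {'home_url': '/fake/listing/1'}   # hypothetical item

print_urls()                # generator created and discarded, body never runs
items = list(print_urls())  # iterating runs the body: prints 'try' and collects the dict

# In the spider, one option is to re-yield the generator's items from the
# callback so they reach Scrapy's engine, e.g.:
#
#     def handle_inscriptions(self, response):
#         homes, success = self.success(response)
#         self.html = Selector(text=homes['d']['Result']['html'])
#         yield from self.print_urls()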