
I was following a tutorial on scraping multiple pages from a website with the Scrapy library. The tutorial used yield statements to extract information from the page's HTML and CSS structure using CSS and XPath selectors. I decided to use an if statement to check whether a search query finds a result, and an else statement to handle the case where it does not. The problem arises when the code reaches the else branch, which extracts the company's name; for the Location and Sales fields I want a custom output string that conveys 'Not Found'.

When I run the script I get the following error:

File "C:\Users\....\hoover-scraper\scraper.py", line 28

'Location': 'Not Found'
         ^

I think this is not a proper way to use the yield statement, and that is why I am getting the SyntaxError. Is there a way to output the string 'Not Found' for the Sales and Location fields when a query returns no results?

This is the relevant part of my code:

def parse(self, response):
    NAME_SELECTOR ="td a::text"
    LOCATION_SELECTOR ='.//tr/td/text()' #using xpath to grab information for Location and Sales
    SALES_SELECTOR = './/tr/td/text()' 

    if response.css(NAME_SELECTOR).extract_first(): #Checks to see if the company name field has data if not prints 'No results found'
        yield {

            'Company Name': response.css(NAME_SELECTOR).extract_first(),
            'Location' : response.xpath(LOCATION_SELECTOR)[0].extract(), #Location comes first inside the td tags thus the [0]
            'Sales' : response.xpath(SALES_SELECTOR)[1].extract(),
        }

    else:
        yield {
            'Company Name': response.css("dd.value.term::text").extract_first() #identifies company name which data was not found
            'Location': 'Not Found'
            'Sales': 'Not Found'
        }
Simon

1 Answer


yield is used only in generators. Do you just want to return that value from your method? Then replace yield with return in both places.
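To illustrate the difference, here is a minimal sketch in plain Python with no Scrapy involved. The function names as_generator and as_function are made up for this example. A function that contains yield returns a generator that has to be iterated, while one that uses return hands back the dictionary directly.

```python
def as_generator():
    # Contains yield, so calling it returns a generator object, not a dict.
    yield {'Location': 'Not Found', 'Sales': 'Not Found'}


def as_function():
    # Uses return, so calling it gives the dict itself.
    return {'Location': 'Not Found', 'Sales': 'Not Found'}


# The generator must be consumed (here with list) to get at the dict.
items = list(as_generator())
item = as_function()
```

Note that Scrapy callbacks are commonly written as generators, so yield on its own is valid inside parse.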

If you need to use the value later in the same method, assign the dictionary to a variable. For example:

    if response.css(NAME_SELECTOR).extract_first(): #Checks to see if the company name field has data if not prints 'No results found'
        result = {

            'Company Name': response.css(NAME_SELECTOR).extract_first(),
            'Location' : response.xpath(LOCATION_SELECTOR)[0].extract(), #Location comes first inside the td tags thus the [0]
            'Sales' : response.xpath(SALES_SELECTOR)[1].extract(),
        }

    else:
        result = {
            'Company Name': response.css("dd.value.term::text").extract_first(), #identifies company name which data was not found
            'Location': 'Not Found',
            'Sales': 'Not Found'
        }
    # do something with result
    ...
    # or just:
    return result
  • Yes, I need to use these values later, so assigning them to a dictionary works for me. I just added a comma after each dictionary entry in the else statement and it worked. – Simon Dec 27 '17 at 15:39
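As the comment above suggests, the SyntaxError in the question came from the missing commas between the dictionary entries in the else branch, not from yield itself. The sketch below keeps yield and only adds the commas. It is a plain-Python stand-in for the Scrapy callback: parse_row, name, cells and fallback_name are hypothetical names used for illustration, with the arguments standing in for the values the selectors would extract.

```python
def parse_row(name, cells, fallback_name=None):
    """Yield one item dict; a stand-in for the Scrapy parse callback."""
    if name:
        yield {
            'Company Name': name,
            'Location': cells[0],  # Location comes first
            'Sales': cells[1],
        }
    else:
        # Each entry ends with a comma; leaving one out is a SyntaxError
        # reported at the start of the following entry.
        yield {
            'Company Name': fallback_name,
            'Location': 'Not Found',
            'Sales': 'Not Found',
        }


# Sample values below are invented purely to exercise both branches.
found = list(parse_row('Acme', ['Austin, TX', '$1.2M']))
missing = list(parse_row(None, [], fallback_name='Unknown Co'))
```

Here found holds one item with the extracted values, and missing holds one item whose Location and Sales fields are both 'Not Found'.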