I was following a tutorial on scraping multiple pages from a website with the Scrapy library. The tutorial used a yield statement to return data extracted from each page's HTML using CSS and XPath selectors. I decided to add an if statement that checks whether a search query found a result, and an else branch that handles queries with no result. The problem is in the else branch: it should extract the company's name and, for the Location and Sales fields, output the custom string 'Not Found'.
When I run the script I get the following error:
File "C:\Users\....\hoover-scraper\scraper.py", line 28
'Location': 'Not Found'
^
I think this is not a proper way to use the yield statement, which is why I am getting the SyntaxError. Is there a way to output the string 'Not Found' for the Sales and Location fields when a query returns no results?
This is the relevant part of my code:
def parse(self, response):
    NAME_SELECTOR = "td a::text"
    LOCATION_SELECTOR = './/tr/td/text()'  # using xpath to grab information for Location and Sales
    SALES_SELECTOR = './/tr/td/text()'
    if response.css(NAME_SELECTOR).extract_first():  # Checks to see if the company name field has data if not prints 'No results found'
        yield {
            'Company Name': response.css(NAME_SELECTOR).extract_first(),
            'Location': response.xpath(LOCATION_SELECTOR)[0].extract(),  # Location comes first inside the td tags thus the [0]
            'Sales': response.xpath(SALES_SELECTOR)[1].extract(),
        }
    else:
        yield {
            'Company Name': response.css("dd.value.term::text").extract_first()  # identifies company name which data was not found
            'Location': 'Not Found'
            'Sales': 'Not Found'
        }
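
Judging from the line the error points at, the likely cause is not yield itself but the missing commas between the entries of the dictionary in the else branch (the dictionary in the first yield has them). Below is a minimal standalone sketch of the pattern, assuming that is the only issue. It does not use Scrapy: parse_result, its parameters, and the sample values are hypothetical stand-ins for the response and the selectors.

```python
# Standalone sketch (no Scrapy required): a generator that yields one dict per query.
# parse_result, its parameters, and the sample values below are hypothetical
# and only illustrate the if/else + yield pattern from the question.
def parse_result(name, cells, searched_term):
    if name:
        # Result found: Location and Sales come from the first two cells.
        yield {
            'Company Name': name,
            'Location': cells[0],
            'Sales': cells[1],
        }
    else:
        # No result: plain string literals are fine as values, but every
        # key/value pair in the dict must be separated by a comma.
        yield {
            'Company Name': searched_term,
            'Location': 'Not Found',
            'Sales': 'Not Found',
        }


# Made-up inputs, one with a result and one without.
found = list(parse_result('Acme Corp', ['Austin, TX', '$1.2M'], 'Acme Corp'))
missing = list(parse_result(None, [], 'Nonexistent Co'))
print(found)
print(missing)
```

In the spider itself, the same change would presumably amount to adding a comma after the 'Company Name' entry and after the 'Location' entry in the else branch.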