I wrote a web scraper for Amazon. It works for most Amazon products but fails on certain ones, and I can't tell in advance which. The error shown for those products is UnboundLocalError: local variable 'reviews' referenced before assignment. Could you please guide me on how to fix this? Thanks! Here is the code:

import requests 
from bs4 import BeautifulSoup as bs 
import json 
import random 
import os.path
import time
import pandas as pd

def scrape_products(response):

    dataframe = pd.DataFrame()
    res = pd.DataFrame()

    # Checking if response is okay or not
    if response.ok:
        response = response.text
        content = bs(response,'lxml')

        # Selecting products
        items = content.find_all('li' , class_ = 'zg-item-immersion')

        for item in items:            
            # Selecting data about each product
            count = item.find('span' , class_ = 'zg-badge-text').text.strip()
            title = item.find('div' , class_ = 'p13n-sc-truncate').text.strip()
            price = item.find('span' , class_ = 'p13n-sc-price')
            try:
                rating = item.find('span' , class_ = 'a-icon-alt').text.strip()
                total = item.find('a' , class_ = 'a-size-small a-link-normal').text.strip()
                reviews = item.find('div' , class_ = 'a-icon-row a-spacing-none').find('a',class_='a-link-normal').get('href')
            except:
                pass

            image_url = item.find('div' , class_ = 'a-section a-spacing-small').find('img').get('src')

            go_to = item.find('span', class_ = 'a-list-item').find('a' , class_ = 'a-link-normal').get('href')
            product_url = 'https://www.amazon.com' + go_to
    #        product_url = product_url.replace('?','/ref=zg_bs_2399939011_1?')



            reviews_url = 'https://www.amazon.com' + reviews

    #         desc = requests.get(product_url)

    #         print(desc.status_code)
    #         if desc.ok:
    #             desc = desc.text
    #             data = bs(desc,'lxml')

    #             old_price = data.find('span',class_='priceBlockStrikePriceString a-text-strike').text

    #             if(old_price):
    #                 print(old_price)
    #             else:
    #                 pass

            print('************************************************ ' + count +' **********************************************')
            print('Title: {}'.format(title + '\n'))

            if(price):
                price = price.text.strip()
                price = price[1:]
                print('Price: {}'.format(price))
            else:
                pass

            print('Rating: {} ({})'.format(rating , total))
            print('Reviews Url: {}'.format(reviews_url))
            print('Image Url: {}'.format(image_url))
            print('Product Url: {}'.format(product_url))


            print()
            print()

            data = {'Title':[title], 'Price':[price], 'Rating':[str(rating) +'('+ str(total)+')'],
                    'Reviews Url':[reviews_url], 'Image Url': [image_url], 'Product Url':[product_url]}

            df = pd.DataFrame(data)

            dataframe = dataframe.append(df)
    return dataframe



def main():

    page_1 = 'https://www.amazon.com/Best-Sellers-Amazon-Device-Smart-Locks/zgbs/amazon-devices/17295887011/ref=zg_bs_nav_3_5499877011'
    #page_2 = 'https://www.amazon.com/Best-Sellers-Fire-Tablets-Bundles/zgbs/amazon-devices/17142718011/ref=zg_bs_pg_1?_encoding=UTF8&pg=2'

    response_1 = requests.get(page_1)
    #response_2 = requests.get(page_2)


    df1 = scrape_products(response_1)
    #df2 = scrape_products(response_2)

    #df = df1.append(df2)
    df1.Price = df1.Price.astype(float)
    df1 = df1.sort_values(by=['Price'])
    df1.to_csv('AMAZON\Amazon Devices & Accessories\Amazon Device Accessories\Best Sellers in Amazon Device Smart Locks.csv',index=False)


if __name__== "__main__":
  main()
  • Add the traceback to the question. We shouldn't have to guess which line has the error when Python already lets us know. – tdelaney Mar 31 '20 at 17:44
  • This is the line that raises the error (line 34 in the traceback): reviews_url = 'https://www.amazon.com' + reviews – Malik Ibrahim Mar 31 '20 at 17:48
  • Does this answer your question? [UnboundLocalError: local variable referenced before assignment when reading from file](https://stackoverflow.com/questions/15367760/unboundlocalerror-local-variable-referenced-before-assignment-when-reading-from) - same problem: variable is only set conditionally – wjandrea Mar 31 '20 at 17:55

1 Answer

You assign reviews inside a try/except block that simply ignores errors. When one of the lookups in that block fails, reviews is never assigned, so reviews_url = 'https://www.amazon.com' + reviews references reviews before assignment. Worse, on any iteration after the first, rating, total and reviews can still hold stale data from the previous item, so the loop silently pumps out bad data instead of failing. You need a policy for handling the error, such as skipping that item:

        try:
            rating = item.find('span' , class_ = 'a-icon-alt').text.strip()
            total = item.find('a' , class_ = 'a-size-small a-link-normal').text.strip()
            reviews = item.find('div' , class_ = 'a-icon-row a-spacing-none').find('a',class_='a-link-normal').get('href')
        except:
            print("ERROR scanning {}, ignored".format(item))
            import traceback
            traceback.print_exc()
            continue
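
Skipping the item is one policy. Another option, if you would rather keep the product in the output, is to reset the values at the top of each iteration, so a failed lookup leaves an explicit None rather than an unbound or stale variable. Here is a minimal sketch of that idea, using plain dicts as a hypothetical stand-in for the parsed items (the collect_review_urls name, BASE_URL and the 'href' key are assumptions for illustration, not part of the original code):

```python
BASE_URL = "https://www.amazon.com"


def collect_review_urls(items):
    """Build one review URL per item, using None when the link is missing."""
    urls = []
    for item in items:
        # Reset at the top of every iteration so a failure cannot leave the
        # name unbound or still carrying the previous item's value.
        reviews = None
        try:
            reviews = item["href"].strip()
        except (KeyError, AttributeError):
            # KeyError: no link at all; AttributeError: the link is None.
            pass
        urls.append((BASE_URL + reviews) if reviews else None)
    return urls


# Hypothetical items: one complete, one without a link, one with a None link.
print(collect_review_urls([{"href": "/product-reviews/X"}, {}, {"href": None}]))
# ['https://www.amazon.com/product-reviews/X', None, None]
```

In the question's loop the same reset would apply to rating and total as well, and the later string formatting would need to cope with None.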

Catching naked exceptions is a bad idea: it masks bugs as well as expected errors. Once you print out the errors, you get a feel for which ones you can safely ignore, and you can narrow the exception handler to something like

        except (IndexError, ValueError) as e:
            ....
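
To illustrate what narrowing buys you, here is a small sketch with a hypothetical helper, parse_review_count, that is not part of the question's code. It assumes the review count arrives as text such as '1,234'. Only the expected failure (non-numeric text) is handled; anything else, such as passing None by mistake, still raises and points at the real bug:

```python
def parse_review_count(text):
    """Convert a count such as '1,234' to int; return None if it is not numeric."""
    try:
        return int(text.replace(",", ""))
    except ValueError:
        # Expected failure: the text is present but is not a number.
        return None


print(parse_review_count("1,234"))  # 1234
print(parse_review_count("n/a"))    # None

# An unexpected failure is not swallowed: None has no .replace(), so this
# raises AttributeError and the caller's bug stays visible.
try:
    parse_review_count(None)
except AttributeError as exc:
    print("bug surfaced:", exc)
```

With a naked except, all three calls would return quietly and the third problem would go unnoticed.
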
tdelaney
  • Could you please add your code into mine? I'm new to web scraping. – Malik Ibrahim Mar 31 '20 at 17:55
  • Just replace your try/except block with this one. Your code example is kinda long, and I think my answer is easier to read without all of the extra stuff, so I'll leave it as it is. It's best to reduce code examples to the smallest running code possible. You could remove 2/3 of the code in your example and still demonstrate the problem. But I'm not going to do that! – tdelaney Mar 31 '20 at 18:15