I am trying to scrape specific text from specific table elements on an Amazon product page.
URL_1 has all elements - https://www.amazon.com/dp/B008Q5LXIE/ URL_2 has only 'Sales Rank' - https://www.amazon.com/dp/B001V9X26S
URL_1: The "Product Details" table has 9 items and I am only interested in 'Product Dimensions', 'Shipping Weight', Item Model Number, and all 'Seller's Rank'
I am not able to parse out the text on these items as some are in one block of code, where others are not.
I am using beautifulsoup and I have run a text.strip() on the table and got everything but very messy. I have tried soup.find('li') and text.strip() to find individual elements but with seller rank, it returns all 3 ranks jumbled in one return. I have also tried regex to clean text but it won't work for the 4 different seller ranks. I have had success using the Try, Except, Pass method for scraping and would have each of these in that format
A bad example of the code used, I was trying to get sales rank past the </b>
element in the HTML
#Sales Rank
sales_rank ='NOT'
try:
sr = soup.find('li', attrs={'id':'SalesRank'})
sales_rank = sr.find('/b').text.strip()
except:
pass
I expect to be able to scrape the listed elements into a dictionary. I would like to see the results as
dimensions = 6x4x4
weight = 4.8 ounces
Item_No = IT-DER0-IQDU
R1_NO = 2,036
R1_CAT = Health & Household
R2_NO = 5
R2_CAT = Joint & Muscle Pain Relief Medications
R3_NO = 3
R3_CAT = Naproxen Sodium
R4_NO = 6
R4_CAT = Migraine Relief
my_dict = {'dimensions':'dimensions','weight':'weight','Item_No':'Item_No', 'R1_NO':R1_NO,'R1_CAT':'R1_CAT','R2_NO':R2_NO,'R2_CAT':'R2_CAT','R3_NO':R3_NO,'R3_CAT':'R3_CAT','R4_CAT':'R4_CAT'}
URL_2: The only element of interest on page is 'Sales Rank'. 'Product Dimensions', 'Shipping Weight', Item Model Number are not present. However, I would like a return similar to that of URL_1 but the missing elements would have a value of 'NA'. Same results as URL_1, only 'NA' is given when an element is not present. I have had success accomplishing this by setting a value prior to the Try/Except statement. Ex: Shipping Weight = 'NA' ... then run try/except: pass , so I get 'NA' and my dictionary is not empty.