0

I'm attempting to scrape Newegg product pages for prices and I always seem to be running into the same problem - the result is always 'None'.

Here's a few very basic lines of code that work for similar sites such as Amazon:

 data = requests.get('https://www.newegg.com/Product/Product.aspx?Item=N82E16824475015&cm_sp=Homepage_Dailydeal-_-P1_24-475-015-_-03042019')
 soup = BeautifulSoup(data.text, 'html.parser')
 price = soup.find('li', class_='price-current').text.strip()

I'm expecting to get $419.99 as the output, but instead I get None.

When I try to get the product title, I get the desired result. It's only the prices that are giving me this issue. Has anyone had the same issue and how can this be fixed? Thanks in advance.

kamen1111
  • 175
  • 10
  • 1
    The web page you're parsing appears to have some dynamically generated content. Try [selenium](https://stackoverflow.com/questions/17540971/how-to-use-selenium-with-python) – Wondercricket Mar 04 '19 at 21:32
  • I have tried Selenium and some other things which I was unable to get to work, that's why I had to ask for help. I prefer figuring things out on my own but I was just stuck on this one. I appreciate the quick replies and all the help I got. – kamen1111 Mar 04 '19 at 22:56

2 Answers2

2

You can use an attribute selector to target an element containing that price in its content attribute.

import requests
from bs4 import BeautifulSoup

data = requests.get('https://www.newegg.com/Product/Product.aspx?Item=N82E16824475015&cm_sp=Homepage_Dailydeal-_-P1_24-475-015-_-03042019')
soup = BeautifulSoup(data.content, 'lxml')
price = soup.select_one('[itemprop=price]')['content']
print(price)
QHarr
  • 83,427
  • 12
  • 54
  • 101
  • 1
    FWIW, while still display the correct price, this searches for a different tag than the OP is looking for. Mostly due to the fact the web page the OP is scraping is dynamically generated. Still +1 though – Wondercricket Mar 04 '19 at 21:32
  • 1
    @Wondercricket. Thanks. Yup. That is why I went for that. – QHarr Mar 04 '19 at 21:36
  • 1
    Simple and gets the job done; just what I needed for this task. Thank you very much. – kamen1111 Mar 04 '19 at 22:54
1

I like to use the lxml Library as shown below. With it you can use XPATH which is great.

import urllib2
from lxml import etree

url =  "URL HERE"
response = urllib2.urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
tree.xpath('//*[@id="newproductversion"]/span/strong')

I get the expected output 419.99

Julian Silvestri
  • 1,970
  • 1
  • 15
  • 33