
I'm trying to get the price of a product from a webpage, specifically from the following HTML. I don't know CSS selectors well, but this is my attempt so far.

<div class="pd-price grid-100">
  <!-- Selling Price -->
    <div class="met-product-price v-spacing-small" data-met-type="regular">
      <span class="primary-font jumbo strong art-pd-price">
        <sup class="dollar-symbol" itemprop="PriceCurrency" content="USD">$</sup>
         399.00</span>
      <span itemprop="price" content="399.00"></span>
    </div>
</div>
> $399.00

This snippet sits deeper inside the full page, but here is the Java code I've tried so far.

    String url ="https://www.lowes.com/pd/GE-700-sq-ft-Window-Air-Conditioner-115-Volt-14000-BTU-ENERGY-STAR/1000380463";
    Document document = Jsoup.connect(url).timeout(0).get();
    String price = document.select("div.pd-price").text();
    String title = document.title(); //Get title
    System.out.println("  Title: " + title); //Print title.
    System.out.println(price);
user2769894

2 Answers

0
Element priceDiv = document.select("div.pd-price").first();
String price = priceDiv.select("span").last().attr("content");

If you need the currency symbol too:

String currencySymbol = priceDiv.select("sup").text(); // "$" for the HTML above

I haven't run these, but they should work. For more detail, see the Jsoup API reference.

Frighi
  • This seems to work if I parse only the HTML posted, but not the full Link. Any ideas why this might be happening? I'm getting a null pointer. – user2769894 Jul 31 '18 at 00:39
  • Quick update, I found the reason this isn't working. For some reason home depot won't give you the proper page source if you don't access through a browser. – user2769894 Jul 31 '18 at 00:52
  • How did you get that HTML? I've visited the page in your url variable but I can't find it. – Frighi Jul 31 '18 at 00:52
  • I inspected the element and used View page source. Both show up. I'm using Firefox which might make a difference? – user2769894 Jul 31 '18 at 00:57
  • I just investigated: the first time you select a store based on the ZIP code you provide, the site saves a cookie for that choice and reads it on subsequent requests. I don't think you can scrape the page in that simple way. – Frighi Jul 31 '18 at 01:02
  • Try opening the page in a private browsing window and you will see. – Frighi Jul 31 '18 at 01:03
  • Yeah, I see that now thanks though. I'll have to find a way around this. – user2769894 Jul 31 '18 at 01:09
  • It seems a little tricky, considering the number of cookies the page uses. Anyway, one approach is described here: https://stackoverflow.com/questions/6432970/jsoup-posting-and-cookie – Frighi Jul 31 '18 at 01:18
0

First, you should familiarize yourself with CSS selectors.

W3Schools has some resources to get you started.

In this case, the price sits inside the div with the pd-price class, so your selector div.pd-price is already correct.

You need to get the element first.

Element outerDiv = document.selectFirst("div.pd-price");

And then get the child div with another selector

Element innerDiv = outerDiv.selectFirst("div.met-product-price");

And then get the span element inside it

Element spanElement = innerDiv.selectFirst("span.art-pd-price");

At this point you could select the <sup> element separately, but in this case you can just call the text() method to get the combined text of the span.

System.out.println(spanElement.text());

This will print

$ 399.00
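If you need the price as a number rather than as text, the printed string still carries the currency symbol and a space. A minimal sketch of one way to strip them, using only the standard library; the class name PriceText and the method parseAmount are made up for illustration and are not part of Jsoup:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: pulls the numeric amount
// out of text such as "$ 399.00".
class PriceText {
    // An unsigned decimal number, optionally with thousands separators,
    // e.g. 399.00 or 1,299.50.
    private static final Pattern AMOUNT =
            Pattern.compile("\\d[\\d,]*(?:\\.\\d+)?");

    static BigDecimal parseAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("No numeric amount in: " + text);
        }
        // Drop thousands separators before handing the digits to BigDecimal.
        return new BigDecimal(m.group().replace(",", ""));
    }

    public static void main(String[] args) {
        // In the answer above this string would come from spanElement.text().
        System.out.println(parseAmount("$ 399.00"));
    }
}
```

BigDecimal is used instead of double so the cents are kept exactly. The sketch assumes US-style formatting (comma for thousands, dot for decimals).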

Edit, after seeing the comments on the other answer:

You can copy the cookie from your browser and send it with the Jsoup request to get past the ZIP code requirement.

Document document = Jsoup.connect("https://www.lowes.com/pd/GE-700-sq-ft-Window-Air-Conditioner-115-Volt-14000-BTU-ENERGY-STAR/1000380463")
                        .header("Cookie", "<Your Cookie here>")
                        .get();
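The Cookie header value is a list of name=value pairs joined by "; ". If you copy several cookies from the browser, a small helper can assemble that string for you. This is only a sketch: the class name CookieHeader and the cookie names below are placeholders I made up, not the cookies the site actually uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: joins cookie name/value pairs into a single
// Cookie header value of the form "a=1; b=2".
class CookieHeader {
    static String build(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Placeholder names and values; copy the real ones from your browser.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("exampleStore", "1234");
        cookies.put("exampleZip", "90210");
        System.out.println(build(cookies));
    }
}
```

The resulting string can be passed as the second argument of .header("Cookie", ...). Jsoup's Connection also has a cookie(name, value) method, which sets cookies one at a time without building the header yourself.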
Zendy