I want to scrape a price from a website so I can compare it with the price on my own website. The scraped value comes out in the form $XXX.XXX, and I want it as a pure number like XXXXXX. Here is my code:

import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

public class scrape {
    public static void main(String[] args) {
        System.setProperty("webdriver.chrome.driver","...driver\\chromedriver.exe");
        ChromeDriver driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("...");
        WebElement price = driver.findElement(By.xpath("//p[contains(@class, 'box-price-present')]"));
        System.out.println("price: " + price.getText());
        driver.quit();
    }
}

The exported data was in text form and I want it in pure number form.

not_a_nerd
likenzy
  • Please give as an actual example of an input and the expected output. – aled May 06 '23 at 15:01
  • input is 4.000.000$ and expected output is 4000000 – likenzy May 06 '23 at 15:12
  • @likenzy put examples in your post, and make sure they make sense: dollar amounts are generally for countries that use a "decimal **point**", with thousands separated by commas, and the currency symbol would be in front, not behind, so is this for a specific locale that uses a different convention? Which one? Also, show examples of what should happen with prices that _do_ have decimal fractions? How many digits? Should it round if it's more than 2? Leave them? [Put all your details in your post](/help/how-to-ask) because parsing prices can be trivial, but almost never is. – Mike 'Pomax' Kamermans May 06 '23 at 16:46

1 Answer

Selenium's WebElement.getText() returns the element's visible text as a String. You can then manipulate that string however you need. In this case, strip the non-numeric characters from the text after reading it from the element.

Here is one example of how you can extract the numeric value from the price string.

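public static String parseNumericPrice(String price) {
    // Keep digits only. In the example "4.000.000$" the dot is a thousands separator, so it is removed too.
    // If your prices use "." as a decimal point, use "[^0-9.]" instead to keep it.
    return price.replaceAll("[^0-9]", "");
}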

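If you need the value as a number rather than a string, pass the output of parseNumericPrice to Double.parseDouble or Integer.parseInt. Note that Integer.parseInt accepts digits only, so it throws a NumberFormatException if the string still contains a dot.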

Example as per your use-case:

System.out.println(parseNumericPrice(price.getText())); // string
System.out.println(Double.parseDouble(parseNumericPrice(price.getText()))); // as double
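
For a self-contained check without a browser, the same idea can be sketched as below. This is a minimal sketch, assuming every "." or "," in the scraped text is a thousands separator, as in the 4.000.000$ example from the comments. The class PriceParser and its methods digitsOnly and toNumber are hypothetical names for illustration, not part of Selenium.

```java
// Hypothetical helper class for illustration; not part of Selenium.
class PriceParser {

    // Keeps only the digits 0-9. Assumption: "." and "," are thousands
    // separators, so dropping them does not change the value.
    static String digitsOnly(String priceText) {
        return priceText.replaceAll("[^0-9]", "");
    }

    // Converts the cleaned text to a long.
    // Throws NumberFormatException if the text contains no digits at all.
    static long toNumber(String priceText) {
        return Long.parseLong(digitsOnly(priceText));
    }

    public static void main(String[] args) {
        // Stand-in for price.getText(); the value is the example from the comments.
        String scraped = "4.000.000$";
        System.out.println(toNumber(scraped)); // prints 4000000
    }
}
```

If the site can show decimal fractions, this approach would silently fold them into the integer part, so the regex would need to change for that case.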
Sid
  • Remember to think about the edge cases: if the text is "get 5for $20" then your answer will turn that into the number 520. And of course, remember that SO has been around for a _loooong_ time and this cannot possibly be the first time this got asked, so this is almost certainly a duplicate =) – Mike 'Pomax' Kamermans May 06 '23 at 16:41
  • Yes @Mike'Pomax'Kamermans, just wanted to give a generalised answer, the regex can be modified as per requirement. – Sid May 06 '23 at 16:50
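
One way to handle the "get 5for $20" edge case raised above is to match a price-shaped token instead of stripping characters from the whole string. The following is only a sketch under stated assumptions: a price is a run of digits, optionally followed by "."-separated groups of three, with a "$" directly before or after it. The class PriceExtractor and the PRICE pattern are hypothetical and would need adapting to the real page.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of Selenium.
class PriceExtractor {

    // Assumed price shape: "$" then amount, or amount then "$".
    // The amount is digits with optional ".ddd" thousands groups.
    private static final Pattern PRICE = Pattern.compile(
        "\\$\\s*(\\d+(?:\\.\\d{3})*)|(\\d+(?:\\.\\d{3})*)\\s*\\$");

    // Returns the first price found in the text, or empty if none matches.
    static OptionalLong extract(String text) {
        Matcher m = PRICE.matcher(text);
        if (!m.find()) {
            return OptionalLong.empty();
        }
        // Exactly one of the two capture groups is set, depending on the "$" position.
        String amount = m.group(1) != null ? m.group(1) : m.group(2);
        return OptionalLong.of(Long.parseLong(amount.replace(".", "")));
    }

    public static void main(String[] args) {
        System.out.println(extract("get 5for $20").getAsLong()); // prints 20
        System.out.println(extract("4.000.000$").getAsLong());   // prints 4000000
    }
}
```

Digits that are not attached to a "$" are ignored, which is why the stray "5" no longer leaks into the result.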