I'm attempting to pull data from a website using an HTTP GET request. I create a Scanner which reads through everything the GET request returns. Specifically, how can I get the Scanner to recognize a range of float values within the desired pattern? The pattern is as follows: "<strong>xk</strong> <div class="match_details_cell_label">Gold</div>"

The letter x above represents a float in the range [0.0, 50.0]. How do I represent that to the Scanner? I'm familiar with how to check whether an integer is within a set of values, but how do I incorporate that notion of "range" while scanning?

    GetGameInfo http = new GetGameInfo();

    System.out.println("Testing 1 - Send Http GET request");
    Scanner lolscan = new Scanner(http.sendGet());
    String gameGold = 
            lolscan.next("<strong>" + [0-30] + "k</strong><div class=\"match_details_cell_label\">Gold</div>");

As you can see, I tried concatenating a range of acceptable values, but I don't think this is the right way to go about it. Any suggestions?

nhahtdh

2 Answers

Don't use regex for parsing HTML!! https://stackoverflow.com/a/1732454/1768232

Use jsoup (also available as a Maven dependency) instead, like:

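    import java.util.LinkedList;
    import java.util.List;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    List<Double> doubles = new LinkedList<>();
    Document doc = Jsoup.connect(url).get();
    Elements elems = doc.select("strong");
    for (Element element : elems) {
        try {
            // Parse each element's own text; values look like "12.3k", so drop the "k" first
            doubles.add(Double.valueOf(element.text().replace("k", "")));
        } catch (NumberFormatException e) {
            // handle it
        }
    }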

durron597

Problems you have here:

  1. Regular expressions are bad at parsing HTML. Just one example is that <strong><foo/>30.0</strong>... will fail any sensible regular expression you come up with, but should probably pass your test here. I use regexes on HTML all the time, but you should keep in mind that it's like pointing a gun at your foot and pulling the trigger when you want to show someone that it's not loaded.
  2. Your code is not syntactically valid: [0-30] sits outside the string literals, so it won't compile. Scanner#next takes a String argument, so the whole pattern has to be a single string.
  3. [0-30] is a character class, matching exactly one character which is one of 0, 1, 2, or 3. Probably not what you mean.
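
To make point 3 concrete, here is a small sketch of how `[0-30]` behaves when used as a whole pattern. The class and method names (`CharClassDemo`, `matchesClass`) are made up for illustration; only `java.util.regex.Pattern` is a real API here.

```java
import java.util.regex.Pattern;

public class CharClassDemo {
    // "[0-30]" is the range 0-3 plus a redundant literal 0,
    // so it matches exactly one character out of 0, 1, 2, 3.
    static boolean matchesClass(String s) {
        return Pattern.matches("[0-30]", s);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("2"));   // true: one digit inside 0-3
        System.out.println(matchesClass("25"));  // false: two characters
        System.out.println(matchesClass("7"));   // false: digit outside 0-3
    }
}
```

So the class never sees "30" as a number, only as the characters 3 and 0.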

Regular expressions are a bad match for things like "numbers between 0.0 and 50.0". It'd be better to match all numbers, then have Java parse them and compare them numerically.
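
A minimal sketch of that "match any number, then compare in Java" idea, assuming the markup looks exactly like the snippet in the question. `GoldExtractor` and `goldInRange` are hypothetical names, and the usual caveat applies: this regex only covers that one exact shape of HTML, so a real parser is still the safer route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GoldExtractor {
    // Matches any non-negative decimal followed by "k"; the range is NOT encoded in the regex.
    private static final Pattern GOLD = Pattern.compile(
        "<strong>(\\d+(?:\\.\\d+)?)k</strong>\\s*"
        + "<div class=\"match_details_cell_label\">Gold</div>");

    // Returns every matched value that lies within [min, max].
    static List<Double> goldInRange(String html, double min, double max) {
        List<Double> result = new ArrayList<>();
        Matcher m = GOLD.matcher(html);
        while (m.find()) {
            double value = Double.parseDouble(m.group(1)); // let Java parse the number
            if (value >= min && value <= max) {            // numeric range check
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
            "<strong>12.3k</strong> <div class=\"match_details_cell_label\">Gold</div>"
            + "<strong>75.0k</strong><div class=\"match_details_cell_label\">Gold</div>";
        System.out.println(goldInRange(html, 0.0, 50.0)); // prints [12.3]
    }
}
```

The range bounds stay ordinary `double` parameters, so changing [0.0, 50.0] to something else doesn't mean rewriting the pattern.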

Nathaniel Waisbrot