I'm currently retrieving HTML from a forum page using this code:

String html = null;
URLConnection connection = null;
try {
    connection = new URL(forumsURL).openConnection();
    connection.addRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36");
    Scanner scanner = new Scanner(connection.getInputStream());
    scanner.useDelimiter("\\Z");
    html = scanner.next();
    scanner.close();
} catch (Exception ex) {
    ex.printStackTrace();
    System.out.println("Issues communicating with forums");
    return;
}

This code had been working for me for years. However, now when it retrieves the HTML, some of the string text values are being replaced with "-".

The HTML I'm parsing looks like this when I view it on the webpage:

<div style class="onlineInfo">
   <span class="PlayersOnline">20</span>
   "/"
   <span class="MaxPlayers">50</span>
</div>

But the HTML my code is returning looks like this:

<div style="display:none" class="onlineInfo">
   <span class="PlayersOnline">-</span>/<span class="MaxPlayers">50</span>
</div>

Notice that the "PlayersOnline" value is being replaced: it no longer shows the player count and instead just returns "-". The div also now carries style="display:none".

I'm not sure what has changed that would cause this to break. Any suggestions would be greatly appreciated.

  • Looks like the internals of that page have changed and now the value is loaded dynamically using JavaScript. Try this answer to check if that's the case: https://stackoverflow.com/a/66519504/9889778 – Krystian G Mar 14 '21 at 18:18
  • @KrystianG that fixed my issue, thank you so much. – Jakob Glass Mar 14 '21 at 18:45
  • And sometimes, pages will differ on different given user-agents. And some even do JA3 fingerprint analysis to see what app REALLY sends the requests. Bot detection and so on. – JayC667 Mar 14 '21 at 20:36
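
For anyone hitting the same symptom, here is a minimal sketch of how you might detect that the fetched HTML still contains the placeholder rather than the real value. It assumes the markup shown in the question (a span with class="PlayersOnline"); the class name PlayersOnlineCheck and both helper methods are made up for illustration and are not from the linked answer. A regex is used only because the snippet is tiny; a real HTML parser would be more robust.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlayersOnlineCheck {
    // Matches <span class="PlayersOnline">VALUE</span> and captures VALUE.
    // Assumes the exact attribute layout shown in the question.
    private static final Pattern PLAYERS_ONLINE =
            Pattern.compile("<span\\s+class=\"PlayersOnline\">([^<]*)</span>");

    /** Returns the trimmed text inside the PlayersOnline span, if the span is present. */
    public static Optional<String> extractPlayersOnline(String html) {
        Matcher m = PLAYERS_ONLINE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    /** True when the span is present but its text is not a plain number (e.g. "-"). */
    public static boolean looksLikePlaceholder(String html) {
        return extractPlayersOnline(html)
                .map(value -> !value.matches("\\d+"))
                .orElse(false);
    }

    public static void main(String[] args) {
        // The raw HTML from the question, as returned by URLConnection.
        String raw = "<div style=\"display:none\" class=\"onlineInfo\">"
                + "<span class=\"PlayersOnline\">-</span>/"
                + "<span class=\"MaxPlayers\">50</span></div>";
        System.out.println(looksLikePlaceholder(raw)); // prints "true"
    }
}
```

If this returns true for the raw response while the browser shows a number, the value is most likely being filled in client-side after the page loads, so a plain URLConnection fetch will never see it. In that case you would need either the request the page's script makes to get the count, or something that actually executes JavaScript.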

1 Answer

Maybe it is because of the "-" character after "PlayersOnline".

Biavvv