
I'm new to Java, but I decided to try my hand at a small project. I'm trying to scrape data from a website. My issue is that although I can get the page source, I can't get the rendered content (what the browser shows under "inspect element") to print out. I've watched countless videos and searched on here as well, but no matter what I try, my program only prints the original source of the web page. I am trying to extract the pricing information from a table. The web page is "https://www.binance.com/trade.html?symbol=ZEC_BTC".

And my basic program is:

import java.io.IOException;
import java.net.MalformedURLException;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main {
    public static void main(String[] args)
            throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        /* turns off the annoying htmlunit warnings; comment this line out to see them again */
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.CHROME);

        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.setJavaScriptTimeout(10000);
        webClient.getOptions().setJavaScriptEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        webClient.getOptions().setTimeout(10000);

        String url = "https://www.binance.com/trade.html?symbol=ZEC_BTC";
        System.out.println("Loading page now: " + url);
        HtmlPage page = webClient.getPage(url);
        /* waits up to 30s for background JavaScript to finish */
        webClient.waitForBackgroundJavaScript(30 * 1000);

        /* prints the page's DOM as HtmlUnit sees it after the wait */
        String pageAsXml = page.asXml();
        System.out.println(pageAsXml);
    }
}

The idea here was that the program would load the web page and then wait for the JavaScript to finish running before printing the result. Any help would be VERY appreciated. I just need the JavaScript-generated table elements that contain the prices to be printed out. Thank you.

Brad
  • Have you done any tracing of the page's execution in your browser console tools? Do you understand the page's JS execution enough to know how the data is being retrieved and how long to wait? – Jim Garrison Jan 08 '18 at 02:42
  • I don't. However, I'm completely open to trying an impractical but functional approach to making this work, like maybe changing the time that it waits to a few minutes. I'm not sure that alone would help though. It's set to 30 seconds right now, and I would assume that should be long enough, right? – Brad Jan 08 '18 at 03:01
  • You really cannot do anything until you understand how the page works, and for that you need to watch the network traffic in the debug console. "Scraping" a page that updates itself dynamically is a challenge. You need to understand when and how the data is fetched, as clearly it's not in the initial request/response cycle. – Jim Garrison Jan 08 '18 at 03:23
  • As it stands the question is much too broad and vague for StackOverflow. I suggest you do more research to understand the page's dynamic behavior, then post a more specific question if there's something you don't understand. – Jim Garrison Jan 08 '18 at 03:24
  • So you're saying that there are too many ways to go about this, and it all depends on how the data is actually displayed. Are you positive there's no catch-all method that could give me a screenshot of the page as the JS table is updated? I'm very lost on this subject but really don't want to give up. In the meantime I'm at least going to follow up on your debug console tip. – Brad Jan 08 '18 at 04:35

1 Answer


I just need the javascript elements of the tables containing the prices

What you should probably do is use the site's API instead of scraping the page. (What is an API and why use it?)

An API returns exactly the kind of content that you want in a formatted way (usually JSON or XML), easily parseable and readable.
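As a rough illustration of what that could look like in Java, here is a minimal sketch that requests a JSON price ticker over HTTP and pulls one field out of the response. Everything specific in it is an assumption on my part rather than something taken from the page: the endpoint URL and the `ZECBTC` symbol format should be checked against the exchange's API documentation, the class and method names (`TickerPriceExample`, `extractStringField`, `fetch`) are made up for this example, and the response is assumed to be a small JSON object with a string-valued `price` field. It uses `java.net.http.HttpClient`, so it needs Java 11 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TickerPriceExample {

    // ASSUMPTION: illustrative endpoint and symbol format; verify both in the API docs.
    static final String ENDPOINT =
            "https://api.binance.com/api/v3/ticker/price?symbol=ZECBTC";

    // Extracts the value of a string-valued field from a flat JSON object.
    // This regex is only a stand-in for a real JSON parser and does not
    // handle escaped quotes or nested structures.
    static Optional<String> extractStringField(String json, String field) {
        Pattern pattern = Pattern.compile(
                "\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher matcher = pattern.matcher(json);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    // Sends a GET request and returns the response body as text.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status: " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String json = fetch(ENDPOINT);
        System.out.println("Raw response: " + json);
        System.out.println("Price: "
                + extractStringField(json, "price").orElse("price field not found"));
    }
}
```

For anything beyond a quick experiment, a proper JSON library such as Jackson or Gson would be a safer choice than the regex. The point is only that no browser emulation or JavaScript waiting is involved: the program asks for the data directly and gets it back in a structured form.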

  • I see, I'm looking through the page you sent, thank you. Would you happen to know an example of how an API might be used to reach a result similar to what I was looking for? – Brad Jan 08 '18 at 05:16