
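I am trying to parse the Steam Community Market page with HtmlUnit and print it using "page.asText()", but the item listings are missing from the output. I suspect this happens because the items are loaded by JavaScript about a second after the initial HTML, so they are not there yet when I read the page.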

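import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public static void main(String[] args) throws Exception {
    // Silence HtmlUnit and Apache HttpClient logging
    java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);
    String link = "http://steamcommunity.com/market/search?appid=730#p6_price_asc";
    WebClient webClient = new WebClient(BrowserVersion.CHROME);
    HtmlPage page = webClient.getPage(link);
    // Prints the page text as it is right after the initial load
    System.out.println(page.asText());
}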

In the console I see:

Show advanced options...






 < 1 2 3 4 5 6 ... 939 >
 Showing 1-10 of 9389 results

But I expect to see:

Show advanced options...
PRICE
QUANTITY
NAME
31,218
 Starting at:
 $0.35 USD
Operation Hydra Case 
 Counter-Strike: Global Offensive
 276,582
 Starting at:
 $0.23 USD
.
.
.

M4A1-S | Decimator (Field-Tested) 
 Counter-Strike: Global Offensive


 232
 Starting at:
 $27.06 USD

AWP | Asiimov (Battle-Scarred) 
 Counter-Strike: Global Offensive


 28,068
 Starting at:
 $0.75 USD

Krakow 2017 Legends Autograph Capsule 
 Counter-Strike: Global Offensive


 < 1 2 3 4 5 6 ... 940 >
 Showing 1-10 of 9392 results

1 Answer


First of all, make sure JavaScript is enabled:

webClient.getOptions().setJavaScriptEnabled(true);

What I typically do in order to wait for more elements to load is:

Thread.sleep(3000);

This gives the page 3 seconds to load all additional content.
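A fixed sleep either waits longer than necessary or not long enough. As an alternative, here is a minimal sketch of a polling helper that re-checks a condition until it holds or a timeout expires. The class name `Poller`, the method `waitUntil`, and the timing values are hypothetical and not part of HtmlUnit; the demo in `main` uses a timer as a stand-in for "the listings have been rendered".

```java
import java.util.function.BooleanSupplier;

final class Poller {
    /**
     * Re-checks the condition every intervalMillis until it holds
     * or timeoutMillis has elapsed.
     * Returns true if the condition held, false on timeout.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition (assumption): becomes true roughly 300 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2_000, 50);
        System.out.println("ready = " + ready);
    }
}
```

With HtmlUnit, the condition could be something like `() -> page.asText().contains("Starting at:")`, called after `webClient.getPage(link)`; whether that marker text suits the page is an assumption based on the expected output in the question. HtmlUnit also has `webClient.waitForBackgroundJavaScript(timeoutMillis)`, which may be enough on its own.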

You can also try any of the other methods listed by other users here:

HTMLUnit doesn't wait for Javascript

  • When do I need to call "Thread.sleep(3000);"? WebClient webClient = new WebClient(BrowserVersion.CHROME); webClient.getOptions().setJavaScriptEnabled(true); page = (HtmlPage) webClient.getPage(link); System.out.println(page.asText()); – Dany worner Aug 16 '17 at 18:13
  • You will need to call Thread.sleep() after webClient.getPage(link). – Demirhan Ozel Aug 16 '17 at 18:19
  • Wow, so the page keeps loading after "getPage(link)"? I thought getPage loaded it once and that was it. Thank you so much! – Dany worner Aug 16 '17 at 18:23
  • No problem. If my answer was helpful to you, please consider upvoting and accepting it. Thanks. – Demirhan Ozel Aug 16 '17 at 18:24