1

I can open kissmanga.com in a browser, but I can't access it from my program. I fixed the 403 error I was getting before, but now I get a 503 error.

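    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    // Set the User-Agent before the connection is opened.
    System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.29 Safari/537.36");
    URL url = new URL("http://kissmanga.com/");

    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader bf = new BufferedReader(new InputStreamReader(url.openStream()))) {
        String str;
        while ((str = bf.readLine()) != null) {
            System.out.println(str);
        }
    }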


The error I get:

    Exception in thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://kissmanga.com/
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at java.net.URL.openStream(Unknown Source)
        at KissManga.main(KissManga.java:10)

Okay, this code works, with one small but annoying problem: I don't get the full HTML, only about two thirds of it.

    HtmlUnitDriver driver = new HtmlUnitDriver();
    driver.get("http://kissmanga.com/");
    Thread.sleep(5000);
    System.out.println(driver.getPageSource());
    driver.quit();
Jake
  • Possible duplicate of [java.io.IOException: Server returned HTTP response code: 403 for URL](http://stackoverflow.com/questions/30092798/java-io-ioexception-server-returned-http-response-code-403-for-url) – Yassin Hajaj Nov 21 '15 at 15:07

3 Answers

5

You won't get any data this way, because the site checks that JavaScript is enabled.

You should try a tool that can emulate browser behaviour. For example, this is how you can get the page source with Selenium's HtmlUnitDriver:

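    import java.util.concurrent.TimeUnit;
    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import org.openqa.selenium.htmlunit.HtmlUnitDriver;

    // Emulate Firefox and run the page's JavaScript, which the site checks for.
    HtmlUnitDriver drv = new HtmlUnitDriver(BrowserVersion.FIREFOX_38);
    drv.setJavascriptEnabled(true);
    drv.get("http://kissmanga.com/");
    drv.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
    System.out.println(drv.getPageSource());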
Yevgen
  • Does this code work for you? I get an error saying that I'm missing the class com/gargoylesoftware.../RefreshHandler – Jake Nov 22 '15 at 09:11
  • You need to add the libraries Selenium depends on to your project. The above link is sufficient if you are using Maven. If you are not, download the driver from the [official site](http://www.seleniumhq.org/download/). You can use [this](http://javatutorial.net/selenium-java-tutorial) tutorial as a reference. – Yevgen Nov 22 '15 at 10:04
  • Thanks for the reply, but I don't know if this will work because, as other people commented, the site uses CloudFlare... I will still try it later on, though. – Jake Nov 22 '15 at 10:53
  • Hi mate, I just tried your suggestion and it actually works, with one small problem. In your code, between drv.get and System.out, I added Thread.sleep(5000); and I get the page HTML, but a third of it is missing. For example, if I load http://kissmanga.com/Manga/Honey-MEGURO-Amu I get the full HTML, yet for just kissmanga.com I don't. – Jake Nov 22 '15 at 19:29
  • You can try ChromeDriver for more precise scraping, see [this](http://stackoverflow.com/a/13774924/3262990) example. – Yevgen Nov 22 '15 at 23:00
  • Edited the answer, please try. Also check [this](http://stackoverflow.com/questions/7926246/why-doesnt-htmlunitdriver-execute-javascript) post for reference. – Yevgen Nov 23 '15 at 11:28
  • Thank you very much, I owe you :P. After you edited the answer I understood that the problem wasn't that the whole HTML file didn't download; the Eclipse console simply couldn't display that much text without discarding some of it.... I wish there was some kind of message saying "Some text has been cut". Thanks again! – Jake Nov 23 '15 at 18:53
  • Great, check the question as answered then. – Yevgen Nov 23 '15 at 23:33
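Since the missing third turned out to be console truncation, one way to sidestep it is to write the page source to a file instead of printing it. The following is only a minimal sketch: the class name `PageDump`, the helper `save`, and the file name `page.html` are made up for illustration, and the `html` string stands in for whatever `drv.getPageSource()` returns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PageDump {
    // Hypothetical helper: writes the HTML as UTF-8 and returns the file size in bytes.
    static long save(String html, Path target) throws IOException {
        Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        return Files.size(target);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder for drv.getPageSource(); no browser driver is used in this sketch.
        String html = "<html><body>example</body></html>";
        Path out = Paths.get("page.html"); // assumed output location
        long bytes = save(html, out);
        System.out.println("Wrote " + bytes + " bytes to " + out.toAbsolutePath());
    }
}
```

Comparing the reported byte count with the size of the page saved from a real browser gives a quick check of whether anything is actually missing.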
0

Error 503 means that the server is reachable but returned an error status code.

503 stands for "Service Unavailable".

Either a temporary problem occurred on the server, or the server rejected your request for some reason.
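To see why the server answered with 503, it can help to read the status code and the error body rather than let `URL.openStream()` throw. Below is a rough sketch using `HttpURLConnection` from the JDK; the class name `StatusProbe`, the helpers `readAll` and `fetch`, and the shortened User-Agent value are assumptions for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class StatusProbe {
    // Simple holder for the status code and the response body.
    static final class Result {
        final int status;
        final String body;
        Result(int status, String body) { this.status = status; this.body = body; }
    }

    // Reads a stream fully as UTF-8 text; returns "" when the stream is null.
    static String readAll(InputStream in) throws IOException {
        if (in == null) return "";
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) sb.append(line).append('\n');
        }
        return sb.toString();
    }

    // Hypothetical helper: fetches the URL without throwing on 4xx/5xx responses.
    static Result fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0"); // assumed value
        try {
            int status = conn.getResponseCode();
            // For error statuses the body is on the error stream, which may be null.
            InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
            return new Result(status, readAll(in));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java StatusProbe <url>");
            return;
        }
        Result r = fetch(args[0]);
        System.out.println("HTTP " + r.status);
        System.out.println(r.body);
    }
}
```

The body of a 503 response often says who rejected the request and why, which is more informative than the bare exception message.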

Prim
  • Is there a way around it? – Jake Nov 21 '15 at 14:56
  • I don't know why the server sent an error code, but it's probably a server-side issue. Maybe the server expects some additional HTTP headers before it will serve the request. You could try using a library to make the HTTP request, such as async-http-client: https://github.com/AsyncHttpClient/async-http-client – Prim Nov 21 '15 at 15:04
0

It's because the site appears to use Cloudflare. You can tell because when you visit the site you get 'please wait while we check your browser'.

503 = HTTP 503 Service Unavailable

This is Cloudflare telling you to hang on while it makes sure you aren't part of a DDoS attack.

You will need to code your parser to inspect the response body and either wait out the redirect or follow it manually yourself.
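As a rough illustration of "inspect the response body", the check could look something like the sketch below. The class `ChallengeCheck`, the method `looksLikeChallenge`, and the marker phrases are assumptions based only on the message quoted above; the real interstitial page may be worded differently, and this sketch only detects the page, it does not get past it.

```java
import java.util.Locale;

public class ChallengeCheck {
    // Assumed marker phrases, derived from the "check your browser" message quoted above.
    private static final String[] MARKERS = {
        "check your browser",
        "checking your browser"
    };

    // Returns true when a 503 response body looks like a browser-check page.
    static boolean looksLikeChallenge(int status, String body) {
        if (status != 503 || body == null) {
            return false;
        }
        String lower = body.toLowerCase(Locale.ROOT);
        for (String marker : MARKERS) {
            if (lower.contains(marker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A 503 whose body carries the browser-check message.
        System.out.println(looksLikeChallenge(503, "Please wait while we check your browser")); // true
        // A plain 503 without the message is treated as an ordinary outage.
        System.out.println(looksLikeChallenge(503, "Service Unavailable")); // false
        // A successful response is never treated as a challenge.
        System.out.println(looksLikeChallenge(200, "Please wait while we check your browser")); // false
    }
}
```

When the check returns true, the program knows it has to wait and retry, or hand the URL to something that runs JavaScript, rather than treat the 503 as a real outage.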

Drazisil