I usually use the following code to get the HTML of a website:

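import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

public class PageDownloader {

    public static void main(String[] args) {
        String website = "https://stackoverflow.com/";

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new URL(website).openStream()))) {
            String line;
            // Print the response body line by line
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } catch (MalformedURLException e) {
            System.out.println("Malformed URL: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("I/O Error: " + e.getMessage());
        }
    }
}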

This works great. Recently, however, I tried to download a page and got only cryptic symbols like:

A.]Ì¢5÷‰)†º¬ˆ®è&ûõdÀ´u5w䓳¡Zn¸4§÷Žtuë¡Pñ_MϦÎ@èÉGfp³~HïøHL×”6µ4SzEƒ¯æÌ¹É+®éÄÉ“ ¶Ð]1×ãôüã°Ñ<Þk6åº|B¯½o;úúà®Êñ¢Q?…Ôó¨ÆrÍ*^)Q@‘uⳫ7¯É—`ázªë  ›K~eôÞŒ•*7tøöK,ë3W'6ÍþVõ•›rb¿Óè¶÷òÂ.+èV&Úw£ødáÂü€jS¬’í’èÑ^4 Ò Š^s:Щý²«»TÈ~BâÝwùŠ?çwv
OÍo¯ûƒr<¹eé5H€aÓL¦ç˜œ?}'4?GŠoÔ›€ž¸Mþ?fÞ³²úˆX¿QàÅ7$ÊíO`ˇ‡°!dGH»ÒÅyõC.«ïì2cÜ$&®4íþp.1™`

But when I save the page using Chrome, everything seems fine. Is this some kind of protection on the site's side, or do I need to change a format somewhere?

I used the link:

https://www.amazon.de/Stackoverflow-T-Shirt-Overflowing-Stack-Overflow/dp/B07KYZJGYR/ref=sr_1_1?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=stackoverflow&qid=1574411393&sr=8-1

Dharman
Arjihad
  • 3
    Is it possible that the data is coming back gzip-encoded? – Always Learning Nov 22 '19 at 08:27
  • I don't know. I added the link that produces this output to my question. – Arjihad Nov 22 '19 at 08:30
  • Could you try specifying the charset: `new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)`? – Arnaud Nov 22 '19 at 08:31
  • It should have the correct Charset indeed. – Quadrivics Nov 22 '19 at 08:33
  • @Arnaud I tried. The output looks different, but it is still not readable HTML. – Arjihad Nov 22 '19 at 08:33
  • 2
    The link/site you provided *is* gzip-encoded. You should wrap the input stream in a `GZIPInputStream`. See here: https://stackoverflow.com/q/16351668/592355 – xerx593 Nov 22 '19 at 08:37
  • 1
    Thanks for the hints. I managed to get it working. Also a good link: https://stackoverflow.com/questions/11093153/how-to-read-compressed-html-page-with-content-encoding-gzip – Arjihad Nov 22 '19 at 08:47
  • Possible duplicate of [How to read compressed HTML page with Content-Encoding : gzip](https://stackoverflow.com/questions/11093153/how-to-read-compressed-html-page-with-content-encoding-gzip) – kaya3 Nov 22 '19 at 21:13
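
For anyone landing here later, the approach suggested in the comments could look roughly like the sketch below. It assumes the server marks compressed responses with a `Content-Encoding: gzip` header and that the page is UTF-8. Both are assumptions, not something confirmed for the Amazon page above. The class name `GzipPageReader` and the helpers `readBody` and `fetch` are made up for illustration. Specifying a charset alone would not help here, because the bytes have to be decompressed before they can be decoded as text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of the original question.
final class GzipPageReader {

    // Reads the whole body as text. If the reported content encoding is
    // "gzip", the stream is decompressed first. UTF-8 is an assumption.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    // Opens the URL, tells the server that gzip is acceptable, and decodes
    // the response according to the Content-Encoding header it sends back.
    static String fetch(String website) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(website).openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream raw = conn.getInputStream()) {
            // getContentEncoding() returns null when the header is absent,
            // in which case readBody treats the body as uncompressed.
            return readBody(raw, conn.getContentEncoding());
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling `System.out.println(GzipPageReader.fetch(website))` from `main` would then print the decoded page. Checking the header instead of always wrapping the stream avoids an exception from `GZIPInputStream` when a server answers with an uncompressed body.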
