Getting only symbols when downloading a website with Java

Question

I usually use this to get the HTML of a website:

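import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;

// The class name "Download" is a placeholder so the snippet compiles on its own.
public class Download {

    public static void main(String[] args) {

        String website = "https://stackoverflow.com/";

        try {
            URL url = new URL(website);

            // try-with-resources closes the reader even if reading fails
            try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
                String line;

                // Print the response body line by line
                while ((line = br.readLine()) != null) {
                    System.out.println(line);
                }
            }

        } catch (MalformedURLException e) {
            System.out.println("Malformed URL: " + e.getMessage());
        } catch (IOException e) {
            System.out.println("I/O Error: " + e.getMessage());
        }

    }
}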

and it works great. Recently I tried to download a page and got only cryptic symbols like

A.]Ì¢5÷‰)†º¬ˆ®è&ûõdÀ´u5wä“³¡Zn¸4§÷Žtuë¡Pñ_MÏ¦Î@èÉGfp³~HïøHL×”6µ4SzEƒ¯æÌ¹É+®éÄÉ“ ¶Ð]1×ãôüã°Ñ<Þk6åº|B¯½o;úúà®Êñ¢Q?…Ôó¨ÆrÍ*^)Q@‘uâ³«7¯É—`ázªë  ›K~eôÞŒ•*7tøöK,ë3W'6ÍþVõ•›rb¿Óè¶÷òÂ.+èV&Úw£ødáÂü€jS¬’í’èÑ^4 Ò Š^s:î¦¶Ð©ý²«»TÈ~BâÝwùŠ?çwv
OÍo¯ûƒr<¹eé5H€aÓL¦ç˜œ?}'4?GŠoÔ›€ž¸Mþ?fÞ³²úˆX¿QàÅ7$ÊíO`Ë‡‡°!dGH»ÒÅyõC.«ïì2cÜ$&®4íþp.1™`

but when I save the page using Chrome, everything seems fine. Is this some kind of protection the site could be using? Or do I need to change the encoding or format I read it with?

I used the link:

https://www.amazon.de/Stackoverflow-T-Shirt-Overflowing-Stack-Overflow/dp/B07KYZJGYR/ref=sr_1_1?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=stackoverflow&qid=1574411393&sr=8-1

I don't know. I added the link which gives me this output to my question. — Arjihad, Nov 22 '19 at 08:30
Could you try specifying the charset: `new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)`? — Arnaud, Nov 22 '19 at 08:31
@Arnaud I tried. It looks different but also no readable HTML code. — Arjihad, Nov 22 '19 at 08:33
The link/site you provided *is* gzip-encoded. You should wrap the input stream in a `GZIPInputStream`. See here: https://stackoverflow.com/q/16351668/592355 — xerx593, Nov 22 '19 at 08:37
Thanks for the hints. I managed to get it working. Also a good link: https://stackoverflow.com/questions/11093153/how-to-read-compressed-html-page-with-content-encoding-gzip — Arjihad, Nov 22 '19 at 08:47
Possible duplicate of [How to read compressed HTML page with Content-Encoding : gzip](https://stackoverflow.com/questions/11093153/how-to-read-compressed-html-page-with-content-encoding-gzip) — kaya3, Nov 22 '19 at 21:13
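
For anyone landing here, the fix suggested in the comments might look roughly like the sketch below. It is not the asker's final code (that was never posted). The class name `GzipAwareDownloader` and the helper `readBody` are made up for illustration, and the sketch assumes the page is UTF-8 and that the server reports compression through the `Content-Encoding` response header.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical class name, for illustration only.
public class GzipAwareDownloader {

    // Reads the whole body as text. If the server reported "gzip" as the
    // Content-Encoding, the raw stream is decompressed first.
    // Assumption: the decompressed bytes are UTF-8 text.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String website = args.length > 0 ? args[0] : "https://stackoverflow.com/";
        HttpURLConnection conn = (HttpURLConnection) new URL(website).openConnection();
        // Optional: tell the server explicitly that gzip is acceptable.
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream raw = conn.getInputStream()) {
            // getContentEncoding() returns null when the header is absent,
            // in which case readBody treats the stream as uncompressed.
            System.out.print(readBody(raw, conn.getContentEncoding()));
        }
    }
}
```

Checking the header instead of always wrapping the stream keeps the code working for sites that reply uncompressed, since `GZIPInputStream` throws on input that is not in gzip format.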
