    import java.net.*;
    import java.io.*;
    import org.jsoup.Jsoup;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;


    public class UrlReaderTest {
        public static void main(String[] args) throws Exception {

        URL url = new URL("https://www.amazon.com/");
        String s = null;
        StringBuilder contentBuilder = new StringBuilder();
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream()));
            String str;
            while ((str = in.readLine()) != null) {
                contentBuilder.append(str);
            }
            in.close();
        } catch (IOException e) {
            System.err.println("Error");

        }

        s = contentBuilder.toString();
        Document document = Jsoup.parse(s);


        System.out.println(document.text());


        }
    }

What I am getting is mainly symbols like these: Η1?0 Π??0ή=tθ Jr?/β@Q? l?r{ΪεI/ ΉΟ~νJ?j?Ά-??ΙiLs?YdHλ²ύ?α?η?ογV"ηw[:?0??νSQψyθ?*²?γpI? ??²ρνl???2JμΚ?ΣS?Αl4ςRΛ\KR545υ?SK

Is there anything I can do to transform that into a form that I can use? I can't find anything specific online.

Edit: What I want specifically is to decrypt that information. For example, I want to be able to take the text from a Facebook event page, search it for the keywords I want, and use those somewhere else.

treyBake
  • 2
    Are you looking for an answer other than "decrypt the file"? Those symbols are the encrypted file (bits in memory) being read in as text. They look like nonsense because they are the text representation of encrypted data which is basically random 1's and 0's. You cannot get prettier text because that prettier text would not be the text representation of the same data. If you *are* looking for something other than "decrypt the file" please specify what "a form that I can use" means – MyStackRunnethOver Nov 21 '18 at 22:50
  • 1
    The reason you're getting back nonsense is that you're opening a raw stream to an HTTPS URL. Since it's HTTPS, the contents of the stream are encrypted. Consider using `HttpsURLConnection`, which handles the communication for you and just gives you back the decrypted content. Here's an example: https://www.mkyong.com/java/java-https-client-httpsurlconnection-example/ – ethan.roday Nov 21 '18 at 23:37
  • 1
    That looks like a zipped response. Why don't you use `Jsoup` to request the page? I think it decodes the response data by default. – t.m.adam Nov 22 '18 at 01:34
  • 1
    @err1100: No, openStream gives the decrypted data *after* SSL record processing. – President James K. Polk Nov 22 '18 at 01:59
  • 3
    This is all wrong. As @t.m.adam notes, the page is gzipped, so it can't be read using any `Reader`. `Reader` is meant for character streams, not the binary data that you get from gzipping text. – President James K. Polk Nov 22 '18 at 02:35
  • Then, to be more specific, what I want is to decrypt that data. Is that possible? – Thodoris Ydraios Nov 22 '18 at 14:12
  • 1
    Again, the response data is encoded (compressed), not encrypted. You can decode it with `GZIPInputStream`, but it would be easier to use Jsoup, which decodes zipped data by default: `Document doc = Jsoup.connect("https://www.amazon.com/").get();` – t.m.adam Nov 22 '18 at 17:39
  • 1
    @t.m.adam is correct. The problem is that the response is gzipped, not that it's encrypted. – ethan.roday Nov 23 '18 at 14:58

1 Answer


As @t.m.adam noted in his comment, the problem is that the response from the URL stream is gzipped (compressed), not encrypted. So, if you want to read it from the URL stream, you need to wrap the stream in a `GZIPInputStream` before passing it to `InputStreamReader` (see this answer). Alternatively, as @t.m.adam suggests, you can use Jsoup's built-in `connect()` method, which decodes gzipped responses by default:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class UrlReaderTest {
  public static void main(String[] args) {
    try {
      // Jsoup.connect() fetches the page and decodes a gzipped response for you
      Document doc = Jsoup.connect("https://www.amazon.com").get();
      // Print only the visible text of the parsed page
      System.out.print(doc.text());
    }
    catch (IOException e) {
      System.err.println("Error: " + e.getMessage());
    }

  }
}
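
For the `GZIPInputStream` route mentioned above, here is a minimal sketch. The class name `GzipBodyReader` and the helpers `readBody` and `gzip` are hypothetical names made up for this illustration, and UTF-8 is an assumption about the page's charset. To keep the example runnable without network access, it decompresses a string that was gzipped in memory instead of a live response.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBodyReader {

    // Hypothetical helper: reads a response body as text, decompressing it first
    // when the Content-Encoding value says the body is gzipped.
    // Assumption: the text is UTF-8.
    public static String readBody(InputStream raw, String contentEncoding) throws IOException {
        InputStream in = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)   // binary gzip data -> decompressed bytes
                : raw;                       // not gzipped: read the bytes as they are
        StringBuilder content = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    // Hypothetical helper for the offline demo only: gzips a string in memory.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a gzipped HTTP response body
        byte[] compressed = gzip("<html><body>Hello</body></html>");
        String html = readBody(new ByteArrayInputStream(compressed), "gzip");
        System.out.println(html);
    }
}
```

With a real URL, you would open a `URLConnection` via `url.openConnection()` and call `readBody(conn.getInputStream(), conn.getContentEncoding())`, so the stream is only wrapped in `GZIPInputStream` when the server actually reports a gzipped body. The resulting string can then be passed to `Jsoup.parse()` as in the question.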
ethan.roday