4

In my app I need to download a web page. I do it like this:

URL url = new URL(myUrl);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setReadTimeout(5000); // 5 seconds to download (value is in milliseconds)
conn.setConnectTimeout(5000); // 5 seconds to connect (value is in milliseconds)
conn.setRequestMethod("GET");
conn.setDoInput(true);

conn.connect();
int response = conn.getResponseCode();
InputStream is = conn.getInputStream();

String s = readIt(is);
System.out.println("got: " + s);

My readIt function is:

public String readIt(InputStream stream) throws IOException {
    int len = 10000;
    Reader reader;
    reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[len];
    reader.read(buffer);
    return new String(buffer);
}

The problem is that it doesn't download the whole page. For example, if myUrl is "https://wikipedia.org", then the output is incomplete (the original post showed a screenshot of the output here).

How can I download the whole page?

Update: The second answer to "Read/convert an InputStream to a String" solved my problem. The problem is in the readIt function. You should read the response from the InputStream like this:

static String convertStreamToString(java.io.InputStream is) {
   java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A");
   return s.hasNext() ? s.next() : "";
}
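As a hedged sketch of why this works: `\\A` matches only the beginning of the input, so the Scanner never finds a second delimiter and returns the whole stream as one token. The variant below passes the charset explicitly instead of relying on the platform default; the class name `ScannerRead` and the in-memory test input are illustrative assumptions, not part of the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

final class ScannerRead {
    // "\\A" matches only at the start of input, so the rest of the
    // stream is returned as a single token.
    static String convertStreamToString(InputStream is) {
        // Closing the Scanner also closes the underlying stream.
        try (Scanner s = new Scanner(is, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for conn.getInputStream() (assumption for the demo).
        InputStream in = new ByteArrayInputStream(
                "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(convertStreamToString(in));
    }
}
```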
PepeHands
  • Maybe make the read time longer? – lonesome Dec 16 '15 at 12:26
  • @lonesome looks like I accidentally found a solution here: https://stackoverflow.com/questions/309424/read-convert-an-inputstream-to-a-string – PepeHands Dec 16 '15 at 12:30
  • @lonesome `readIt` from `developer.android.com` behaves strangely. If I read using this trick `java.util.Scanner s = new java.util.Scanner(is).useDelimiter("\\A"); return s.hasNext() ? s.next() : "";` then everything is fine – PepeHands Dec 16 '15 at 12:31

3 Answers

4

There are a number of mistakes in your code:

  1. You are reading into a character buffer with a fixed size.

  2. You are ignoring the result of the read(char[]) method. It returns the number of characters actually read ... and you need to use that.

  3. You are assuming that read(char[]) will read all of the data. In fact, it is only guaranteed to return at least one character ... or -1 to indicate that you have reached the end of the stream. When you read from a network connection, you are liable to get only the data that has already been sent by the other end and buffered locally.

  4. When you create the String from the char[] you are assuming that every position in the character array contains a character from your stream.
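To illustrate points 2 to 4, here is a minimal sketch that simulates a slow connection. `TrickleReader` is a hypothetical helper invented for this demo (it is not a JDK class); it hands out at most a few characters per call, the way a socket may deliver only what has arrived so far.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ShortReadDemo {
    /** Hypothetical reader that returns at most `chunk` chars per call. */
    static final class TrickleReader extends Reader {
        private final Reader source;
        private final int chunk;

        TrickleReader(String text, int chunk) {
            this.source = new StringReader(text);
            this.chunk = chunk;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            // Deliver no more than `chunk` characters, even if more were requested.
            return source.read(cbuf, off, Math.min(len, chunk));
        }

        @Override
        public void close() throws IOException {
            source.close();
        }
    }

    /** Single read as in the question: the return value is ignored. */
    static String readOnce(Reader reader, int bufferSize) throws IOException {
        char[] buffer = new char[bufferSize];
        reader.read(buffer);        // may fill only part of the buffer
        return new String(buffer);  // unfilled slots become '\0' characters
    }

    /** Returns how many characters the first read call actually delivered. */
    static int firstReadCount(Reader reader, int bufferSize) throws IOException {
        return reader.read(new char[bufferSize]);
    }

    public static void main(String[] args) throws IOException {
        String page = "<html>hello</html>";
        System.out.println(firstReadCount(new TrickleReader(page, 6), 100));
        String s = readOnce(new TrickleReader(page, 6), 100);
        System.out.println(s.length() + " chars, starts with: " + s.substring(0, 6));
    }
}
```

With a chunk size of 6 and a buffer of 100, the first read delivers only 6 characters, yet the string built from the buffer is still 100 characters long and mostly padding.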

There are multiple ways to do it correctly, and this is one way:

public String readIt(InputStream stream) throws IOException {
    Reader reader = new InputStreamReader(stream, "UTF-8");
    char[] buffer = new char[4096];
    StringBuilder builder = new StringBuilder();
    int len;
    while ((len = reader.read(buffer)) > 0) {
        builder.append(buffer, 0, len);
    }
    return builder.toString();
}

Another way to do it is to use an existing 3rd-party library that provides a readFully(Reader)-style method.
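If adding a dependency is not desirable, a further option is the JDK's own `InputStream.readAllBytes()`. This is a sketch under the assumption that Java 9 or newer is available; on Android it may not exist on older API levels, so treat it as an alternative rather than a drop-in fix. The class name `ReadAll` and the in-memory input are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadAll {
    // Reads until end of stream, then decodes the bytes as UTF-8.
    // Assumes Java 9+ (InputStream.readAllBytes was added in Java 9).
    static String readIt(InputStream stream) throws IOException {
        return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for conn.getInputStream() (assumption for the demo).
        InputStream in = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readIt(in));
    }
}
```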

Stephen C
  • Thanks for the explanation. I just copy-pasted this `readIt` function from here: https://developer.android.com/intl/ru/training/basics/network-ops/connecting.html Your solution gives me an `ArrayIndexOutOfBounds` exception even if I set the buffer size to 10000 (the page I am downloading is only 5KB). I think the easiest way to solve it is the one in my **Update** section – PepeHands Dec 16 '15 at 13:00
0

You need to read in a loop till there are no more bytes left in the InputStream.

    while (-1 != (len = in.read(buffer))) { /* do stuff here */ }
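A fuller sketch of that loop, assuming the response is UTF-8 text: each chunk is collected in a `ByteArrayOutputStream`, and decoding happens only once at the end so that a multi-byte character split across two reads is not corrupted. The names `ReadLoop` and `readFully` are illustrative, not from any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadLoop {
    static String readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int len;
        // read() returns -1 only at end of stream; any other value is the
        // number of bytes placed in the buffer by this call.
        while (-1 != (len = in.read(buffer))) {
            out.write(buffer, 0, len); // copy only the bytes actually read
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for conn.getInputStream() (assumption for the demo).
        InputStream in = new ByteArrayInputStream(
                "<html>hello</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFully(in));
    }
}
```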
Argha Sen
0

You are reading at most 10000 characters from the input stream.

Use a BufferedReader to make your life easier.

public String readIt(InputStream stream) throws IOException {
    // Decode as UTF-8 explicitly rather than relying on the platform default.
    BufferedReader reader = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
    StringBuilder out = new StringBuilder();
    String newLine = System.getProperty("line.separator");
    String line;
    // readLine() returns null at end of stream.
    while ((line = reader.readLine()) != null) {
        out.append(line);
        out.append(newLine);
    }
    return out.toString();
}
George Lee
  • but the response is shorter than 10000 bytes – PepeHands Dec 16 '15 at 12:32
  • But maybe you are right. Reading as in the second answer here solved the problem: https://stackoverflow.com/questions/309424/read-convert-an-inputstream-to-a-string – PepeHands Dec 16 '15 at 12:33
  • Also, I had the same result when `len` was 1000 – PepeHands Dec 16 '15 at 12:34
  • A quick inspection via Chrome dev tools says that the Wikipedia home page is 60.8KB in size, so about 60800 bytes, which is > 10000. – George Lee Dec 16 '15 at 12:37
  • But if I set `len` to 1000 the response is exactly the same. And the page I actually need to download is only 5KB, and I can't download it even if I set `len` to 100000. It always stops at the same place. – PepeHands Dec 16 '15 at 12:39
  • There is no guarantee that the set buffer len will be used. It is only the maximum len. You can check the returned number from the read() method to get the number of read bytes. – tak3shi Dec 16 '15 at 12:55