12

I have a question about Java. I used URLConnection to obtain a DataInputStream, and I want to convert that DataInputStream into a String variable. How can I do this? Thank you.

The following is my code:

URL data = new URL("http://google.com");
URLConnection dataConnection = data.openConnection();
DataInputStream dis = new DataInputStream(dataConnection.getInputStream());
String data_string;
// convert the DataInputStream to a String
John Topley
Questions
  • Do you want to convert the DataInputStream to a String, or do you want to read a String from the DataInputStream? – jmj Oct 06 '10 at 08:54
  • @org.life.java, thank you for your reply. I want to convert the DataInputStream to a string, like (data_string = dis;). By the way, I think this is a different question, so I posted it as a new question rather than adding to my old one. Thank you. :-) – Questions Oct 06 '10 at 08:59
  • To convert, you can just say `String str = dis.toString();`, but that will give you the string representation of the object. I don't understand why you need this. Or do you want to read the content of google.com here? – jmj Oct 06 '10 at 09:01
  • @org.life.java, thank you for your reply. Google is just an example. What do you mean by 'give you string representation of Object'? – Questions Oct 06 '10 at 09:05
  • Object has a method toString that returns a string representation of the object. I don't think you are looking for that. What exactly do you want to achieve by converting dis to a String? Please explain with an example. – jmj Oct 06 '10 at 09:09
  • @org.life.java, thank you for your reply. My aim is to get the HTML content as a string, like String a = " ....". Thank you. – Questions Oct 06 '10 at 09:29
  • @org.life.java, thank you. Also reply your answer. – Questions Oct 06 '10 at 09:40
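
To illustrate the point made in the comments above about `toString()`: the following is a minimal sketch (the class and method names are made up for illustration, and the stream is an in-memory one rather than a real URL connection). It shows that calling `toString()` on a `DataInputStream` yields the default `Object` representation, not the bytes the stream wraps.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class ToStringDemo {
    // Returns whatever toString() produces for a DataInputStream wrapping the given bytes.
    static String describe(byte[] bytes) {
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes));
        // DataInputStream does not override toString(), so this is Object.toString():
        // the class name, an '@', and the hash code in hex.
        return dis.toString();
    }

    public static void main(String[] args) {
        byte[] html = "<html>...</html>".getBytes(StandardCharsets.US_ASCII);
        // Prints something of the form "java.io.DataInputStream@<hash>", not the HTML.
        System.out.println(describe(html));
    }
}
```

So to get the page content, the stream has to be read, as the answers below do.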

3 Answers

11
import java.net.*;
import java.io.*;

class ConnectionTest {
    public static void main(String[] args) {
        try {
            URL google = new URL("http://www.google.com/");
            URLConnection googleConnection = google.openConnection();
            DataInputStream dis = new DataInputStream(googleConnection.getInputStream());
            StringBuffer inputLine = new StringBuffer();
            String tmp; 
            while ((tmp = dis.readLine()) != null) {
                inputLine.append(tmp);
                System.out.println(tmp);
            }
            // inputLine.toString() now holds the whole page source
            // (without line separators, because readLine() strips them)
            dis.close();
        } catch (MalformedURLException me) {
            System.out.println("MalformedURLException: " + me);
        } catch (IOException ioe) {
            System.out.println("IOException: " + ioe);
        }
    }
}  

This is what you want.

jmj
  • @org.life.java, thank you for your answer, but I think there is some misunderstanding of the problem. After 'System.out.println(inputLine);', inputLine becomes 'null', and I want the inputLine=" – Questions Oct 06 '10 at 09:39
  • @org.life.java, a great great great help. Thank you very much, and sorry for taking up your time. – Questions Oct 06 '10 at 09:47
  • I don't believe this can work. `readUTF()` expects string data to be stored in a specific way (see http://download.oracle.com/javase/1.3/docs/api/java/io/DataInput.html#readUTF%28%29). This will not be the case if you try to read content from an arbitrary URL. – Grodriguez Oct 06 '10 at 10:13
  • @Grodriguez Thanks for letting me know that. I have changed it back to readLine. I know it's deprecated; other solutions are already here, like Bozho's. – jmj Oct 06 '10 at 10:24
  • If you use `DataInputStream.readLine()`, your solution will not work correctly if the content encoding of the URL you are accessing is anything different than plain ASCII. This is why the `readLine` method is deprecated. See my answer to this same question for a way to read the contents of the URL taking into account the content encoding, without resorting to any external libraries. – Grodriguez Oct 06 '10 at 10:42
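
A small sketch of the encoding problem described in the comment above, using in-memory bytes instead of a real URL (the class and method names are hypothetical). It reads the same UTF-8 bytes once with the deprecated `DataInputStream.readLine()` and once with a `BufferedReader` that is told the charset.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for comparing the two ways of reading a line.
class ReadLineEncodingDemo {
    // Deprecated path: every byte is turned into one char,
    // so a multi-byte UTF-8 sequence ends up as several wrong characters.
    @SuppressWarnings("deprecation")
    static String viaDataInputStream(byte[] bytes) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return dis.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reader path: the bytes are decoded with an explicitly specified charset.
    static String viaReader(byte[] bytes) {
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented e, which takes two bytes in UTF-8.
        byte[] utf8 = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(viaDataInputStream(utf8).length()); // one char per byte
        System.out.println(viaReader(utf8).length());          // one char per decoded character
    }
}
```

With pure ASCII input both methods return the same string, which is why the problem can go unnoticed during testing.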
7

You can use commons-io IOUtils.toString(dataConnection.getInputStream(), encoding) in order to achieve your goal.

DataInputStream is not meant for what you want, which is to read the content of a website as a String.
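
If adding commons-io is not an option, the same idea can be sketched with the JDK alone. This is not the commons-io implementation, only a rough equivalent under the assumption that the whole response fits in memory; `StreamToString` and `readAll` are names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream into a String using an explicit charset.
class StreamToString {
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // Copy raw bytes first; decode only once at the end so that
        // multi-byte characters split across buffer boundaries stay intact.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for dataConnection.getInputStream().
        InputStream in = new ByteArrayInputStream(
                "<html>caf\u00e9</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

With a real connection, the call would look like `readAll(dataConnection.getInputStream(), charset)`, where the charset should match the one the server declares.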

Bozho
  • This does not take into account the content encoding for the URL you are accessing. You should use the two argument version of the `IOUtils.toString` method in order to explicitly specify the encoding. – Grodriguez Oct 06 '10 at 10:44
  • @Grodriguez or use an `InputStreamReader`. I added the encoding, a good practice indeed. – Bozho Oct 06 '10 at 10:50
  • Even if you pass an `InputStreamReader` instead, you still need to specify the encoding when the `InputStreamReader` is created, otherwise you will have the same problem (the default platform encoding would be used, which may or may not match the encoding of the URL content). – Grodriguez Oct 06 '10 at 10:53
  • @Grodriguez that's what I meant by the `InputStreamReader` suggestion. (Btw the downvote can be removed, I guess) – Bozho Oct 06 '10 at 10:57
7

If you want to read data from a generic URL (such as www.google.com), you probably don't want to use a DataInputStream at all. Instead, create a BufferedReader and read line by line with its readLine() method. Use the URLConnection.getContentType() method to find out the content's charset (you will need this in order to create your reader properly).

Example:

URL data = new URL("http://google.com");
URLConnection dataConnection = data.openConnection();

// Find out charset, default to ISO-8859-1 if unknown
String charset = "ISO-8859-1";
String contentType = dataConnection.getContentType();
if (contentType != null) {
    int pos = contentType.indexOf("charset=");
    if (pos != -1) {
        charset = contentType.substring(pos + "charset=".length());
    }
}

// Create reader and read string data
BufferedReader r = new BufferedReader(
        new InputStreamReader(dataConnection.getInputStream(), charset));
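StringBuilder sb = new StringBuilder();
String line;
while ((line = r.readLine()) != null) {
    // readLine() strips the line terminator, so add one back
    sb.append(line).append('\n');
}
r.close();
String content = sb.toString();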
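
One caveat with the `substring` approach above: a Content-Type value can carry quotes or further parameters after the charset (for example `text/html; charset="utf-8"; foo=bar`). A slightly more defensive sketch, with hypothetical names `ContentTypeCharset` and `charsetOf`, might look like this:

```java
import java.util.Locale;

// Hypothetical helper: extracts the charset parameter from a Content-Type value.
class ContentTypeCharset {
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters are separated by ';', e.g. "text/html; charset=UTF-8".
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Parameter names are matched case-insensitively.
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=\"utf-8\"; foo=bar", "ISO-8859-1"));
        System.out.println(charsetOf("text/html", "ISO-8859-1"));
    }
}
```

The result can then be passed to the `InputStreamReader` constructor in place of `charset`. An unknown charset name would still make that constructor throw `UnsupportedEncodingException`, which this sketch does not handle.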
Kuitsi
Grodriguez
  • Does the ContentEncoding header really contain the character set? According to the [specs](http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11) it should contain e.g. gzip. You should be looking at charset. – Kuitsi Feb 26 '13 at 11:25