I'm at quite a basic level of Android development.

I would like to get the text from a page such as "http://www.google.com". (The page I will be using contains only text, so no pictures or anything like that.) To be clear: I want to read the text written on a page into, for example, a String in my application.

I tried this code, but I'm not even sure it does what I want.

URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response     
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";

I can't get any text from it, though. How should I do this?

user1112727
  • I don't understand... you need to call rd.readLine() or something like that. – James Kingsbery Feb 28 '12 at 21:13
  • possible duplicate of [How to get the html-source of a page from a html link in android?](http://stackoverflow.com/questions/2423498/how-to-get-the-html-source-of-a-page-from-a-html-link-in-android) – jrummell Feb 28 '12 at 21:26

3 Answers


In the sample code you gave, you never actually read the response from the request. I would get the HTML with the following code:

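import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Open the connection and wrap the response stream in a line reader.
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
                        new InputStreamReader(
                            conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String inputLine;
// readLine() strips the line terminator, so add one back after each line.
while ((inputLine = in.readLine()) != null)
    buffer.append(inputLine).append('\n');
in.close();
System.out.println(buffer.toString());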

From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.
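To give a rough idea of what "only the text" means once you have the HTML string, here is a minimal sketch that uses nothing but regular expressions from the standard library. The class name HtmlText and the method toPlainText are made up for this illustration. This is not how JTidy or any other parser works: it will mishandle malformed markup, comments containing ">", and most entities, so treat it as a stopgap for very simple pages only.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not from the answer): crude HTML-to-text conversion. */
final class HtmlText {
    // Drop <script> and <style> elements together with their contents.
    private static final Pattern SCRIPT_OR_STYLE = Pattern.compile(
            "<(script|style)\\b[^>]*>.*?</\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; "&amp;" goes last so that
        // "&amp;lt;" ends up as "&lt;" rather than "<".
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace left behind by the removed markup.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

You would call it as HtmlText.toPlainText(buffer.toString()). For anything beyond a page you control, a real parser is the safer choice.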

Danny

Do you want to extract the text from an HTML file? You can use a specialized tool such as the Jericho HTML parser library. I'm not sure whether it can be used directly in an Android app, as it is quite big, but it is open source, so you can take only the parts of its code that you need for your task.

FolksLord

Here is one way:

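import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public String scrape(String urlString) throws Exception {
   URL url = new URL(urlString);
   URLConnection connection = url.openConnection();
   BufferedReader reader = new BufferedReader(new InputStreamReader(
         connection.getInputStream()));
   StringBuilder data = new StringBuilder();
   String line;

   try {
      // readLine() drops the line terminator, so append one per line.
      while ((line = reader.readLine()) != null) {
         data.append(line).append("\n");
      }
   } finally {
      reader.close(); // close the response stream even if reading fails
   }

   return data.toString();
}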
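
As a variation on the method above, here is a sketch that separates reading the stream from opening the connection, so the reading part can be tried without a network. The names PageFetcher, readAll and fetch, the UTF-8 decoding, and the 5-second timeouts are all assumptions made for this example; adjust them to the page you are actually reading.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of the original answer. */
final class PageFetcher {

    /** Reads the whole stream as UTF-8 text, ending every line with '\n'. */
    static String readAll(InputStream in) throws IOException {
        StringBuilder data = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line).append('\n');
            }
        } finally {
            reader.close();
        }
        return data.toString();
    }

    /** Opens an http(s) URL with timeouts and returns the response body. */
    static String fetch(String urlString) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(urlString).openConnection();
        connection.setConnectTimeout(5000); // milliseconds, assumed value
        connection.setReadTimeout(5000);    // milliseconds, assumed value
        try {
            return readAll(connection.getInputStream());
        } finally {
            connection.disconnect();
        }
    }
}
```

On Android, a call such as PageFetcher.fetch("http://www.google.com") has to run off the main thread, and the app needs the INTERNET permission in its manifest. Either of those missing is a common reason for getting no text back at all.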

Here is another.

Community
Perception