
I am developing an Android app that extracts data from a website and displays it in a TextView.

I have tried every approach I could find through Google and Stack Overflow, but I am still unable to process the data. Can anyone who has done this share how?

Website: https://www.amrita.edu/campus/bengaluru

From this website I want to extract the data in the Latest News block and the Upcoming Events block.

Here is the code. I have used Jsoup for the extraction:

package out.in;

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;
import org.w3c.dom.Document;
import android.app.Activity;
import android.os.Bundle;
import android.sax.Element;
import android.widget.TextView;
import android.widget.Toast;

public class HtmlExtracterActivity extends Activity {

    // url
    static final String URL = "https://www.amrita.edu/campus/bengaluru";

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main);

        try {
            ((TextView) findViewById(R.id.tv)).setText(getdata());
        } catch (Exception ex) {
            ((TextView) findViewById(R.id.tv)).setText("Error");
        }
    }

    protected String getdata() throws Exception {
        String result = "";
        // get html document structure
        Document document = (Document) Jsoup.connect(URL).get();

        // selector query
        *********Need help
        // check results
        *********Need help
        return result;
    }
}

I have added the Internet permission to the manifest file.

The layout XML file is as follows:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:orientation="vertical"
    android:layout_width="fill_parent"
    android:layout_height="fill_parent">

    <TextView
        android:id="@+id/tv"
        android:text=" "
        android:layout_width="wrap_content"
        android:layout_height="wrap_content" />
</LinearLayout>

I would sincerely appreciate any help. Thanks in advance.

Santhosh_Reddy

1 Answer


You haven't mentioned the exact problem you are facing. Did you check what this line actually returns?

Document document = (Document) Jsoup.connect(URL).get();

My guess is that the request fails because the code above does not send a User-Agent header. Please try this and let us know if you still get the error:

Response response = Jsoup.connect(location)
        .ignoreContentType(true)
        .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
        .referrer("http://www.google.com")
        .timeout(12000)
        .followRedirects(true)
        .execute();

Document doc = response.parse();

User Agent

Use a recent User-Agent string. Here's a complete list: http://www.useragentstring.com/pages/Firefox/.

Timeout

Also, don't forget to set a timeout, since the page sometimes takes longer than the default timeout to download.

Referer

Set the referer to Google.

Follow redirects

Follow redirects so that you reach the final page.
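
To see what the four request settings above (user agent, timeout, referer, redirects) amount to at the HTTP level, here is a minimal sketch that uses the JDK's java.net.HttpURLConnection instead of Jsoup, so it runs without extra libraries. The class name RequestSettingsSketch and the method configure are hypothetical names chosen for illustration; the header values are the ones from the snippet above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Sketch only: mirrors the Jsoup request settings with plain HttpURLConnection.
// RequestSettingsSketch and configure() are hypothetical names, not Jsoup API.
final class RequestSettingsSketch {

    static HttpURLConnection configure(String location) {
        try {
            // openConnection() only creates the object; nothing is sent yet.
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(location).toURL().openConnection();

            // User agent: identify as a regular browser.
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0");

            // Referer: the HTTP header name is spelled with a single "r".
            conn.setRequestProperty("Referer", "http://www.google.com");

            // Timeout: 12 seconds for both connecting and reading.
            conn.setConnectTimeout(12000);
            conn.setReadTimeout(12000);

            // Follow redirects for this connection.
            conn.setInstanceFollowRedirects(true);
            return conn;
        } catch (IOException e) {
            // Wrap the checked exception so callers get a simple unchecked failure.
            throw new IllegalArgumentException("Cannot open connection to " + location, e);
        }
    }
}
```

Calling RequestSettingsSketch.configure(location) returns a connection that has not yet touched the network; the request is only sent once you call something like getResponseCode() or getInputStream() on it.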

execute() instead of get()

Use execute() to get the Response object, which lets you check the content type and status code in case of an error.
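
As a rough illustration of that check, the sketch below decides whether a response is worth parsing, given its status code and content type. With Jsoup, the two inputs would come from response.statusCode() and response.contentType(). The class name ResponseCheckSketch and the method looksLikeHtml are hypothetical, and the accepted content types are an assumption you may want to adjust.

```java
import java.util.Locale;

// Sketch only: ResponseCheckSketch and looksLikeHtml() are hypothetical names.
final class ResponseCheckSketch {

    // Returns true when the status is 2xx and the content type looks like HTML.
    static boolean looksLikeHtml(int statusCode, String contentType) {
        if (statusCode < 200 || statusCode >= 300) {
            return false; // redirects left unfollowed, client errors, server errors
        }
        if (contentType == null) {
            return false; // server sent no Content-Type header
        }
        // Content types may carry a charset suffix, e.g. "text/html; charset=UTF-8".
        String type = contentType.toLowerCase(Locale.ROOT).trim();
        return type.startsWith("text/html") || type.startsWith("application/xhtml+xml");
    }
}
```

If the check fails, showing the status code and content type in the TextView instead of a bare "Error" makes it much easier to see what the server actually returned.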

Source: https://stackoverflow.com/a/20284953/1262177

Community