
What is the problem with my code? I want to get the results from the HTML page and store the value in a string, or later in an array. Thanks.

09-05 16:36:41.221: I/test(22697): plan failed 1org.xml.sax.SAXParseException: attr value delimiter missing! (position:START_TAG @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.221: I/test(22697): plan failed 1a @1:166 in java.io.StringReader@4061bc98)
09-05 16:36:41.231: W/System.err(22697): at org.apache.harmony.xml.parsers.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:151)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:194)
09-05 16:36:41.231: W/System.err(22697): at com.asiatype.boracay.CurrencyActivity$DownloadData.doInBackground(CurrencyActivity.java:1)
09-05 16:36:41.231: W/System.err(22697): at android.os.AsyncTask$2.call(AsyncTask.java:185)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:306)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.FutureTask.run(FutureTask.java:138)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1088)
09-05 16:36:41.231: W/System.err(22697): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:581)
09-05 16:36:41.231: W/System.err(22697): at java.lang.Thread.run(Thread.java:1027)

        String s, link;
        String theResult = "";
        link="http://www.bsp.gov.ph/statistics/sdds/exchrate.htm";
        Document doc;
        HttpClient client = new DefaultHttpClient();
        HttpGet request = new HttpGet(link);
        HttpResponse response;
        try {
            response = client.execute(request);
            InputStream in = response.getEntity().getContent();
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            StringBuilder str = new StringBuilder();
            String line = null;
            while((line = reader.readLine()) != null)
            {
                str.append(line);
            }
            in.close();
            htmlSource = str.toString();
        } catch (ClientProtocolException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        } catch (IOException e2) {
            // TODO Auto-generated catch block
            e2.printStackTrace();
        }


        try {
            doc = DocumentBuilderFactory.newInstance()
                      .newDocumentBuilder().parse(new InputSource(new StringReader(htmlSource)));
            XPathExpression xpath = XPathFactory.newInstance()
                      .newXPath().compile("//div/table/tbody/tr[child::td[contains(text(),\"USD\")]]/td[15]");
            htmlResult = (String) xpath.evaluate(doc, XPathConstants.STRING);
        } catch (SAXException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 1"+e1);
            Log.i("test", "plan failed 1a "+ htmlSource);
            Log.i("test", "plan failed 1a "+ htmlResult);
            e1.printStackTrace();
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 2");

            e1.printStackTrace();
        } catch (ParserConfigurationException e1) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 3");

            e1.printStackTrace();
        } catch (XPathExpressionException e) {
            // TODO Auto-generated catch block
            Log.i("test", "plan failed 4");

            e.printStackTrace();
        }
  • Just to clarify, are you trying to extract data from http://www.bsp.gov.ph/statistics/sdds/exchrate.htm ? – andyb Sep 05 '12 at 10:58
  • Yes, I just want to get the value from a table using XPath. I'm new to it. – dicenice Sep 06 '12 at 06:23

1 Answer


The source HTML file you are using as input is not well-formed XML, which is why the SAXParseException is being thrown: it is telling you that the delimiter (the quote) around an attribute's value is missing.

HTML and XML are very different. For example, HTML can have missing or non-matching end tags and unquoted attribute values, whereas XML allows neither. Parsing HTML as XML is strongly discouraged for this reason: an XML parser simply cannot cater for all the inconsistencies that HTML allows.
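To see the difference in isolation, here is a minimal sketch using only the standard JDK XML APIs. The two table fragments and the 41.50 rate are made up for illustration (they are not taken from the real page), and the class and method names are hypothetical. The only difference between the fragments is whether the attribute value is quoted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class WellFormedDemo {
    // Hypothetical fragments: legal HTML with an unquoted attribute value...
    static final String UNQUOTED =
        "<table border=1><tr><td>USD</td><td>41.50</td></tr></table>";
    // ...and the same markup with the value quoted, which is well-formed XML.
    static final String QUOTED =
        "<table border=\"1\"><tr><td>USD</td><td>41.50</td></tr></table>";

    /** Parses the markup as XML; returns null when it is not well-formed. */
    static Document parseOrNull(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            // SAXParseException (a SAXException) signals malformed XML.
            return null;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the text of the cell that follows the "USD" cell. */
    static String usdRate(Document doc) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("//tr[td[1]='USD']/td[2]", doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unquoted parses: " + (parseOrNull(UNQUOTED) != null));
        System.out.println("quoted rate: " + usdRate(parseOrNull(QUOTED)));
    }
}
```

The XPath step only becomes reachable once the input is well-formed, which is why the options below all start by converting or cleaning the HTML first.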

There are a few alternative approaches to solve this:

  1. From "Reading HTML file to DOM tree using Java": use Neko to attempt to turn the HTML into valid XML, which would let you keep your existing XML parsing and XPath code to find the data.
  2. From the same question: use JTidy to parse the HTML into a DOM tree and find your data using DOM methods instead. See "xml dom parser in java?" for some Java DOM parsers.