I am really stuck on this exception

private static void getUserComment(String s) {
    while(s.contains("author'>")){
        System.out.println(s.substring(s.indexOf("author'>"),
                                       s.indexOf("<div id='")));
        s = s.substring(0, s.indexOf("author'>")) +
                           s.substring(s.indexOf("<div id='"+9));

    } 
}
Nayef
  • What does your input string look like? – Mat Jul 16 '11 at 09:07
  • Which line causes the exception? – Arjan Jul 16 '11 at 09:08
  • The printing and the sub string statements both appear to generate this exception – Nayef Jul 16 '11 at 09:18
  • The input is an HTML page, basically this URL: http://sabq.org/sabq/user/news.do?section=5&id=20908. I am trying to extract useful information such as the username and the comment of each commenter. I don't know whether this is the right way to do it. – Nayef Jul 16 '11 at 09:20
  • Get a proper HTML parser which loads the page into a DOM, then query the DOM (for instance with XPath if that is supported). The [HTML Parser](http://htmlparser.sourceforge.net/) open-source project may help you. – Lucero Jul 16 '11 at 09:26

1 Answer

You should use a proper parser or at least do some regular expression pattern matching (which is already "bad enough" for HTML or XML).
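As a rough illustration of the pattern-matching route, here is a minimal sketch. It assumes the two markers from the question (author'> and <div id=') really do delimit the author text; the class name, method name and sample input are made up for the example, and this will break on HTML that does not follow that exact shape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthorRegex {
    // Reluctant quantifier (.*?) stops at the first "<div id='" after each marker.
    // DOTALL lets the captured text span line breaks.
    private static final Pattern AUTHOR =
            Pattern.compile("author'>(.*?)<div id='", Pattern.DOTALL);

    // Hypothetical helper: collects the text between each pair of markers.
    static List<String> findAuthors(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = AUTHOR.matcher(html);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Invented sample input, not taken from the real page.
        String html = "<span class='author'>Nayef<div id='c1'>hi</div>"
                    + "<span class='author'>Lucero<div id='c2'>yo</div>";
        System.out.println(findAuthors(html)); // prints [Nayef, Lucero]
    }
}
```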

That said, your "offset" of 9 is likely the indirect cause of the exception:

s.indexOf("<div id='"+9)

This builds the literal string <div id='9, which is not found in the input; indexOf then returns -1, and substring throws a StringIndexOutOfBoundsException when it is given a negative index. You probably meant to add 9 to the index instead, like this: s.indexOf("<div id='")+9
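To make the difference concrete, here is a small sketch comparing the two expressions. The class name, method names and sample string are invented for illustration; only the two indexOf expressions come from the question and the fix above.

```java
public class IndexOffsetDemo {
    // Buggy form: the 9 is concatenated onto the search string,
    // so this looks for the text "<div id='9".
    static int buggyIndex(String s) {
        return s.indexOf("<div id='" + 9);
    }

    // Intended form: find the marker first, then add 9 to its position.
    static int fixedIndex(String s) {
        return s.indexOf("<div id='") + 9;
    }

    public static void main(String[] args) {
        String s = "author'>Nayef<div id='c1'>"; // hypothetical sample input
        System.out.println(buggyIndex(s)); // prints -1 (marker with a trailing 9 is absent)
        System.out.println(fixedIndex(s)); // prints 22 (marker at 13, plus 9)
    }
}
```

Note that even the fixed form yields 8 rather than a valid position when the marker is missing (-1 + 9), so checking the result of indexOf before adding the offset is safer.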

Note that the function is useless anyway: reassigning s only changes the local variable, not the caller's variable (parameters are passed by value in Java).
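One way around both problems is to return the extracted pieces instead of modifying s, and to check every indexOf result before using it. The following is only a sketch under the assumption that the author text sits between the two markers from the question; the class name, method name and sample input are hypothetical. Unlike the original loop, it excludes the author'> marker itself from the output.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorExtractor {
    private static final String START = "author'>";
    private static final String END = "<div id='";

    // Scans forward through s and collects the text between each START/END pair.
    // The input is never modified; results go back to the caller as a list.
    static List<String> extractAuthors(String s) {
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = s.indexOf(START, from);
            if (start < 0) break;                 // no more author markers
            int contentStart = start + START.length();
            int end = s.indexOf(END, contentStart);
            if (end < 0) break;                   // unterminated entry, stop safely
            result.add(s.substring(contentStart, end));
            from = end + END.length();            // continue after this entry
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample input, not taken from the real page.
        String html = "<span class='author'>Nayef<div id='c1'>hi</div>"
                    + "<span class='author'>Lucero<div id='c2'>yo</div>";
        System.out.println(extractAuthors(html)); // prints [Nayef, Lucero]
    }
}
```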

Lucero
  • What I want is to cut out the author information, use it, then find the next author and cut his information from the String, so that eventually no author is left. I don't know anything about pattern matching; would it be a good way to extract the information? – Nayef Jul 16 '11 at 09:25
  • @Nayef - there are lots of resources on pattern matching. Even books. As @Lucero says, it will work (most of the time). But a proper HTML parser is a better idea. – Stephen C Jul 16 '11 at 09:32
  • As I wrote, use a proper parser. The [HTML Parser](http://htmlparser.sourceforge.net/) open-source project may fit your needs for extraction (but there are others around as well). – Lucero Jul 16 '11 at 09:34
  • To be honest, this is the first time I have heard of DOM. What should I do: read about DOM, or go with the HTML Parser? – Nayef Jul 16 '11 at 10:25
  • DOM stands for "Document Object Model" and it is not a specific piece of code but the general name for a parsed hierarchical object representation model for a document. – Lucero Jul 16 '11 at 10:37