String Index Out Of Bounds Exception

Question

I am really stuck on this exception:

private static void getUserComment(String s) {
    while(s.contains("author'>")){
        System.out.println(s.substring(s.indexOf("author'>"),
                                       s.indexOf("<div id='")));
        s = s.substring(0, s.indexOf("author'>")) +
                           s.substring(s.indexOf("<div id='"+9));

    } 
}

The printing and the substring statements both appear to throw this exception. – Nayef Jul 16 '11 at 09:18
The input is an HTML page, basically this URL: http://sabq.org/sabq/user/news.do?section=5&id=20908. I am trying to extract useful information such as the username and the comment of each commenter. I don't know whether this is the right way to do it. – Nayef Jul 16 '11 at 09:20
Get a proper HTML parser which loads the page into a DOM, then query the DOM (for instance with XPath if that is supported). The [HTML Parser](http://htmlparser.sourceforge.net/) open-source project may help you. – Lucero Jul 16 '11 at 09:26

score 3 · Accepted Answer · answered Jul 16 '11 at 09:11

You should use a proper parser or at least do some regular expression pattern matching (which is already "bad enough" for HTML or XML).

That said, your "offset" of 9 is likely the indirect cause of the exception:

s.indexOf("<div id='"+9)

The concatenation happens first, so this searches for the literal string <div id='9, which is not found; indexOf then returns -1, and passing -1 to substring causes the exception. Maybe you actually wanted to add 9 to the index, like this? s.indexOf("<div id='")+9
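
To make the difference concrete, here is a minimal sketch. The sample string and the helper name `afterMarker` are made up for illustration and are not taken from the real page; the helper also checks for -1 before calling substring.

```java
final class IndexOfOffsetDemo {
    // Hypothetical helper: returns the text that follows the first "<div id='" marker.
    static String afterMarker(String s) {
        String marker = "<div id='";
        int start = s.indexOf(marker);
        if (start < 0) {
            return ""; // marker absent: never call substring(-1)
        }
        return s.substring(start + marker.length());
    }

    public static void main(String[] args) {
        // Made-up sample input, only for illustration.
        String s = "<span class='author'>Ali</span><div id='c1'>Hello</div>";

        // Concatenation runs first: this searches for the text "<div id='9".
        System.out.println(s.indexOf("<div id='" + 9));
        // Arithmetic on the result: index of the marker plus its 9 characters.
        System.out.println(s.indexOf("<div id='") + 9);
        System.out.println(afterMarker(s));
    }
}
```

Using `marker.length()` instead of a hard-coded 9 keeps the offset correct if the marker text ever changes.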

Note that the function is useless anyway: reassigning s only changes the local variable, not the caller's variable (parameters are passed by value in Java).
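
Since the caller never sees the reassigned s, one option is to return what was found instead. The following is only a sketch under assumptions: the class and method names are hypothetical, the sample input is invented, and unlike the question's code it returns the text after the "author'>" marker and leaves the input untouched. Searching for the end marker from the start position also avoids the case where a "<div id='" occurs before "author'>", which would make substring(begin, end) throw because begin > end.

```java
import java.util.ArrayList;
import java.util.List;

final class CommentExtractor {
    private static final String START = "author'>";
    private static final String END = "<div id='";

    // Hypothetical helper: collects the text between each START marker
    // and the next END marker that follows it.
    static List<String> extractAuthorBlocks(String html) {
        List<String> blocks = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = html.indexOf(START, from);
            if (start < 0) break;            // no more authors
            int end = html.indexOf(END, start);
            if (end < 0) break;              // unterminated block: stop instead of throwing
            blocks.add(html.substring(start + START.length(), end));
            from = end + END.length();       // continue after this block
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Made-up sample input, only for illustration.
        String html = "<b class='author'>Ali</b><div id='1'>hi</div>"
                    + "<b class='author'>Sara</b><div id='2'>yo</div>";
        for (String block : extractAuthorBlocks(html)) {
            System.out.println(block);
        }
    }
}
```

Advancing an index rather than rebuilding the string on each pass keeps the loop simple and guarantees it terminates.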

Lucero

What I want is to cut out the author information, use it, then find the next author and cut that information from the String, so that eventually no author is left. I don't know anything about pattern matching; would it be a good way to extract the information? – Nayef Jul 16 '11 at 09:25
@Nayef - there are lots of resources on pattern matching, even books. As @Lucero says, it will work (most of the time). But a proper HTML parser is a better idea. – Stephen C Jul 16 '11 at 09:32
As I wrote, use a proper parser. The [HTML Parser](http://htmlparser.sourceforge.net/) open-source project may fit your needs for extraction (but there are others around as well). – Lucero Jul 16 '11 at 09:34
To be honest, this is the first time I have heard of DOM. Should I read up on DOM or go with the HTML Parser? – Nayef Jul 16 '11 at 10:25
DOM stands for "Document Object Model" and it is not a specific piece of code but the general name for a parsed hierarchical object representation model for a document. – Lucero Jul 16 '11 at 10:37
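
For completeness, the pattern-matching route mentioned in the answer and comments might look roughly like the sketch below, using java.util.regex from the standard library. The markup shape (an author name directly following "author'>" and ending at the next "<") is an assumption, as are the class name, method name and sample input; as noted above, this approach is fragile on real HTML and a proper parser remains the better choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AuthorRegexSketch {
    // Assumed markup: ...author'>NAME<...  The group captures NAME up to the next '<'.
    private static final Pattern AUTHOR = Pattern.compile("author'>([^<]*)<");

    // Hypothetical helper: returns every captured author name, trimmed.
    static List<String> findAuthors(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = AUTHOR.matcher(html);
        while (m.find()) {
            names.add(m.group(1).trim());
        }
        return names;
    }

    public static void main(String[] args) {
        // Made-up sample input, only for illustration.
        String html = "<b class='author'>Ali</b><div id='1'>hi</div>"
                    + "<b class='author'> Sara </b><div id='2'>yo</div>";
        System.out.println(findAuthors(html));
    }
}
```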
