I want to check whether an XML document (held in a String object) is well-formed. For example:

"<root> Hello StackOverflow! <a> Something here </a> Goodbye StackOverflow </root>"

It should also validate attributes, but I'm a long way from that right now. I just want to make sure I have the logic right. Here's what I've got so far, but I'm stuck and need some help.

public boolean isWellFormed( String str )
{
    boolean retorno = true;

    if ( str == null )
    {
        throw new NullPointerException();
    }

    else
    {
        this.chopTheElements( str );
        this.chopTags();

    }
    return retorno;
}

private void chopTags()
{
    for ( String element : this.elements )
    {
        this.tags.add( element.substring( 1, element.length()-1 ) );
    }
}

public void chopTheElements( String str )
{
    for ( int i = 0; i < str.length(); i++ )
    {
        if ( str.charAt( i ) == '<' )
        {
            elements.add( getNextToken( str.substring( i ) ) );
        }
    }
}

private String getNextToken( String str )
{
    String retStr = "";

    if ( str.indexOf( ">" ) != -1 )
    {
        retStr = str.substring( 0, str.indexOf( ">" ) + 1 );
    }

    return retStr;
}

So far I have chopped the elements, like "<root>", into one list, and then the tags into another, like this: root, /root.

But I don't know how to proceed, or whether I'm going in the right direction. I have been assigned to solve this without regex.

Any advice? I'm lost here. Thanks.

Cristian
  • Explicitly constructing a NullPointerException instance and throwing it... my eyes! my eyes! – Isaac Oct 04 '12 at 18:46
  • Very constructive, thank you. Yes, I'm new to Java programming, I'm trying to learn. – Cristian Oct 04 '12 at 18:50
  • What is the problem of throwing an explicit NullPointerException? It's even encouraged to check parameters first, to fail fast. – Johannes Oct 04 '12 at 18:52
  • @Johannes: Java *explicitly* throws a NullPointerException, as and when it is detected. Checking for null and handling it is different from throwing an NPE – Sujay Oct 04 '12 at 18:59
  • @Johannes, that's what `IllegalArgumentException` is for. – Isaac Oct 04 '12 at 19:04
  • Yes, but I'd suggest throwing an NPE as early as possible if `null` is not a "permitted" value. That is much better than letting the NPE surface in the second method, as it otherwise would in this case, because then the programmer debugging it must work out which method the `null` value came from. See, for instance, Effective Java by Josh Bloch. – Johannes Oct 04 '12 at 19:04
  • No, be as specific as possible. IllegalArgumentException should be thrown in other cases. – Johannes Oct 04 '12 at 19:05
  • If a `null` value is encountered when a non-`null` value is expected, then either `IllegalArgumentException` (if the offender is a method parameter) or `IllegalStateException` (otherwise) is appropriate. – Isaac Oct 04 '12 at 19:06
  • I still wouldn't throw an IllegalArgumentException or IllegalStateException. Be as specific about the cause as possible. In almost all other cases IllegalArgumentException or IllegalStateException might be appropriate. – Johannes Oct 04 '12 at 19:09
  • Check out [Ira Baxter's excellent answer](http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769) on how to hand-code a (recursive descent) parser. – Bart Kiers Oct 04 '12 at 20:01

2 Answers

Starting by breaking the string whenever you see a "<" is not the way to go about it, because the chunks you identify will be unrelated to the hierarchical structure of the XML. For example, if you have as input:

<a>xxx<b>...</b>yyy</a>

then one of your chunks will be "/b>yyy<" which isn't a useful thing to break up further.

You need to structure your code according to the structure of the grammar. If the grammar says that an element consists of a start tag then a sequence of (elements or characters) then an end tag, then you need a method that matches that sequence, and calls other methods to process its components. Because the grammar is recursive, your code will be recursive, so this is known as recursive descent parsing. It's something that is often taught in computer science courses so you'll find excellent coverage of the topic in textbooks.
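
As a rough illustration of that idea, here is a minimal sketch of a recursive descent checker. It assumes a deliberately simplified grammar, not real XML: no attributes, comments, entities, or self-closing tags, and tag names are a letter followed by letters or digits. The class name `WellFormedChecker` and its helper methods are made up for this example. Each grammar rule gets its own method, and `element()` and `content()` call each other recursively.

```java
// Sketch only: checks a simplified XML subset (no attributes, comments,
// entities, or self-closing tags). All names here are hypothetical.
class WellFormedChecker {
    private final String input;
    private int pos = 0; // index of the next unread character

    private WellFormedChecker(String input) {
        this.input = input;
    }

    /** True if the input is exactly one element, optionally surrounded by whitespace. */
    static boolean isWellFormed(String xml) {
        if (xml == null) {
            throw new NullPointerException("xml must not be null");
        }
        WellFormedChecker parser = new WellFormedChecker(xml);
        parser.skipWhitespace();
        if (!parser.element()) {
            return false;
        }
        parser.skipWhitespace();
        return parser.pos == parser.input.length(); // nothing may follow the root
    }

    // element := '<' name '>' content '</' name '>'
    private boolean element() {
        if (!consume('<')) return false;
        String open = name();
        if (open.isEmpty() || !consume('>')) return false;
        if (!content()) return false;
        if (!consume('<') || !consume('/')) return false;
        String close = name();
        return close.equals(open) && consume('>'); // end tag must match start tag
    }

    // content := (element | text)*
    // Stops at "</" (the end tag belongs to the caller) or at end of input.
    private boolean content() {
        while (pos < input.length()) {
            if (input.charAt(pos) == '<') {
                if (pos + 1 < input.length() && input.charAt(pos + 1) == '/') {
                    return true;
                }
                if (!element()) return false; // recursion for a nested element
            } else {
                pos++; // ordinary character data
            }
        }
        return true; // the caller will then fail to find its end tag
    }

    // name := letter (letter | digit)*   (returns "" if no name is present)
    private String name() {
        int start = pos;
        if (pos < input.length() && Character.isLetter(input.charAt(pos))) {
            pos++;
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) {
                pos++;
            }
        }
        return input.substring(start, pos);
    }

    // Advances past c if it is the next character.
    private boolean consume(char c) {
        if (pos < input.length() && input.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipWhitespace() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }
}
```

With this sketch, `WellFormedChecker.isWellFormed("<a>xxx<b>...</b>yyy</a>")` returns true, while wrongly nested input such as `"<a><b></a></b>"` returns false because the end tag name does not match the innermost open element. Attributes could later be handled by adding a rule between `name()` and the closing `'>'` of the start tag.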

Michael Kay
  • To add to this - I would also recommend the dragon book on parsing - especially the first 200 pages. – Pawel Oct 04 '12 at 21:16

If you're not dealing with a huge XML file, consider using a DOM parser. I would suggest looking at the javax.xml.parsers.DocumentBuilder class. You would call one of its parse() overloads (your source can be a file, an InputStream, or any other InputSource); if the input is not well-formed, parse() throws a SAXException.
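
For a String, one way to do this is to wrap it in a StringReader and an InputSource. The following is only a sketch: the class name `DomWellFormedCheck` is made up, and treating configuration and I/O failures as an `IllegalStateException` is an assumption about what the caller wants.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch only: lets the JDK's DOM parser decide whether a String is well-formed XML.
class DomWellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them;
            // the built-in handler may write them to stderr instead.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap the String so it can be passed to parse(InputSource).
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (ParserConfigurationException | IOException e) {
            // Not expected when parsing an in-memory String with default settings.
            throw new IllegalStateException(e);
        }
    }
}
```

Unlike a hand-written checker, this accepts full XML (attributes, comments, entities), but it does not satisfy an assignment that requires writing the parsing logic yourself.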

Sujay