
I'd like to be able to read in an XML schema (i.e. xsd) and from that know which attributes, child elements, and values are valid as I walk through it.

For example, let's say I have an xsd that this xml will validate against:

<root>
  <element-a type="something">
    <element-b>blah</element-b>
    <element-c>blahblah</element-c>
  </element-a>
</root>

I've tinkered with several libraries and I can confidently get <root> as the root element. Beyond that I'm lost.

Given an element I need to know what child elements are required or allowed, attributes, facets, choices, etc. Using the above example I'd want to know that element-a has an attribute type and may have children element-b and element-c...or must have children element-b and element-c...or must have one of each...you get the picture I hope.

I've looked at numerous libraries such as XSOM, Eclipse XSD, Apache XmlSchema and found they're all short on good sample code. My search of the Internet has also been unsuccessful.

Does anyone know of a good example or even a book that demonstrates how to go through an XML schema and find out what would be valid options at a given point in a validated XML document?

clarification

I'm not looking to validate a document; rather, I'd like to know the options at a given point to assist in creating or editing a document. If I know "I am here" in a document, I'd like to determine what I can do at that point. "Insert one of element A, B, or C" or "attach attribute 'description'".

Paul
  • Do you want tools to help explore the xsd or want to process things programmatically? – user802421 Nov 27 '11 at 00:46
  • c) all of the above. I'd like to read in an xsd and be able to present the user (me, for now anyway) valid options for a given point in a validated (with the xsd) xml document. "Choose your own adventure" for xml. – Paul Nov 27 '11 at 02:47
  • See http://stackoverflow.com/questions/1435452/using-a-schema-to-sort-an-xml-document - related, sort of – skaffman Nov 27 '11 at 10:26
  • I think that what you want is the same as what someone needs when writing a tool to generate sample XML, or a data entry UI, from an XML Schema; maybe this kind of parallel could help steer the answers? – Petru Gardea Nov 27 '11 at 16:10

6 Answers


This is a good question. Although it is old, I did not find an acceptable answer. The thing is that the existing libraries I am aware of (XSOM, Apache XmlSchema) are designed as object models. Their implementors did not intend to provide utility methods; you are expected to implement those yourself on top of the provided object model.

Let's see how querying context-specific elements can be done by means of Apache XmlSchema.

You can use their tutorial as a starting point. In addition, the Apache CXF framework provides the XmlSchemaUtils class, which contains lots of handy code examples.

First of all, read the XmlSchemaCollection as illustrated by the library's tutorial:

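XmlSchemaCollection xmlSchemaCollection = new XmlSchemaCollection();
// This is the XmlSchema 1.x signature; in 2.x the ValidationEventHandler
// argument is gone and the call is simply xmlSchemaCollection.read(inputSource)
xmlSchemaCollection.read(inputSource, new ValidationEventHandler());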

Now, XML Schema defines two kinds of data types:

  • Simple types
  • Complex types

Simple types are represented by the XmlSchemaSimpleType class. Handling them is easy. Read the documentation: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaSimpleType.html. But let's see how to handle complex types. Let's start with a simple method:

@Override
public List<QName> getChildElementNames(QName parentElementName) {
    XmlSchemaElement element = xmlSchemaCollection.getElementByQName(parentElementName);
    XmlSchemaType type = element != null ? element.getSchemaType() : null;

    List<QName> result = new LinkedList<>();
    if (type instanceof XmlSchemaComplexType) {
        addElementNames(result, (XmlSchemaComplexType) type);
    }
    return result;
}

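An XmlSchemaComplexType may be either a type that declares its own content or a type derived by extension from a base type (<xs:complexContent><xs:extension base="...">). See the public static QName getBaseType(XmlSchemaComplexType type) method of CXF's XmlSchemaUtils class, which returns the QName of the base type.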

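private void addElementNames(List<QName> result, XmlSchemaComplexType type) {
    // XmlSchemaUtils.getBaseType (Apache CXF) returns the base type's QName, not the type
    // itself, so resolve it through the collection before asking for its particle
    QName baseTypeName = XmlSchemaUtils.getBaseType(type);
    XmlSchemaType baseType = baseTypeName != null ? xmlSchemaCollection.getTypeByQName(baseTypeName) : null;
    // Note: elements added by the extension itself are not included here
    XmlSchemaParticle particle = baseType instanceof XmlSchemaComplexType
            ? ((XmlSchemaComplexType) baseType).getParticle()
            : type.getParticle();

    addElementNames(result, particle);
}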

When you handle XmlSchemaParticle, consider that it can have multiple implementations. See: https://ws.apache.org/commons/XmlSchema/apidocs/org/apache/ws/commons/schema/XmlSchemaParticle.html

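private void addElementNames(List<QName> result, XmlSchemaParticle particle) {
    if (particle instanceof XmlSchemaAny) {
        // xs:any: a wildcard, so no fixed element name can be listed here
    } else if (particle instanceof XmlSchemaElement) {
        XmlSchemaElement element = (XmlSchemaElement) particle;
        // a reference (ref="...") carries its target in getRefName(), a local declaration in getQName()
        result.add(element.getRefName() != null ? element.getRefName() : element.getQName());
    } else if (particle instanceof XmlSchemaGroupBase) {
        // xs:sequence, xs:choice or xs:all (XmlSchema 1.x API): recurse into every member
        XmlSchemaObjectCollection items = ((XmlSchemaGroupBase) particle).getItems();
        for (int i = 0; i < items.getCount(); i++) {
            XmlSchemaObject item = items.getItem(i);
            if (item instanceof XmlSchemaParticle) {
                addElementNames(result, (XmlSchemaParticle) item);
            }
        }
    } else if (particle instanceof XmlSchemaGroupRef) {
        // reference to a named xs:group: recurse into the group's compositor
        addElementNames(result, ((XmlSchemaGroupRef) particle).getParticle());
    }
}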
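For readers without the library at hand, the same four-way dispatch (any / element / compositor / group reference) can be sketched with nothing but the JDK's DOM API, walking the raw XSD instead of XmlSchema's object model. This is a simplified sketch under stated assumptions, not XmlSchema's API: ChildElementLister is a hypothetical name, it assumes a single schema file with no imports or includes, looks only at global element declarations with an inline or named complex type, reports xs:any as "*", and ignores derivation by extension.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Apache XmlSchema: lists the child element names
// allowed under a global element by walking the XSD as plain DOM.
final class ChildElementLister {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static final String SAMPLE_XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='element-a'><xs:complexType><xs:sequence>"
            + "<xs:element name='element-b' type='xs:string'/>"
            + "<xs:element name='element-c' type='xs:string'/>"
            + "</xs:sequence><xs:attribute name='type' type='xs:string'/>"
            + "</xs:complexType></xs:element></xs:schema>";

    static List<String> childElementNames(String xsd, String elementName) {
        Element schema = parse(xsd).getDocumentElement();
        List<String> result = new ArrayList<>();
        Element decl = findChild(schema, "element", elementName);
        if (decl == null) return result;
        // The type is either inline (<xs:complexType> child) or named via @type
        Element type = findChild(decl, "complexType", null);
        if (type == null && decl.hasAttribute("type")) {
            type = findChild(schema, "complexType", localName(decl.getAttribute("type")));
        }
        if (type != null) collect(schema, type, result);
        return result;
    }

    // Mirrors the particle dispatch: any / element / sequence-choice-all / group ref
    private static void collect(Element schema, Element parent, List<String> result) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !XS.equals(n.getNamespaceURI())) continue;
            Element child = (Element) n;
            String kind = child.getLocalName();
            if (kind.equals("any")) {
                result.add("*"); // wildcard: any element allowed by its namespace constraint
            } else if (kind.equals("element")) {
                // local declaration (@name) or reference to a global one (@ref)
                result.add(child.hasAttribute("name") ? child.getAttribute("name")
                                                      : localName(child.getAttribute("ref")));
            } else if (kind.equals("sequence") || kind.equals("choice") || kind.equals("all")) {
                collect(schema, child, result);
            } else if (kind.equals("group")) {
                Element group = findChild(schema, "group", localName(child.getAttribute("ref")));
                if (group != null) collect(schema, group, result);
            }
            // xs:attribute, xs:annotation, xs:complexContent etc. are skipped
        }
    }

    // First xs:<kind> child of parent, optionally with a matching @name
    private static Element findChild(Element parent, String kind, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XS.equals(n.getNamespaceURI()) && kind.equals(n.getLocalName())
                    && (name == null || name.equals(((Element) n).getAttribute("name")))) {
                return (Element) n;
            }
        }
        return null;
    }

    // Drops a prefix such as "tns:" from a QName-valued attribute
    private static String localName(String qname) {
        return qname.substring(qname.indexOf(':') + 1);
    }

    private static Document parse(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childElementNames(SAMPLE_XSD, "element-a")); // [element-b, element-c]
    }
}
```

The list only says which names may appear; whether each one is required, repeatable, or one of several alternatives still depends on the enclosing compositor and its minOccurs/maxOccurs.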

The other thing to bear in mind is that elements can be either abstract or concrete. Again, the JavaDocs are the best guidance.
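Concretely, an element declared with abstract="true" can never appear in an instance document itself; only the members of its substitution group may appear in its place. The sketch below collects those candidates from the raw XSD with the JDK's DOM API. It assumes a single XSD 1.0 schema with no imports, compares names without namespace prefixes, and ignores the block and final attributes; SubstitutionResolver is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: which concrete elements may appear where `head` is referenced?
final class SubstitutionResolver {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static List<String> candidates(String xsd, String head) {
        Element schema;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            schema = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xsd))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
        // Only global (top-level) element declarations can head or join a substitution group
        List<Element> globals = new ArrayList<>();
        for (Node n = schema.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && XS.equals(n.getNamespaceURI()) && "element".equals(n.getLocalName())) {
                globals.add((Element) n);
            }
        }
        List<String> result = new ArrayList<>();
        addCandidates(globals, head, result);
        return result;
    }

    private static void addCandidates(List<Element> globals, String name, List<String> result) {
        for (Element e : globals) {
            String isAbstract = e.getAttribute("abstract"); // "" when the attribute is absent
            if (name.equals(e.getAttribute("name")) && !isAbstract.equals("true") && !isAbstract.equals("1")) {
                result.add(name); // a concrete element may appear as itself
            }
        }
        for (Element e : globals) {
            String group = e.getAttribute("substitutionGroup");
            String groupName = group.substring(group.indexOf(':') + 1); // drop any prefix
            // Members can head further substitution groups, so recurse transitively
            if (!group.isEmpty() && name.equals(groupName)) {
                addCandidates(globals, e.getAttribute("name"), result);
            }
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='shape' abstract='true'/>"
                + "<xs:element name='circle' substitutionGroup='shape'/>"
                + "<xs:element name='square' substitutionGroup='shape'/>"
                + "</xs:schema>";
        System.out.println(candidates(xsd, "shape")); // [circle, square]
    }
}
```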

shapiy
  • Hello, thanks a lot for this answer. After looking into it I am trying to implement the `XSD` parser using `XmlSchema Core`, but I am not sure how to get all the child elements, their types, and other related information. If possible, can you please explain how to get all the child elements of a complex type, along with their types and other associated information? Thanks in advance. – BATMAN_2008 Mar 30 '21 at 15:27
  • I am a bit stuck while parsing the `XSD` using the `APACHE XMLSCHEMA CORE` library. I am trying to resolve this for nearly 1 day. I have posted my question here: https://stackoverflow.com/questions/66874536/how-to-parse-xsd-and-read-all-the-complex-elements-and-its-child-elements-using If you get a chance please check it and provide your solution. Thanks a lot in advance. – BATMAN_2008 Mar 31 '21 at 07:58

Many of the solutions for validating XML in Java use the JAXB API. There's an extensive tutorial available here. The basic recipe for doing what you're looking for with JAXB is as follows:

  1. Obtain or create the XML schema to validate against.
  2. Generate Java classes to bind the XML to using xjc, the JAXB compiler.
  3. Write Java code to:
    1. Open the XML content as an input stream.
    2. Create a JAXBContext and Unmarshaller
    3. Pass the input stream to the Unmarshaller's unmarshal method.

The parts of the tutorial you can read for this are:

  1. Hello, world
  2. Unmarshalling XML
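If all you need from this recipe is the validation step, the JDK's own javax.xml.validation API can check a document against a schema without generating any classes. That also sidesteps the fact that JAXB was removed from the JDK in Java 11 and now has to be added as a separate dependency. A minimal sketch, swapping javax.xml.validation in for JAXB; SchemaCheck and its inline schema are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: validates an XML string against an XSD string with the JDK only.
final class SchemaCheck {
    static final String XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='root'><xs:complexType><xs:sequence>"
            + "<xs:element name='element-b' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns null when `xml` is valid, otherwise the message of the first problem found.
    static String firstError(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the validator throws on the first error
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError(XSD, "<root><element-b>blah</element-b></root>")); // null
        System.out.println(firstError(XSD, "<root/>") != null); // true: element-b is required
    }
}
```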
Paul Morie
  • I'm sorry, I don't think I explained myself very well. I've added a clarification to my original question. I can already validate and marshal/unmarshal XML - I'm looking to create something to assist me in the editing or creation of documents. I tinkered with [Jaxe](http://jaxe.sourceforge.net/en/) a bit but I really need this bit of understanding, not a standalone tool. – Paul Nov 27 '11 at 02:55
  • This answer was closest to what I ended up doing. Rather than using xjc, though, I found [Apache XMLBeans](http://xmlbeans.apache.org/) to be much more powerful, and it's pretty easy to use thanks to the sample code. – Paul Nov 30 '11 at 15:19
  • Glad this could help you. One thing that I'll mention is that I've found XMLBeans to be quite slow compared to JAXB generated classes, but this was years ago (around 2006), so YMMV. – Paul Morie Nov 30 '11 at 15:59

This is a fairly complete sample of how to parse an XSD using XSOM:

import java.io.File;
import java.util.Iterator;
import java.util.Vector;

import org.xml.sax.ErrorHandler;

import com.sun.xml.xsom.XSComplexType;
import com.sun.xml.xsom.XSElementDecl;
import com.sun.xml.xsom.XSFacet;
import com.sun.xml.xsom.XSModelGroup;
import com.sun.xml.xsom.XSModelGroupDecl;
import com.sun.xml.xsom.XSParticle;
import com.sun.xml.xsom.XSRestrictionSimpleType;
import com.sun.xml.xsom.XSSchema;
import com.sun.xml.xsom.XSSchemaSet;
import com.sun.xml.xsom.XSSimpleType;
import com.sun.xml.xsom.XSTerm;
import com.sun.xml.xsom.impl.Const;
import com.sun.xml.xsom.parser.XSOMParser;
import com.sun.xml.xsom.util.DomAnnotationParserFactory;

public class XSOMNavigator
{
    public static class SimpleTypeRestriction
    {
        public String[] enumeration = null;
        public String   maxValue    = null;
        public String   minValue    = null;
        public String   length      = null;
        public String   maxLength   = null;
        public String   minLength   = null;
        public String[] pattern     = null;
        public String   totalDigits = null;
        public String   fractionDigits = null;
        public String   whiteSpace = null;

        public String toString()
        {
            String enumValues = "";
            if (enumeration != null)
            {
                for(String val : enumeration)
                {
                    enumValues += val + ", ";
                }
                enumValues = enumValues.substring(0, enumValues.lastIndexOf(','));
            }

            String patternValues = "";
            if (pattern != null)
            {
                for(String val : pattern)
                {
                    patternValues += "(" + val + ")|";
                }
                patternValues = patternValues.substring(0, patternValues.lastIndexOf('|'));
            }
            String retval = "";
            retval += minValue    == null ? "" : "[MinValue  = "   + minValue      + "]\t";
            retval += maxValue    == null ? "" : "[MaxValue  = "   + maxValue      + "]\t";
            retval += minLength   == null ? "" : "[MinLength = "   + minLength     + "]\t";
            retval += maxLength   == null ? "" : "[MaxLength = "   + maxLength     + "]\t";
            retval += pattern     == null ? "" : "[Pattern(s) = "  + patternValues + "]\t";
            retval += totalDigits == null ? "" : "[TotalDigits = " + totalDigits   + "]\t";
            retval += fractionDigits == null ? "" : "[FractionDigits = " + fractionDigits   + "]\t";
            retval += whiteSpace  == null ? "" : "[WhiteSpace = "      + whiteSpace        + "]\t";          
            retval += length      == null ? "" : "[Length = "      + length        + "]\t";          
            retval += enumeration == null ? "" : "[Enumeration Values = "      + enumValues    + "]\t";

            return retval;
        }
    }

    private static void initRestrictions(XSSimpleType xsSimpleType, SimpleTypeRestriction simpleTypeRestriction)
    {
        XSRestrictionSimpleType restriction = xsSimpleType.asRestriction();
        if (restriction != null)
        {
            Vector<String> enumeration = new Vector<String>();
            Vector<String> pattern     = new Vector<String>();

            for (XSFacet facet : restriction.getDeclaredFacets())
            {
                if (facet.getName().equals(XSFacet.FACET_ENUMERATION))
                {
                    enumeration.add(facet.getValue().value);
                }
                if (facet.getName().equals(XSFacet.FACET_MAXINCLUSIVE))
                {
                    simpleTypeRestriction.maxValue = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_MININCLUSIVE))
                {
                    simpleTypeRestriction.minValue = facet.getValue().value;
                }
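                if (facet.getName().equals(XSFacet.FACET_MAXEXCLUSIVE))
                {
                    // Turning an exclusive bound into an inclusive one only works for integers;
                    // for decimals, dates, etc. keep the raw value and mark it as exclusive
                    String value = facet.getValue().value;
                    try { simpleTypeRestriction.maxValue = String.valueOf(Long.parseLong(value) - 1); }
                    catch (NumberFormatException e) { simpleTypeRestriction.maxValue = "< " + value; }
                }
                if (facet.getName().equals(XSFacet.FACET_MINEXCLUSIVE))
                {
                    String value = facet.getValue().value;
                    try { simpleTypeRestriction.minValue = String.valueOf(Long.parseLong(value) + 1); }
                    catch (NumberFormatException e) { simpleTypeRestriction.minValue = "> " + value; }
                }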
                if (facet.getName().equals(XSFacet.FACET_LENGTH))
                {
                    simpleTypeRestriction.length = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_MAXLENGTH))
                {
                    simpleTypeRestriction.maxLength = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_MINLENGTH))
                {
                    simpleTypeRestriction.minLength = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_PATTERN))
                {
                    pattern.add(facet.getValue().value);
                }
                if (facet.getName().equals(XSFacet.FACET_TOTALDIGITS))
                {
                    simpleTypeRestriction.totalDigits = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_FRACTIONDIGITS))
                {
                    simpleTypeRestriction.fractionDigits = facet.getValue().value;
                }
                if (facet.getName().equals(XSFacet.FACET_WHITESPACE))
                {
                    simpleTypeRestriction.whiteSpace = facet.getValue().value;
                }
            }
            if (enumeration.size() > 0)
            {
                simpleTypeRestriction.enumeration = enumeration.toArray(new String[] {});
            }
            if (pattern.size() > 0)
            {
                simpleTypeRestriction.pattern = pattern.toArray(new String[] {});
            }
        }
    }

    private static void printParticle(XSParticle particle, String occurs, String absPath, String indent)
    {
        boolean repeats = particle.isRepeated();
        occurs = "  MinOccurs = " + particle.getMinOccurs() + ", MaxOccurs = " + particle.getMaxOccurs() + ", Repeats = " + Boolean.toString(repeats);
        XSTerm term = particle.getTerm();
        if (term.isModelGroup())
        {
            printGroup(term.asModelGroup(), occurs, absPath, indent);    
        }
        else if(term.isModelGroupDecl())
        {
            printGroupDecl(term.asModelGroupDecl(), occurs, absPath, indent);    
        }
        else if (term.isElementDecl())
        {
            printElement(term.asElementDecl(), occurs, absPath, indent);
        }
    }

    private static void printGroup(XSModelGroup modelGroup, String occurs, String absPath, String indent)
    {
        System.out.println(indent + "[Start of Group " + modelGroup.getCompositor() + occurs + "]" );
        for (XSParticle particle : modelGroup.getChildren())
        {
            printParticle(particle, occurs, absPath, indent + "\t");
        }
        System.out.println(indent + "[End of Group " + modelGroup.getCompositor() + "]");
    }

    private static void printGroupDecl(XSModelGroupDecl modelGroupDecl, String occurs, String absPath, String indent)
    {
        System.out.println(indent + "[GroupDecl " + modelGroupDecl.getName() + occurs + "]");
        printGroup(modelGroupDecl.getModelGroup(), occurs, absPath, indent);
    }

    private static void printComplexType(XSComplexType complexType, String occurs, String absPath, String indent)
    {
        System.out.println();
        XSParticle particle = complexType.getContentType().asParticle();
        if (particle != null)
        {
            printParticle(particle, occurs, absPath, indent);
        }
    }

    private static void printSimpleType(XSSimpleType simpleType, String occurs, String absPath, String indent)
    {
        SimpleTypeRestriction restriction = new SimpleTypeRestriction();
        initRestrictions(simpleType, restriction);
        System.out.println(restriction.toString());
    }

    public static void printElement(XSElementDecl element, String occurs, String absPath, String indent)
    {
        absPath += "/" + element.getName();
        String typeName = element.getType().getBaseType().getName();
        if(element.getType().isSimpleType() && element.getType().asSimpleType().isPrimitive())
        {
            // We have a primitive type - So use that instead
            typeName = element.getType().asSimpleType().getPrimitiveType().getName();
        }

        boolean nillable = element.isNillable();
        System.out.print(indent + "[Element " + absPath + "   " + occurs + "] of type [" + typeName + "]" + (nillable ? " [nillable] " : ""));
        if (element.getType().isComplexType())
        {
            printComplexType(element.getType().asComplexType(), occurs, absPath, indent);
        }
        else
        {
            printSimpleType(element.getType().asSimpleType(), occurs, absPath, indent);
        }
    }

    public static void printNameSpace(XSSchema s, String indent)
    {
        String nameSpace = s.getTargetNamespace();

        // We do not want the default XSD namespaces or a namespace with nothing in it
        if(nameSpace == null || Const.schemaNamespace.equals(nameSpace) || s.getElementDecls().isEmpty())
        {
            return;
        }

        System.out.println("Target namespace: " + nameSpace);
        Iterator<XSElementDecl> jtr = s.iterateElementDecls();
        while (jtr.hasNext())
        {
            XSElementDecl e = (XSElementDecl) jtr.next();

            String occurs  = "";
            String absPath = "";

            XSOMNavigator.printElement(e, occurs, absPath,indent);
            System.out.println();
        }
    }

    public static void xsomNavigate(File xsdFile)
    {
        ErrorHandler    errorHandler    = new ErrorReporter(System.err);
        XSSchemaSet     schemaSet = null;

        XSOMParser parser = new XSOMParser();
        try
        {
            parser.setErrorHandler(errorHandler);
            parser.setAnnotationParser(new DomAnnotationParserFactory());
            parser.parse(xsdFile);
            schemaSet = parser.getResult();
        }
        catch (Exception exp)
        {
            exp.printStackTrace(System.out);
        }

        if(schemaSet != null)
        {
            // iterate each XSSchema object. XSSchema is a per-namespace schema.
            Iterator<XSSchema> itr = schemaSet.iterateSchema();
            while (itr.hasNext())
            {
                XSSchema s = (XSSchema) itr.next();
                String indent  = "";
                printNameSpace(s, indent);
            }
        }
    }

    public static void printFile(String fileName)
    {
        File fileToParse = new File(fileName);
        if (fileToParse != null && fileToParse.canRead())
        {
            xsomNavigate(fileToParse);
        }
    }
}

And for your Error Reporter use:

import java.io.OutputStream;
import java.io.PrintStream;
import java.text.MessageFormat;

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ErrorReporter implements ErrorHandler {

    private final PrintStream out;

    public ErrorReporter( PrintStream o ) { this.out = o; }
    public ErrorReporter( OutputStream o ) { this(new PrintStream(o)); }

    public void warning(SAXParseException e) throws SAXException {
        print("[Warning]",e);
    }

    public void error(SAXParseException e) throws SAXException {
        print("[Error  ]",e);
    }

    public void fatalError(SAXParseException e) throws SAXException {
        print("[Fatal  ]",e);
    }

    private void print( String header, SAXParseException e ) {
        out.println(header+' '+e.getMessage());
        out.println(MessageFormat.format("   line {0} at {1}",
            new Object[]{
                Integer.toString(e.getLineNumber()),
                e.getSystemId()}));
    }
}

For your main use:

public class WDXSOMParser {

    public static void main(String[] args)
    {
        String fileName = null;
        if(args != null && args.length > 0 && args[0] != null)
        {
            fileName = args[0];
        }
        else
        {
            fileName = "C:\\xml\\CollectionComments\\CollectionComment1.07.xsd";
        }
        //fileName = "C:\\xml\\PropertyListingContractSaleInfo\\PropertyListingContractSaleInfo.xsd";
        //fileName = "C:\\xml\\PropertyPreservation\\PropertyPreservation.xsd";

        XSOMNavigator.printFile(fileName);
    }
}
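The walker above prints element structure and simple-type facets but never looks at attributes, which the question also asks about. In XSOM that information should be reachable through XSComplexType.getAttributeUses(), where each XSAttributeUse reports isRequired() and getDecl(). Since that needs the XSOM jar, here is a JDK-only DOM sketch of the same idea instead: it lists the attributes declared directly on a global element's inline complex type, with their use value. AttributeLister is a hypothetical name, and attribute groups, named types, and inherited attributes are deliberately not handled.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: attributes declared directly on a global element's inline complexType.
final class AttributeLister {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static final String SAMPLE_XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='element-a'><xs:complexType>"
            + "<xs:sequence><xs:element name='element-b' type='xs:string'/></xs:sequence>"
            + "<xs:attribute name='type' type='xs:string'/>"
            + "<xs:attribute name='description' type='xs:string' use='required'/>"
            + "</xs:complexType></xs:element></xs:schema>";

    // Maps attribute name to its "use": required, optional, or prohibited
    static Map<String, String> attributes(String xsd, String elementName) {
        Map<String, String> result = new LinkedHashMap<>();
        Element decl = child(parse(xsd), "element", elementName);
        Element type = decl == null ? null : child(decl, "complexType", null);
        if (type == null) return result;
        for (Node n = type.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isXs(n, "attribute")) {
                Element a = (Element) n;
                String name = a.hasAttribute("name") ? a.getAttribute("name") : a.getAttribute("ref");
                // XML Schema treats an omitted use as "optional"
                result.put(name, a.hasAttribute("use") ? a.getAttribute("use") : "optional");
            }
        }
        return result;
    }

    private static boolean isXs(Node n, String localName) {
        return n instanceof Element && XS.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    // First xs:<localName> child of parent, optionally with a matching @name
    private static Element child(Element parent, String localName, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isXs(n, localName) && (name == null || name.equals(((Element) n).getAttribute("name")))) {
                return (Element) n;
            }
        }
        return null;
    }

    private static Element parse(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributes(SAMPLE_XSD, "element-a")); // {type=optional, description=required}
    }
}
```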
Romain Hippeau

I see you have tried Eclipse XSD. Have you tried Eclipse Modeling Framework (EMF)? You can:

  • generate an EMF model from the XSD (see "Generating an EMF Model using XML Schema (XSD)")
  • create a dynamic instance from your metamodel (3.1 With the dynamic instance creation tool)

This is for exploring the xsd. You can create a dynamic instance of the root element, then right-click that element and create a child element; the menu shows which child elements are possible, and so on.

As for saving the created EMF model as XML that complies with the xsd, I would have to look it up. I think you can use JAXB for that (How to use EMF to read XML file?).


Some refs:

  • EMF: Eclipse Modeling Framework, 2nd Edition (written by its creators)
  • Eclipse Modeling Framework (EMF)
  • Discover the Eclipse Modeling Framework (EMF) and Its Dynamic Capabilities
  • Creating Dynamic EMF Models From XSDs and Loading its Instances From XML as SDOs

user802421

Have a look at this: How to parse schema using XOM Parser.

Also, here is the project home for XOM

Droidman

It's a good bit of work, depending on how complex your xsd is, but basically:

If you had

<Document>
<Header/>
<Body/>
</Document>

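And you wanted to find the allowable children of Header, then (taking namespaces into account) an XPath such as //xs:element[@name='Document']//xs:element[@name='Header'] would locate its declaration. Note @name rather than name, since name is an attribute of xs:element, and // rather than /, since the declarations sit inside xs:complexType and xs:sequence wrappers.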
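In Java, the standard javax.xml.xpath API can run a query of this kind, with two details to get right: bind a prefix to the XML Schema namespace, and test the name attribute with @name. A minimal sketch, assuming the Header's child declarations sit inline beneath it (a named type or group reference would need a second lookup); HeaderChildren, its childrenOf helper, and the Title, Meta, and Key elements in the sample schema are all made up for illustration. The ancestor::xs:element[1] test keeps only direct children, so deeper declarations (Key, in the sample) are excluded.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class HeaderChildren {
    // Made-up schema: Document contains Header and Body; Header contains Title and Meta
    static final String SAMPLE_XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='Document'><xs:complexType><xs:sequence>"
            + "<xs:element name='Header'><xs:complexType><xs:sequence>"
            + "<xs:element name='Title' type='xs:string'/>"
            + "<xs:element name='Meta'><xs:complexType><xs:sequence>"
            + "<xs:element name='Key' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "<xs:element name='Body'/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Binds the "xs" prefix used in the XPath below to the XML Schema namespace
    private static final NamespaceContext XS_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "xs".equals(prefix) ? XMLConstants.W3C_XML_SCHEMA_NS_URI : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) { return null; }
    };

    // Names of the elements declared directly inside `child`, itself declared under `parent`.
    // The names are spliced into the expression, so they must not contain quotes.
    static List<String> childrenOf(String xsd, String parent, String child) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XPath prefixes only match on a namespace-aware DOM
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XS_CONTEXT);
            // @name tests the attribute; // skips the xs:complexType/xs:sequence wrappers;
            // ancestor::xs:element[1] is the nearest enclosing declaration
            String expr = "//xs:element[@name='" + parent + "']//xs:element[@name='" + child + "']"
                    + "//xs:element[ancestor::xs:element[1][@name='" + child + "']]/@name";
            NodeList names = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < names.getLength(); i++) {
                result.add(names.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childrenOf(SAMPLE_XSD, "Document", "Header")); // [Title, Meta]
    }
}
```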

After that it depends on how much you want to do. You might find it easier to write or find something that loads an xsd into a DOM-type structure. Of course, you may find all sorts of things under that element in the xsd: choice, sequence, any, attributes, complexType, simpleContent, annotation.
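To get from "which children exist" to the question's "may have / must have / one of" distinction, the two things to read are the compositor (xs:sequence, xs:choice, or xs:all) and each particle's minOccurs and maxOccurs, both of which default to 1. A JDK-only sketch that describes one level of a global element's inline content model in those terms; ContentModelDescriber and its output wording are hypothetical, and nested compositors, group references, and xs:any are ignored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: describes one level of a global element's inline content model.
final class ContentModelDescriber {
    static final String XS = "http://www.w3.org/2001/XMLSchema";

    static final String SAMPLE_XSD = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='element-a'><xs:complexType><xs:sequence>"
            + "<xs:element name='element-b' type='xs:string'/>"
            + "<xs:element name='element-c' type='xs:string' minOccurs='0' maxOccurs='unbounded'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "<xs:element name='shape'><xs:complexType><xs:choice>"
            + "<xs:element name='circle'/><xs:element name='square'/>"
            + "</xs:choice></xs:complexType></xs:element></xs:schema>";

    static List<String> describe(String xsd, String elementName) {
        List<String> out = new ArrayList<>();
        Element decl = child(parse(xsd), "element", elementName);
        Element type = decl == null ? null : child(decl, "complexType", null);
        Element group = null;
        if (type != null) {
            for (String compositor : new String[] {"sequence", "choice", "all"}) {
                if (group == null) group = child(type, compositor, null);
            }
        }
        if (group == null) return out;
        // A whole group with minOccurs="0" makes every member optional
        boolean groupOptional = "0".equals(group.getAttribute("minOccurs"));
        for (Node n = group.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!isXs(n, "element")) continue; // nested compositors, group refs, xs:any not handled
            Element e = (Element) n;
            // Both occurrence attributes default to 1 when omitted
            String min = e.hasAttribute("minOccurs") ? e.getAttribute("minOccurs") : "1";
            String max = e.hasAttribute("maxOccurs") ? e.getAttribute("maxOccurs") : "1";
            String need;
            if (group.getLocalName().equals("choice")) {
                need = "one of a choice"; // one alternative per occurrence of the choice
            } else {
                need = (groupOptional || min.equals("0")) ? "optional" : "required";
                if (group.getLocalName().equals("all")) need += ", any order";
            }
            out.add(e.getAttribute("name") + ": " + need + " [" + min + ".." + max + "]");
        }
        return out;
    }

    private static boolean isXs(Node n, String localName) {
        return n instanceof Element && XS.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    // First xs:<localName> child of parent, optionally with a matching @name
    private static Element child(Element parent, String localName, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isXs(n, localName) && (name == null || name.equals(((Element) n).getAttribute("name")))) {
                return (Element) n;
            }
        }
        return null;
    }

    private static Element parse(String xsd) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xsd))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XSD", e);
        }
    }

    public static void main(String[] args) {
        // [element-b: required [1..1], element-c: optional [0..unbounded]]
        System.out.println(describe(SAMPLE_XSD, "element-a"));
    }
}
```

For an editor, these descriptions are only the starting point: deciding what may be inserted "here" also means comparing them against the children already present in the document.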

Loads of time consuming fun.

Tony Hopkinson
  • Looks like @Paul Morie is pointing you at an existing implementation, if you don't find this sort of thing as much fun as I do. – Tony Hopkinson Nov 27 '11 at 00:46
  • Thanks. I clarified my question a bit. I was hoping to begin with something existing rather than create a parser. I suppose I can tear apart a validator and look at how an xml document is validated against a schema. I didn't think what I'm trying to do would be so unusual. – Paul Nov 27 '11 at 02:58
  • Your requirement isn't unusual, I am. We do a lot of auto-creation of form controls from xsds, or more correctly from entity trees derived from xsds, more recently I've been playing with them and the dynamic type for validatable property bags, so it was at the forefront of my thinking. – Tony Hopkinson Nov 27 '11 at 23:35
  • I think the problem of auto-creation of form controls is an interesting one, and I wish I had more time to explore it right now. Unfortunately I think I'll have to program for the specific case and not the general case. I'll have to add this to my list of "open source projects to start when I get the time". Not having easy-to-use tools to go from xsd to a good UI is either a glaring hole, or it's a solution without a problem because it's better to hand-create something for each xsd rather than auto-create it. Thank you for your insight! – Paul Nov 28 '11 at 01:28