Our software outputs a number of XML files and I need to determine which is which. For example, there are three different types of XML file (heavily abbreviated):

"IQ.xml"

<?xml version="1.0" encoding="ISO-8859-1"?>
<Catalog xmlns:dt="urn:schemas-microsoft-com:datatypes">
<Rec>
<ITEM dt:dt="string"></ITEM>
<QTY dt:dt="string"></QTY>
</Rec>
</Catalog>

"IMR.xml"

<?xml version="1.0" encoding="ISO-8859-1"?>
<Catalog xmlns:dt="urn:schemas-microsoft-com:datatypes">
<Rec>
<ITEMS dt:dt="string"></ITEMS>
<MFG dt:dt="string"></MFG>
<ROUTE dt:dt="string"></ROUTE>
</Rec>
</Catalog>

"RP.xml"

<?xml version="1.0" encoding="ISO-8859-1"?>
<Catalog xmlns:dt="urn:schemas-microsoft-com:datatypes">
<Rec>
<REF dt:dt="string"></REF>
<PON dt:dt="string"></PON>
</Rec>
</Catalog>

Any one of these could be output at any time, and I need a way to determine where to route each file. What is the best way to achieve this? Could a schema be used to test the XML file's fields, with the result passed back?

My initial thought was to test each file against a schema: if it doesn't match the first, move on to the second, and so on. But that hard-codes the list of types, so it cannot be changed when different XML file types are added later, and I'm not too keen on that. I'm not even sure at this stage whether this is the best approach.

This will be coded in C#, so I'd like to know whether there are any built-in functions that can help or whether it will have to be custom written.

Has anyone needed to do this before? How did you tackle this?

Heretic Monkey
Monika
  • What about by file name? – Dan Wilson Nov 21 '16 at 22:49
  • http://stackoverflow.com/questions/55828/how-does-one-parse-xml-files – OldProgrammer Nov 21 '16 at 23:08
  • There is no standard functionality in .NET to do this. In terms of approaches, you'll need to define what "best" means objectively, as opinion-based questions are [off-topic](http://stackoverflow.com/help/dont-ask). You could always read the list of schemas from another file if you don't want to hardcode. Or just check for the presence/absence of elements. – Heretic Monkey Nov 21 '16 at 23:09
  • The file names are based on the first letter of the leaf nodes, so you can use that. – Slai Nov 21 '16 at 23:22
  • @dan wilson, good call but it turns out the filename is not assigned before I get the xml data which is a pain :( – Monika Nov 22 '16 at 18:13

2 Answers

I would suggest validating the XML file against a schema (as you yourself suggested).

As for keeping your code flexible enough to support other schemas later, there are many options, and the right one depends on what you want to do.

For example, you can keep all your schemas in a config file, and when you validate a new XML file you can run it programmatically through the supported schemas. If there is no match, you can throw an exception (unsupported XML file structure, for example).
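The config-file idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the config format (a plain-text file with one schema path per line), the `SchemaMatcher` class, and both method names are hypothetical and not part of the original suggestion. It relies on the `XDocument.Validate` extension method from `System.Xml.Schema` and treats any validation event as a mismatch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaMatcher
{
    // Reads one schema path per line from a plain-text config file
    // (hypothetical format), skipping blank lines.
    public static List<string> LoadSchemaPaths(string configFile) =>
        File.ReadAllLines(configFile)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();

    // Returns the path of the first schema the document validates against,
    // or throws if none of the configured schemas match.
    public static string FindSchema(string xmlFile, IEnumerable<string> schemaPaths)
    {
        XDocument document = XDocument.Load(xmlFile);

        foreach (string schemaPath in schemaPaths)
        {
            var schemas = new XmlSchemaSet();
            schemas.Add(null, schemaPath); // null: use the schema's own targetNamespace

            bool valid = true;
            // Any validation event counts as "does not match this schema".
            document.Validate(schemas, (sender, args) => valid = false);

            if (valid) return schemaPath;
        }

        throw new NotSupportedException("Unsupported XML file structure: " + xmlFile);
    }
}
```

Supporting a new file type then means adding a line to the config file; the returned schema path tells the caller which type was matched.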

You might also statically define the pairings between certain XML files and certain schemas, which you can later look up programmatically.

Of course, when you want to support new schemas you'll need to change the code, but that's normal.

A fully generic, automated way of handling any kind of XML file against any kind of schema will be difficult. You would probably need some sort of naming convention, deducing the associated schema from the file name or from information embedded in the XML file itself. This could be done at runtime, but even then you would probably support only a limited number of behaviors, and you would need new code whenever you expand your application.
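One way to read "information embedded inside the XML file" is to use the field names of the first `<Rec>` as a signature. The sketch below assumes that the three sample layouts from the question are the full set and that each type has a fixed field order; the `RecordClassifier` class, its lookup table, and the `Classify` method are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RecordClassifier
{
    // Hypothetical lookup table: ordered field names of a <Rec> -> file type.
    private static readonly Dictionary<string, string> KnownTypes =
        new Dictionary<string, string>
        {
            { "ITEM,QTY", "IQ" },
            { "ITEMS,MFG,ROUTE", "IMR" },
            { "REF,PON", "RP" },
        };

    // Builds a signature from the child element names of the first <Rec>
    // and returns the matching type name, or null when the shape is unknown.
    public static string Classify(XDocument document)
    {
        XElement firstRec = document.Root?.Element("Rec");
        if (firstRec == null) return null;

        string signature = string.Join(",",
            firstRec.Elements().Select(e => e.Name.LocalName));

        return KnownTypes.TryGetValue(signature, out string type) ? type : null;
    }
}
```

This checks only element names, not values or data types, so it is weaker than schema validation; the lookup table could equally be loaded from a file rather than compiled in.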

  • Hi Claudiu, thanks for your detailed reply. If I were to keep all the known schemas in one file and test against this file for a match, would I be able to determine which schema matched so I can call a function specifically for the xml data which was provided? – Monika Nov 22 '16 at 18:12
  • Hi, just keep the filename of the schema in the config, not the actual content. That way, when you test, you have the filename: you can load the schemas into memory, test, and see which one matches! – Claudiu Cojocaru Nov 22 '16 at 22:55

Use an XmlReader with an XmlReaderSettings which specifies the type of validation to perform and a ValidationEventHandler. This can be wrapped into a method that will give you the schema or schemas against which the XML document was successfully validated.

If you're concerned about new schemas being added in the future, then just store them in a central location like a directory and grab them at runtime. New schemas could simply be dropped into the directory as needed.

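// Namespaces used: System, System.Collections.Generic, System.IO, System.Linq,
// System.Xml, System.Xml.Schema
void Main()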
{
    var rootDirectory = @"C:\Testing";
    var schemaDirectory = Path.Combine(rootDirectory, "Schemas");
    var dataDirectory = Path.Combine(rootDirectory, "Data");

    var schemaFiles = new[] {
        Path.Combine(schemaDirectory, "IQ.xsd"),
        Path.Combine(schemaDirectory, "IMR.xsd"),
        Path.Combine(schemaDirectory, "RP.xsd")
    };

    var dataFiles = new[] {
        Path.Combine(dataDirectory, "IQ.xml"),
        Path.Combine(dataDirectory, "IMR.xml"),
        Path.Combine(dataDirectory, "RP.xml")
    };

    var results = FindMatchingSchemas(dataFiles[1], schemaFiles);

    // Print the path of the first schema that validated cleanly (First throws if none did).
    Console.WriteLine("Matching schema is: {0}", results.First(r => r.Value).Key);
}

private static Dictionary<string, bool> FindMatchingSchemas(string dataFile, string[] schemaFiles)
{
    var results = new Dictionary<string, bool>();

    foreach (var schemaFile in schemaFiles)
    {
        results.Add(schemaFile, true);

        // Set the validation settings.
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += new ValidationEventHandler((object sender, ValidationEventArgs args) =>
        {
            // Validation error
            results[schemaFile] = false;
        });
        settings.Schemas.Add(null, schemaFile);

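        // Create the XmlReader and read to the end of the file; the handler
        // above fires for every validation problem. Dispose the reader when done.
        using (XmlReader reader = XmlReader.Create(dataFile, settings))
        {
            while (reader.Read()) { }
        }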
    }

    return results;
}

// Output: Matching schema is: C:\Testing\Schemas\IMR.xsd
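To pick schemas up at runtime rather than listing them in code, the schema array can be replaced by a directory scan. This is a minimal sketch of that variation, assuming every schema lives directly in one directory with an `.xsd` extension; the `SchemaDirectory` class and `FindMatches` method are hypothetical names.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaDirectory
{
    // Returns every *.xsd file in schemaDirectory that dataFile validates against.
    public static List<string> FindMatches(string dataFile, string schemaDirectory)
    {
        var matches = new List<string>();

        foreach (string schemaFile in Directory.GetFiles(schemaDirectory, "*.xsd"))
        {
            bool valid = true;

            var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
            settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
            // Any error or warning marks this schema as a non-match.
            settings.ValidationEventHandler += (sender, args) => valid = false;
            settings.Schemas.Add(null, schemaFile);

            using (XmlReader reader = XmlReader.Create(dataFile, settings))
            {
                while (reader.Read()) { }
            }

            if (valid) matches.Add(schemaFile);
        }

        return matches;
    }
}
```

Dropping a new `.xsd` file into the directory is then enough for it to be considered on the next call.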

There is a free website which can generate XSD documents from XML documents. http://www.freeformatter.com/xsd-generator.html

IQ.xsd

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Rec">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="ITEM"/>
              <xs:element type="xs:short" name="QTY"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

IMR.xsd

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Rec">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:short" name="ITEMS"/>
              <xs:element type="xs:string" name="MFG"/>
              <xs:element type="xs:string" name="ROUTE"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

RP.xsd

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Rec">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="REF"/>
              <xs:element type="xs:short" name="PON"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Derived from Validating an XML against referenced XSD in C#

Community
Dan Wilson