
I'm developing a C# library with .NET Framework 4.6.2 that parses big XML files.

This library will be part of a Windows Service, and I don't want to waste memory by loading whole XML files at once with XDocument.

Maybe there is a better option, but I've decided to use XmlReader instead, in particular its ReadToFollowing method.

I read that XmlReader represents a reader that provides fast, noncached, forward-only access to XML data.

The XML file I want to read has three parts: a section with data that I have to check before I continue reading, another section with more useful data, and a very large final section with tons of codes.

This works if the sections always appear in that order. My question is whether I can be sure that the file will always have the section order I described above.
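A minimal sketch of that forward-only approach, assuming the element names and the placeholder namespace `http://xxx/xxx.2` from the sample file below, plus a hypothetical `onSerial` callback that receives each code as soon as it is read, so the codes are never held in memory together. It only works if the sections arrive in the order described, because an `XmlReader` cannot move backwards.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ForwardOnlySketch
{
    // Placeholder namespace copied from the anonymized sample file.
    const string Ns = "http://xxx/xxx.2";

    // Returns the first SubField1 value, then streams every Serial inside
    // TonsOfCodes to the hypothetical onSerial callback, one at a time.
    public static string Read(TextReader input, Action<string> onSerial)
    {
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        string subField1 = null;
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // Scans forward; if TonsOfCodes came first, this would read through all of it.
            if (reader.ReadToFollowing("SubField1", Ns))
                subField1 = reader.ReadElementContentAsString();

            // Real code would stop here if the checked data is not acceptable.

            if (reader.ReadToFollowing("TonsOfCodes", Ns))
            {
                // ReadSubtree limits the scan to the TonsOfCodes element.
                using (XmlReader codes = reader.ReadSubtree())
                {
                    while (!codes.EOF)
                    {
                        if (codes.NodeType == XmlNodeType.Element
                            && codes.LocalName == "Serial" && codes.NamespaceURI == Ns)
                            onSerial(codes.ReadElementContentAsString()); // also moves past </Serial>
                        else
                            codes.Read();
                    }
                }
            }
        }
        return subField1;
    }
}
```

The explicit loop is used instead of calling `ReadToFollowing("Serial", Ns)` repeatedly: `ReadToFollowing` calls `Read()` before it checks the current node, so after `ReadElementContentAsString` it would skip a `<Serial>` that immediately follows another one.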

Will an XML file always have the same section order? I have its XSD files, but I don't know whether they describe the order of its sections.

Here is an example XML file (I can't share the original one because of an NDA):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Incomming_Msg xmlns="http://xxx/xxx.2"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xxx/xxx.2-messages.xsd ">
    <DataToCheck>
        <Field1>
            <SubField1>123456789</SubField1>
        </Field1>
        <Field2>
            <SubField2>123asz11-12asd</SubField2>
            <!-- Omitted for brevity -->
        </Field2>
        <!-- Omitted for brevity -->
    </DataToCheck>
    <DataToInsert1>
        <!-- Omitted for brevity -->
    </DataToInsert1>
    <DataToInsert2>
        <!-- Omitted for brevity -->
    </DataToInsert2>
    <DataToInsert3>
        <!-- Omitted for brevity -->
    </DataToInsert3>
    <TonsOfCodes>
        <CodeLevel>
            <Code>
                <Serial>1234567890</Serial>
            </Code>
        </CodeLevel>
        <!-- Omitted for brevity -->
        <!-- This section could be very very big -->
    </TonsOfCodes>
</Incomming_Msg>
```

For example, if the XML file has the TonsOfCodes section at the beginning, reading forward to find the DataToCheck section will be very slow.
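One way to avoid paying for that slow scan is sketched here, under the assumption that rejecting an out-of-order file is acceptable: walk the root's direct children once, in document order, and fail fast as soon as `TonsOfCodes` shows up before `DataToCheck`. `Skip()` moves past a whole section without building it in memory, although the reader still has to parse its text.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SectionOrderSketch
{
    // Returns the local names of the root's child elements in document order.
    // Throws InvalidDataException as soon as TonsOfCodes precedes DataToCheck.
    public static List<string> ReadSectionOrder(TextReader input)
    {
        var order = new List<string>();
        var settings = new XmlReaderSettings { IgnoreComments = true, IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();           // skip the XML declaration, land on the root
            if (reader.IsEmptyElement) return order;
            reader.Read();                    // step onto the first child

            bool seenDataToCheck = false;
            while (reader.NodeType == XmlNodeType.Element)
            {
                string name = reader.LocalName;
                order.Add(name);
                if (name == "DataToCheck")
                    seenDataToCheck = true;   // real code would read and check it here (e.g. via ReadSubtree)
                else if (name == "TonsOfCodes" && !seenDataToCheck)
                    throw new InvalidDataException("TonsOfCodes appears before DataToCheck.");
                reader.Skip();                // move to the next sibling (or the root's end tag)
            }
        }
        return order;
    }
}
```

If the XSD turns out to allow other orders, the fail-fast branch would have to become a second pass over the file, which is the memory versus read-count trade-off discussed in the comments.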

VansFannel
  • `<xs:sequence>` in an .xsd file means that elements must appear in the declared order. If you control how the XML file is generated, then you can rely on a consistent order. If not, you need to deal with trade-offs between memory usage and the number of times you need to read the file. – Fabio Feb 22 '17 at 07:51
  • @Fabio I have checked the XSD file, and it has an `<xs:sequence>` element with the element order that I expect. Thanks. – VansFannel Feb 22 '17 at 07:57
  • For further reference, this technique (avoiding loading a document entirely in memory thanks to assumptions on the structure of the document) is called streaming. It is widespread and crucial for reading very large semi-structured files. Some querying engines are even able to do so in a way transparent to the user, thanks to declarative languages like XQuery. – Ghislain Fourny Feb 22 '17 at 09:11
  • @GhislainFourny I'm also looking for a better way to do it. If you could share a better technique, it would be appreciated. – VansFannel Feb 22 '17 at 09:28
  • @VansFannel In general, I would tend to recommend using declarative languages like XPath, XQuery and XSLT to manipulate XML, because they do not have the impedance mismatch that imperative and/or object-oriented languages have w.r.t. XML. There are a couple of good XQuery engines out there, like Saxon, Zorba, eXist-db and BaseX. They are all compliant with the standard(s), but offer different kinds of additional functionality and libraries. I am more familiar with Zorba, which provides streaming functions to go over large files, but other engines may also have their own ways. I hope this helps! – Ghislain Fourny Feb 22 '17 at 09:50

1 Answer


Will an XML file always have the same structure?

The answer depends on how you define "same structure":

At the XML level, the answer is no: At strictly the XML level, your only assurance is that the XML is well-formed. This means that it meets the standard for being XML: Elements are properly closed; attribute values have proper surrounding quotes; there's only a single root element; etc.

At the schema level, the answer can be yes: Higher-level structural commitments require a separate contract such as a schema. Within the bounds of the constraints expressed in a particular schema, a valid XML file can be said to always have the same structure. Note, however, that this depends strongly on the specific constraints the schema expresses. An xs:sequence in XSD constrains element ordering, while an xs:all allows different orders. Further, some properties, such as attribute ordering, are insignificant at the XML level, so XSD cannot even address them.
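To let the schema contract enforce the order while still streaming, `XmlReaderSettings` can attach an XSD so that the `XmlReader` validates each node as it is read. The sketch below uses a tiny hypothetical schema (namespace `urn:demo`, root element `Msg`) rather than the real, NDA-covered one, and switches the compositor between `xs:sequence` and `xs:all` to show that only the former rejects a reordered file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SequenceValidationSketch
{
    // Hypothetical schema: <Msg> with two child elements in namespace urn:demo.
    // compositor is "sequence" (fixed order) or "all" (any order).
    static string BuildSchema(string compositor)
    {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
             + " targetNamespace='urn:demo' elementFormDefault='qualified'>"
             + "<xs:element name='Msg'><xs:complexType><xs:" + compositor + ">"
             + "<xs:element name='DataToCheck' type='xs:string'/>"
             + "<xs:element name='TonsOfCodes' type='xs:string'/>"
             + "</xs:" + compositor + "></xs:complexType></xs:element></xs:schema>";
    }

    // Streams the document through a validating XmlReader and collects the errors.
    public static List<string> Validate(string xml, string compositor)
    {
        var errors = new List<string>();
        var settings = new XmlReaderSettings { ValidationType = ValidationType.Schema };
        using (XmlReader xsd = XmlReader.Create(new StringReader(BuildSchema(compositor))))
            settings.Schemas.Add("urn:demo", xsd);
        // With a handler attached, errors are reported here instead of thrown.
        settings.ValidationEventHandler += (sender, e) => errors.Add(e.Message);

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
            while (reader.Read()) { } // validation happens node by node as the file is read
        return errors;
    }
}
```

In a real service the empty `while` loop would be replaced by the section-processing code, so an out-of-order file is reported during the same single pass.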

kjhughes