I am working on a project with an online solution where you can upload an XML file; the solution parses the data in the file and stores it in the database. The files can now be up to 1 GB in size, and my solution must be able to handle this, which gives me some difficulty for obvious reasons. My idea is to send the data to the database in chunks.

Here is a small example of what the XML file can look like:

    <BPM6 language="english" submissionType="normal" version="14">
      <masterdata tlf="" by="" postnr="" gadenavn="" firmanavn="" loebenr="" refperio="201711" idno="61092919">
        <kontaktpersoner>
          <person fuldtnavn="Morten Nystrup Rasmussen" tlf="" email="mnr@nationalbanken.dk" />
        </kontaktpersoner>
      </masterdata>
      <AK1>
        <AK1MedIsin>
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" Fritekst="&#xD;" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" Fritekst="&#xD;" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" Fritekst="&#xD;" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" Fritekst="&#xD;" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" />
          <data Isin="123" Koncern="D" Valuta="ANG" VPreg="J" VaerdiPrincip="M" />
        </AK1MedIsin>
      </AK1>
      <OB2a>
        <OB2aUdenIsin>
          <data InternKode="3" Branche="BZ2" Sektor="1120" />
          <data InternKode="4" Branche="BZ2" Sektor="1120" />
          <data InternKode="5" Branche="BZ2" Sektor="1120" />
          <data InternKode="6" Branche="BZ2" Sektor="1120" />
          <data InternKode="7" Branche="BZ2" Sektor="1120" />
        </OB2aUdenIsin>
      </OB2a>
    </BPM6>

The full XML files are obviously a lot bigger. My main problem is that I have to read the data within nodes such as "AK1MedIsin" and "OB2aUdenIsin", and these names are unknown in advance and will change from file to file.

    using (var reader = XmlReader.Create(File.OpenRead(@"C:\data.xml")))
    {
        var state = State.PreMasterData;

        if (reader.MoveToContent() == XmlNodeType.Element)
        {
            while (reader.Read())
            {
                switch (state)
                {
                    case State.PreMasterData:
                        // The master data element always has the same name.
                        if (reader.Name == "masterdata")
                        {
                            var masterDatastring = ReadMasterData(reader);
                            state = State.PostMasterData;
                        }
                        break;

                    case State.PostMasterData:
                        // Read chunks of data from the child nodes of BPM6
                        break;
                }
            }
        }
    }

Reading the master data is not a problem because that name is always the same, so I can just use XPath or reader.Name to go to the node and read the lines one by one. Doing the same when I don't know the name of the node has been difficult for me. I tried to use the method reader.ReadSubtree(), but that does not seem to be possible.
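To make the goal concrete, here is a minimal sketch of the kind of chunked reading described above. It is not a tested solution. It assumes that the sections with unknown names always sit two levels below the root (like "AK1MedIsin") and that their rows sit three levels below it, so element depth can stand in for the element name. The names `ChunkedBpm6Reader`, `Row`, `ReadChunks` and `chunkSize` are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ChunkedBpm6Reader
{
    // Hypothetical row type: the name of the enclosing section plus the row's attributes.
    public sealed class Row
    {
        public string Section;
        public Dictionary<string, string> Attributes = new Dictionary<string, string>();
    }

    // Streams the document and yields the rows in lists of at most chunkSize items.
    // Assumption: root = depth 0, <masterdata>/<AK1>/... = depth 1,
    // sections with unknown names = depth 2, row elements = depth 3.
    public static IEnumerable<List<Row>> ReadChunks(TextReader input, int chunkSize)
    {
        var chunk = new List<Row>(chunkSize);
        string topLevel = null; // name of the current depth-1 element
        string section = null;  // name of the current depth-2 element

        using (var reader = XmlReader.Create(input))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                if (reader.Depth == 1)
                {
                    topLevel = reader.Name;
                }
                else if (reader.Depth == 2)
                {
                    section = reader.Name;
                }
                else if (reader.Depth == 3 && topLevel != "masterdata")
                {
                    var row = new Row { Section = section };

                    // Copy every attribute of the row element, whatever it is called.
                    if (reader.MoveToFirstAttribute())
                    {
                        do
                        {
                            row.Attributes[reader.Name] = reader.Value;
                        } while (reader.MoveToNextAttribute());
                        reader.MoveToElement();
                    }

                    chunk.Add(row);
                    if (chunk.Count == chunkSize)
                    {
                        yield return chunk;
                        chunk = new List<Row>(chunkSize);
                    }
                }
            }
        }

        // Hand over the last, possibly smaller, chunk.
        if (chunk.Count > 0)
            yield return chunk;
    }
}
```

A caller could open the file with `File.OpenText(@"C:\data.xml")`, loop over `ChunkedBpm6Reader.ReadChunks(file, 10000)` and pass each list to the back-end. Because the reader only moves forward, only one chunk is held in memory at a time. The chunk size of 10000 is an arbitrary placeholder.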

  • `which gives me some difficulty for obvious reasons` Can you explain that a bit more? It may be obvious to you, but not to us. – mjwills Oct 26 '17 at 12:22
  • You could use an XML SAX parser, which is event-based and doesn't need to load the entire tree the way a DOM parser does. Here are a couple of links that could help: https://stackoverflow.com/questions/6828703/what-is-the-difference-between-sax-and-dom, https://stackoverflow.com/questions/676274/what-is-the-best-way-to-parse-big-xml-in-c-sharp-code – Simone Cifani Oct 26 '17 at 12:29
  • Yeah, sorry about that. The current solution simply cannot handle XML files bigger than 150 MB, which is why I need to find a new solution. To change as little code as possible in the back-end, my idea was to send chunks of the XML to the back-end in sizes it can handle. – Soren123 Oct 26 '17 at 12:40
  • See my solution in the post linked below. I used a mixture of XmlReader and LINQ to XML. For really complicated cases I have a solution that gets multiple elements, and I can help with that. https://stackoverflow.com/questions/45822054/using-xmlreader-and-xpath-in-large-xml-file-c-sharp – jdweng Oct 26 '17 at 12:41
  • Thanks for your answer, jdweng. That solution is something I have been looking at, but it assumes that you know the names of the nodes you want to read. In my case I do not know the names, only their position in the XML. – Soren123 Oct 26 '17 at 13:39
  • You could use a recursive approach. See the TreeNode solution: https://stackoverflow.com/questions/28976601/recursion-parsing-xml-file-with-attributes-into-treeview-c-sharp – jdweng Oct 27 '17 at 06:28
