
I want to read records from XML files and save the data to another file or a database. The data I am interested in is actually quite small, but I often run into an "out of memory" exception. The problem seems to be the XML files themselves: some are very large (several gigabytes) and contain countless records. I don't need all of these records at the same time; whenever I have read a certain number, I save them and discard the data I have read.

My code for loading this data looks like this:

        using (FileStream fs = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader reader = new StreamReader(bs))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                using (XmlReader xr = XmlReader.Create(reader))
                {
                    while (xr.Read())
                    {
                        switch (xr.NodeType)
                        {
                            case XmlNodeType.Element:
                                if (xr.LocalName == "Record")
                                {
                                    string xml = xr.ReadOuterXml();
                                    // Parse XML, put data into a list, save the list and clear it
                                }
                                break;
                        }
                    }
                }
            }
        }

It works fine on smaller files, but it is probably not best practice for reading very large files.

All suggestions will be gratefully received.

TalkingCode
  • Why are you using `BufferedStream` it seems unnecessary? Also why create the `XmlReader` *inside* the loop, it could be done outside? Also what are you doing with the results, we can't see the rest of your code apart from `// Parse XML, Put data into list, save list and clear list` – Charlieface Jul 20 '22 at 10:00
  • What is causing the out of memory? Is a single `Record` so huge that it doesn't fit into a string where you use `string xml = xr.ReadOuterXml();`? Or what happens in `// Parse XML, Put data into list, save list and clear list`? In general, the need to do `string xml = xr.ReadOuterXml();` surprises me, I would expect to see the use of e.g. https://learn.microsoft.com/en-us/dotnet/api/system.xml.linq.xnode.readfrom?view=net-6.0 – Martin Honnen Jul 20 '22 at 10:33
  • See my solution at following which uses a combination of XML Reader and XML Linq : https://stackoverflow.com/questions/61607180/parse-big-xml-file-using-xmlreader – jdweng Jul 20 '22 at 11:14
  • Try replacing `ReadOuterXml()` with `ReadSubtree()` as shown in [ReadOuterXml is throwing OutOfMemoryException reading part of large (1 GB) XML file](https://stackoverflow.com/a/46628379/3744182). – dbc Jul 20 '22 at 16:22
  • Also, what is the purpose of `while ((line = reader.ReadLine()) != null)`? Does the XML file contain a line at the beginning that needs to be replaced? – dbc Jul 20 '22 at 16:50
  • But beyond that I think we need to see a [mcve] to help you. – dbc Jul 20 '22 at 22:01
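
A minimal sketch of the `ReadSubtree()` replacement dbc suggests in the comments, assuming each record is a `<Record>` element and that a single record fits comfortably in memory (`input.xml` and the element name are placeholders):

    using System;
    using System.Xml;

    class RecordStreamer
    {
        static void Main()
        {
            using (XmlReader xr = XmlReader.Create("input.xml"))
            {
                while (xr.Read())
                {
                    if (xr.NodeType == XmlNodeType.Element && xr.LocalName == "Record")
                    {
                        // ReadSubtree() returns a reader scoped to the current element,
                        // so only this one record is pulled through memory instead of
                        // building a large outer-XML string.
                        using (XmlReader record = xr.ReadSubtree())
                        {
                            record.MoveToContent();
                            // Parse the record here, add it to a small batch, and flush
                            // the batch to the target file/database at regular intervals.
                        }
                        // Disposing the subtree reader leaves xr on the Record end tag,
                        // so the outer loop simply continues with the next node.
                    }
                }
            }
        }
    }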
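
Similarly, a sketch of the `XNode.ReadFrom` approach Martin Honnen points to, which hands each record to LINQ to XML one element at a time (same assumptions about the file and element name):

    using System;
    using System.Xml;
    using System.Xml.Linq;

    class RecordLinqStreamer
    {
        static void Main()
        {
            using (XmlReader xr = XmlReader.Create("input.xml"))
            {
                xr.MoveToContent();
                while (!xr.EOF)
                {
                    if (xr.NodeType == XmlNodeType.Element && xr.LocalName == "Record")
                    {
                        // Materializes just this element as an XElement and advances
                        // the reader past its end tag.
                        XElement record = (XElement)XNode.ReadFrom(xr);
                        // Query `record` with LINQ to XML, collect the values needed,
                        // and write them out in batches.
                    }
                    else
                    {
                        xr.Read();
                    }
                }
            }
        }
    }

Checking `xr.EOF` instead of calling `xr.Read()` at the top of the loop avoids skipping a record when two `Record` elements sit next to each other with no whitespace in between, since `ReadFrom` already advances the reader.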

0 Answers