I have a 2 GB XML file containing around 2.5 million records. I cannot load it in C#; it throws an OutOfMemoryException. Please help me resolve this in a simple way.
- Provide a [MCVE] in order to get help – Souvik Ghosh May 24 '18 at 06:39
- Hello Simran - Why not use XmlReader? – Prateek Shrivastava May 24 '18 at 06:39
- Is your app compiled as 64-bit? Are you using https://learn.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element ? – mjwills May 24 '18 at 06:39
- Without a [mcve] documenting your specific problems, we can't do much more than point you to [How to parse very huge XML Files in C#?](https://stackoverflow.com/q/15772031), [What is the best way to parse (big) XML in C# Code?](https://stackoverflow.com/q/676274), [Large XML Parsing Efficiently](https://stackoverflow.com/q/29951809) and [How to read large xml file without loading it in memory and using XElement](https://stackoverflow.com/q/2249875). Also, be sure you're loading directly from a `Stream` and not reading into a `string` and parsing that. – dbc May 24 '18 at 06:44
- Set your project to 64-bit (if you can), job done; or parse it incrementally. – TheGeneral May 24 '18 at 06:49
- Combine the responses of Prateek and mjwills: compile as 64-bit AND use `XmlReader`. Don't load the whole file into memory. Don't use `XDocument`/`XmlDocument`/`XmlSerializer`. Write the result of your reading one piece at a time. – xanatos May 24 '18 at 06:51
- Show us the structure of your XML file and what data you want to extract, and I'll give you an XmlReader example. – Alexander Petrov May 25 '18 at 17:15
1 Answer
A simple and general methodology when you have these problems:
- As mjwills and TheGeneral wrote, compile as 64-bit (see the config sketch after this list).
- As Prateek wrote, use `XmlReader` (see the sketch after this list). Don't load the whole file into memory. Don't use `XDocument`/`XmlDocument`/`XmlSerializer`.
- If the size of the output is proportional to the size of the input (for example, you are converting between formats), write the result of your reading one piece at a time; if possible you shouldn't hold the whole output in memory at once. Read an object (a node) from the source file, process it, write the result to a new file or to a database, and discard it. The sketch after this list does exactly that.
- If the output is instead a summary of the input (for example, you are computing some statistics on it), so its size is much smaller than the input's, then it is normally fine to keep it in memory.
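
For the 64-bit point, a minimal sketch: targeting x64 lifts the ~2 GB address-space limit of a 32-bit process, and the `gcAllowVeryLargeObjects` setting mjwills linked is additionally needed only if a single object (one array or string) could itself exceed 2 GB. The property-page wording below is just illustrative of where the setting lives.

```xml
<!-- Project properties -> Build: set "Platform target" to x64
     (or <PlatformTarget>x64</PlatformTarget> in the .csproj). -->

<!-- app.config: only needed when a single object may exceed 2 GB on 64-bit. -->
<configuration>
  <runtime>
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>
```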

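Here is a minimal sketch of the streaming read-process-write loop described above. The file names, the `<record>` element and its `id`/`name` fields are assumptions for illustration; only one record is materialized at a time (as a tiny `XElement`), and each result is written out immediately instead of being accumulated.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        long count = 0;

        // "records.xml", <record> and its id/name fields are hypothetical --
        // substitute the real file name and element names.
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };

        using (XmlReader reader = XmlReader.Create("records.xml", settings))
        using (StreamWriter writer = new StreamWriter("records.csv"))
        {
            reader.MoveToContent();

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
                {
                    // Materialize only this one <record>; ReadFrom also advances
                    // the reader past the element it consumed.
                    var record = (XElement)XNode.ReadFrom(reader);

                    string id = (string)record.Attribute("id");
                    string name = (string)record.Element("name");

                    // Write the result immediately; nothing accumulates in memory.
                    writer.WriteLine($"{id},{name}");
                    count++;
                }
                else
                {
                    reader.Read();
                }
            }
        }

        Console.WriteLine($"Processed {count} records.");
    }
}
```

Note the `else { reader.Read(); }` branch: since `XNode.ReadFrom` already moves the reader past the element it consumed, calling `Read()` unconditionally at the top of the loop would skip every other record.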
xanatos