13

I am working with DBLP XML files. I want to parse the dblp.xml file and extract the useful information for further processing in a project. The XML file is huge (1.1 GB), and I am unable to even open it.

Please let me know if you have a C# parser for dblp.xml, or guide me on how to parse huge XML files in general.

joce
  • 9,624
  • 19
  • 56
  • 74
Bilal Ahmed Yaseen
  • 2,506
  • 2
  • 23
  • 48
  • 3
    XML parsing has been discussed _ad nauseam_ on SO. Here's one such discussion that could enlighten you: http://stackoverflow.com/questions/55828/how-does-one-parse-xml-files – joce Apr 02 '13 at 18:56
  • 1
    See XStreamingElement at http://msdn.microsoft.com/en-us/library/system.xml.linq.xstreamingelement.aspx as well. – Jim Wooley Apr 02 '13 at 18:59
  • Ryan Gates -> OSM files in XML format. This could be reason. – Kostadin Oct 07 '15 at 09:13
  • Had the same issue, but realized that the code was compiled as 32-bit. After recompiling it as 64-bit, it worked fine. Of course it would be better to use the other approaches proposed, such as XmlReader, but this simple trick may avoid having to modify the code. – Ben Aug 25 '16 at 13:30

2 Answers

10

Use XmlReader instead of the XML DOM. The DOM loads the whole file into memory, which is impractical for a file of this size:

http://msdn.microsoft.com/en-us/library/system.xml.xmlreader.aspx
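As a rough illustration of the forward-only approach, here is a minimal sketch that counts the record elements directly under the root without ever holding the document in memory. The class and method names (`DblpStreaming`, `CountRecords`) are made up for this example, and it assumes the records of interest are the direct children of the root element.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not part of any library.
public static class DblpStreaming
{
    // Counts elements by name among the direct children of the root element.
    // The reader only moves forward, so memory use does not grow with file size.
    public static Dictionary<string, int> CountRecords(XmlReader reader)
    {
        var counts = new Dictionary<string, int>();
        while (reader.Read())
        {
            // Depth 0 is the root element; depth 1 holds its direct children.
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
            {
                counts.TryGetValue(reader.Name, out int seen);
                counts[reader.Name] = seen + 1;
            }
        }
        return counts;
    }
}
```

For a file on disk, the reader could be created with `XmlReader.Create("dblp.xml", settings)`. If the file declares a DTD, as dblp.xml is understood to do, the `XmlReaderSettings` would likely need `DtdProcessing = DtdProcessing.Parse`, since the default is `Prohibit`; treat that as an assumption to check against your copy of the file.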

Gergo Szekeres
  • 124
  • 1
  • 3
8

You need to use XmlReader

It represents a reader that provides fast, noncached, forward-only access to XML data. It won't load all the data into memory, and it is meant to be used with large sets of data. Other built-in .NET solutions keep the full generated object graph in memory.

XmlReader in action (by Jon Skeet)
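One common way to keep XmlReader's low memory footprint while still getting convenient objects is to materialize a single record at a time with `XNode.ReadFrom`. The sketch below shows that general pattern; it is not necessarily the code from the linked answer, and the names `DblpRecords` and `StreamElements` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of any library.
public static class DblpRecords
{
    // Lazily yields each element with the given name as a small XElement.
    // Only one record is held in memory at a time.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string name)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
            {
                // ReadFrom consumes the whole element and leaves the reader on
                // the node that follows it, so no extra Read() is needed here.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }
}
```

Each yielded `XElement` can then be queried with LINQ to XML, for example `(string)record.Element("title")`, assuming the records contain a `title` child. Calling `Read()` only in the non-matching branch matters: advancing after `ReadFrom` as well would skip a record that immediately follows another one.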

Community
  • 1
  • 1
illegal-immigrant
  • 8,089
  • 9
  • 51
  • 84