Parsing Large Json to XML (System Out Of Memory Exception)

Question

The JSON File(which is very large) and the script are the following.

Script

Dim JsonContents As String = IO.File.ReadAllText(filePath)

Dim doc As XmlDocument = CType(JsonConvert.DeserializeXmlNode("{""root"":" + JsonContents + "}", "root"), XmlDocument)

Dim Document As New XDocument
Dim query As String

Document = XDocument.Parse(doc.OuterXml)

query = "importJSON"

Dim CMD As New SqlCommand(query)
CMD.Parameters.Add("@type", SqlDbType.Int).Value = importType
CMD.Parameters.Add("@xmlObject", SqlDbType.Xml).Value = Document.ToString
......

JSON File

[{"ID" : "001","NAME" : "Den.Y","GENDER" : "M", AGE : "18"}]

I want to convert it to XML and then pass to the SQL server. But I found the JSON is too large and the then a "System Out Of Memory Exception" is thrown at

 Dim doc As XmlDocument =
 CType(JsonConvert.DeserializeXmlNode("{""root"":" + JsonContents +
 "}", "root"), XmlDocument)

I have no idea about how to modify the script in order to convert large JSON file without out of memory, but small size JSON works fine on this script.

I searched online and some people suggest to use XMLReader, but I have no idea on handling it.

[Solution]

Thank you for everyone's effort. At the end, I cut the large JSON file down to prevent out of memory in the script. Now it works fine with large JSON.

possible duplicate of [Converting JSON to XML](http://stackoverflow.com/questions/16562101/converting-json-to-xml) — ElektroStudios, Apr 13 '15 at 06:32
Why do you have the `CType` there when `JsonConvert.DeserializeXmlNode` returns an XmlDocument? I would hope that removing it would not help, but it is worth trying. Also, do you have the option of running your program compiled for x64 instead of x86? — Andrew Morton, Apr 13 '15 at 19:52
@ElektroStudios: I tried to delete unnecessary statements, and the problem still exists. — Den.Y, Apr 14 '15 at 03:08
@AndrewMorton: I delete CType, the problem still exists. 'System.OutOfMemoryException' was thrown at this statment: Dim JsonXMLDoc As XmlDocument = JsonConvert.DeserializeXmlNode("{root:" + JsonContents + "}", "root") — Den.Y, Apr 14 '15 at 03:15
***** If my JSON file is 55MB, "System.OutOfMemoryException" must be thrown. If it is 33MB, "System.OutOfMemoryException" came by luck. If it is less than 5MB, it works fine all the time — Den.Y, Apr 14 '15 at 03:17
Can you please explain more about : "At the end, I cut the large JSON file down" — Tushar Narang, Jan 23 '19 at 11:42

Joel Coehoorn · Answer 1 · 2015-04-13T22:04:49.653

To begin, it's important here to understand something of how the CLR handles memory. The CLR uses a generational garbage collector, where memory is moved to a higher order generation as it survives each collection. Additionally, there is a special generation, called the "Large Object Heap" (LOH), that is for objects of a certain size and larger. Your JSON string is almost certainly ending up here. The important thing to know is that the LOH is rarely collected; worse, it's almost never compacted. This means that even after an object has been collected and removed from memory, the virtual address space reserved for the object within your process is still in use... and remember that your process has only 2GB of address space by default.

That out of the way, we can see things in your code that may cause a problem. If (and that "if" will be important in a moment) you're doing straight-up string concatenation, this excerpt would end up creating three total objects that copy your string:

"{""root"":" + JsonContents + "}"

There would be one copy for "{""root"":" + JsonContents portion and a second new copy for the trailing + "}". Thankfully, the compiler will typically re-write that into something a little more efficient, and you'll only end up with one additional copy. While I'm here, let's not forget all the CPU work in moving those bytes from the old string to the new string.

One or two additional copies normally isn't too much of a problem, though. I normally see people get into trouble when they're working on a document that uses a doubling algorithm, where it reads through a stream of data and allocates a new buffer that's twice the size of the old one every time the buffer fills up. Most of the collection types in .Net work this way. I'm not familiar with the internals of the JSONConvert type, but it's possible the DeserializeXmlNode() method works this way. If that's case, you'll need to find another way to create your xml document.

We don't have to look that far, though, to find a process creating new string objects over and over. Take a look at the documentation for File.ReadAllText()

This method opens a file, reads each line of the file, and then adds each line as an element of a string.

Uh oh. That sounds like it's concatenating lines to a string over and over. If this were really the cause of your error, I'd expect the File.ReadAllText() call to be where the process breaks. However, it's possible the extra copies created by the string concatenation I showed in the beginning, plus any memory used for the results of the DeserializeXmlNode() call and the CType() cast — remember: if CType() is actually doing any work those will be two separate (large) objects — are the straws that broke the camel's back.

You know how big the file will be, and so we can do a lot better by reading from a StreamReader and writing each line into a StringBuilder object created using the constructor overload the pre-allocates space for buffer. This way, all of the intermediate steps are small enough to avoid the large object heap. But I suspect it won't be enough, and you'll need to find an alternative to the JSONConvert type for creating your XmlDocument object.

Of course, it's possible that your JSON file is just that large. In that case, you may need to split this into multiple calls into the database, and read from the file a section at a time. No single memory object in .Net is allowed to exceed 2GB, even on 64bit systems (the process can get larger, but individual objects cannot).

Appreciate the detail explanation! I break the statements and go through step by step. I found `File.ReadAllText( )` works fine and successfully read all the text, but I found `System.OutOfMemoryException` was thrown by `JsonConvert.DeserializeXmlNode( )`. As my JSON file is only up to 55MB, I have no idea why it reaches the 2GB virtual memory limit causing it to throw `System.OutOfMemoryException`. One more thing, as the error came from `JsonConvert.DeserializeXmlNode( )`, I want to know is there suggestion for converting large JSON to XML in VB.NET? — Den.Y, Apr 14 '15 at 03:30
Just because File.ReadAllText() completes without throwing an error, it doesn't mean it didn't chew through most of your virtual address space in the process. — Joel Coehoorn, Apr 14 '15 at 13:48

Parsing Large Json to XML (System Out Of Memory Exception)

Script

JSON File

[Solution]

1 Answers1