I am working on optimizing our code so that we can read, create, and send XML files that can be very large (2 GB).

To read and create the XML we use the XmlReader class.

We actually get an XML string from another service. If we store that XML in a string variable, it takes at least the same amount of memory again. That point aside, please suggest the best way to handle the XML string so that the out-of-memory exception doesn't occur.

I cannot show the code here due to company policy, but that should not matter because the code already works. It is only with a large XML string that it throws the memory exception mentioned above.

EXPLANATION:

We get a 2 GB XML document from a service.

We process it using streaming.

Since we need to read that XML with XmlReader, we pass it as a string in order to create a new XML document of almost the same size (2 GB):

byte[] msg = Buffer.ExtractMessage(messageStart, messageEnd);  
string msg1 = Encoding.UTF8.GetString(msg);

CreateNewXMLFileFromTheCurrentXmlString(msg1);

We then send that new XML to another service.

Gaurav123
  • Instead of using a variable to get the XML from the service, use a stream. This MSDN article would help: https://msdn.microsoft.com/en-us/library/aa528818.aspx. – Adil Oct 15 '15 at 06:02
  • I guess it all depends if you intend to _load_ and/or _interrogate_ the resulting XML. If so, you're likely to continue to get memory errors using streams or not. _"[Xdocument.Load is not the best approach, reason being Xdocument.Load loads the whole file into memory. According to MSDN memory usage will be proportional to the size of the file. You can use XMLReader (Check here) instead if you are just planning to search the XML doc. Read this documentation on MSDN](http://stackoverflow.com/questions/27078502/does-linq-to-xml-loads-whole-xml-document-during-query)"_ –  Oct 15 '15 at 06:10
  • @Adil The XML library may very well fully consume the stream before parsing anyway, so that's something OP needs to investigate as well. – Rob Oct 15 '15 at 06:12
  • If the result from the other service can't be processed in a SAX way, it may need to be written to disk, then processed. – Scott McClenning Oct 15 '15 at 06:20
  • @Adil: Yes, we are using a stream. But once we perform operations on that XML, we need to send it to another service, so we need to send it as a string, right? – Gaurav123 Oct 15 '15 at 06:21
  • You need to give us some idea of what you are doing if we are to help much. If it's just that your XML string is too long, you can parse it into a hierarchy by [`XDocument.Load()`](https://msdn.microsoft.com/en-us/library/system.xml.linq.xdocument.load.aspx). If *that* requires too much memory, check out [Combining the XmlReader and XmlWriter classes for simple streaming transformations](http://blogs.msdn.com/b/mfussell/archive/2005/02/12/371546.aspx) or [How to: Perform Streaming Transform of Large XML Documents](https://msdn.microsoft.com/en-us/library/bb387013.aspx) – dbc Oct 15 '15 at 06:30
  • _"perform operations on that XML"_ - use streams to save to disk not to memory and use XmlReader as discussed here http://stackoverflow.com/questions/27078502/does-linq-to-xml-loads-whole-xml-document-during-query –  Oct 15 '15 at 06:31
  • I have updated the question for providing more explanation. – Gaurav123 Oct 15 '15 at 06:41
  • Not related to the question, but rather to the code you've shown: what's the point of converting the byte array to a string (2x memory size) when you can just read it directly as a stream? – Alexei Levenkov Oct 15 '15 at 06:43
  • @AlexeiLevenkov: Someone else already wrote this code and I need to make some changes for performance. – Gaurav123 Oct 15 '15 at 06:45
  • @Gaurav123, you can send a stream to a web service as well. – Adil Oct 15 '15 at 06:48
  • @Adil: Hmm, when I construct my new XML as shown in my question, where do I save it? And how do I construct XML from a stream, given that I need to manipulate what we get from the service to construct the new XML? – Gaurav123 Oct 15 '15 at 06:52
  • Guys: can this link be helpful for me? https://msdn.microsoft.com/en-us/library/dd997372%28v=vs.110%29.aspx – Gaurav123 Oct 15 '15 at 06:55
  • @Gaurav123, the link I shared in the first comment will give you an idea. – Adil Oct 15 '15 at 06:58
  • @Gaurav123 _"can this link be helpful for me ?"_ - no. XML is a hierarchical data structure not flat. You should design your system so that it only reads in the minimal amount of data necessary to perform an operation rather than everything. –  Oct 15 '15 at 07:05
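The streaming approach suggested in the comments above (feed XmlReader from a stream, write the new document with XmlWriter, never hold the whole XML as a string) could be sketched roughly as follows. This is only an illustration under assumptions: `XmlStreamingSketch` and `Transform` are hypothetical names, the "manipulation" step is left as a plain copy, and node types such as DOCTYPE and entity references are ignored for brevity.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not taken from the original code base.
public static class XmlStreamingSketch
{
    // Reads XML from 'input' one node at a time and writes the (possibly
    // modified) nodes to 'output', so memory use stays roughly constant
    // regardless of document size.
    public static void Transform(Stream input, Stream output)
    {
        var writerSettings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            OmitXmlDeclaration = true           // assumption: no declaration needed
        };

        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output, writerSettings))
        {
            while (reader.Read())
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        bool isEmpty = reader.IsEmptyElement;
                        // A real transformation would rename, skip, or add elements here.
                        writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                        writer.WriteAttributes(reader, true);
                        if (isEmpty) writer.WriteEndElement();
                        break;
                    case XmlNodeType.EndElement:
                        writer.WriteFullEndElement();
                        break;
                    case XmlNodeType.Text:
                        writer.WriteString(reader.Value);
                        break;
                    case XmlNodeType.CDATA:
                        writer.WriteCData(reader.Value);
                        break;
                    case XmlNodeType.Whitespace:
                    case XmlNodeType.SignificantWhitespace:
                        writer.WriteWhitespace(reader.Value);
                        break;
                    case XmlNodeType.Comment:
                        writer.WriteComment(reader.Value);
                        break;
                    case XmlNodeType.ProcessingInstruction:
                        writer.WriteProcessingInstruction(reader.Name, reader.Value);
                        break;
                    // XmlDeclaration, DocumentType, etc. are skipped in this sketch.
                }
            }
        }
    }
}
```

With the code from the question, the `byte[] msg` could be wrapped in `new MemoryStream(msg)` and passed as `input` instead of calling `Encoding.UTF8.GetString(msg)`, and `output` could be a `FileStream` or the request stream of the receiving service. That avoids the extra string copy, although the byte array itself would still sit in memory unless the upstream service is also read as a stream.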

1 Answer

The best way would be to use a well-normalized and indexed database, if that's possible for you. Getting the data with LINQ should then solve your problems. The problem is the source, not your logic: XML files shouldn't be as big as yours.

Take a look here: LINQ TO XML

iDraGoN
  • [This answer](http://stackoverflow.com/questions/27078502/does-linq-to-xml-loads-whole-xml-document-during-query) already states that `XDocument.Load()` _"[is not the best approach, reason being Xdocument.Load loads the **whole file into memory**. According to MSDN memory usage will be proportional to the size of the file](http://stackoverflow.com/questions/27078502/does-linq-to-xml-loads-whole-xml-document-during-query)"_ –  Oct 15 '15 at 06:27
  • I understand, still a normalized and indexed database would make it easier as it wouldn't load the full content. – iDraGoN Oct 15 '15 at 06:31
  • @Micky - but it sounds like OP is loading the whole file into memory *as a string*, which is worse still. I think we need more details. – dbc Oct 15 '15 at 06:31
  • @dbc Agreed. OP has to nurf that variable and we need more info. :) –  Oct 15 '15 at 06:33
  • Well, it's funny to get an XML file and lose all its advantages by using a string. – iDraGoN Oct 15 '15 at 06:33
  • @iDraGoN Yes I agree with the DB approach but OP may not be able to change _how they get it_. OP: _"We actually get an XML string from some other service"_. As dbc said, we need more info. –  Oct 15 '15 at 06:34