
The method below reads XML-formatted data from a string variable and loads it into a DataSet. It has worked fine for a couple of years, but the underlying data being retrieved has grown recently, and I now intermittently see an OutOfMemoryException thrown on the following line:

    StringReader sr = new StringReader(Regex.Replace(XMLText, pattern, "$1"));

I've seen other solutions, such as the two links below, where people advise using File.ReadLines or a StreamReader. However, I'm not sure how to fit that in with the DataSet.ReadXml method.

https://social.msdn.microsoft.com/Forums/en-US/e6ed3216-5cad-463c-b4c0-5a745b0e6b4e/out-of-memory-exception-while-reading-large-text-file?forum=Vsexpressvcs

Read Big TXT File, Out of Memory Exception

    DataSet ds = new DataSet();
    string pattern = @"(</?)(\w+:)";

    //[XMLText] is a string variable containing XML downloaded from a WebServices API.
    StringReader sr = new StringReader(Regex.Replace(XMLText, pattern, "$1"));
    ds.ReadXml(sr);
    return ds;

I can only resolve it by closing and reopening the application. Is there a way to optimize this code to prevent this exception? Many thanks in advance.
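
On fitting a stream into `DataSet.ReadXml`: the method has overloads that accept a `Stream`, a `TextReader`, or an `XmlReader`, so the XML does not have to pass through a `string` first. The following is only a minimal sketch under assumptions that are not in the question: `LoadFromStream` and `xmlStream` are hypothetical names, and it assumes the web service response is available as a stream rather than as the `XMLText` string. It does not reproduce the prefix-stripping `Regex.Replace`, so how prefixed elements end up in the tables would need checking against the real data.

```csharp
using System.Data;
using System.IO;
using System.Xml;

static class DataSetLoadSketch
{
    // Hypothetical helper: 'xmlStream' is assumed to be the raw response
    // stream from the web service (not the XMLText string from the question).
    public static DataSet LoadFromStream(Stream xmlStream)
    {
        var ds = new DataSet();

        // XmlReader pulls from the stream incrementally, so no full-size
        // intermediate string is allocated before the DataSet is filled.
        using (XmlReader reader = XmlReader.Create(xmlStream))
        {
            ds.ReadXml(reader);
        }

        return ds;
    }
}
```

The `DataSet` still holds all of the data in memory once loaded; this only avoids the intermediate full-size strings created by `Regex.Replace` and `StringReader`.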

  • Maybe you can rethink how your code processes the dataset. Instead of loading the entire file into memory, the code could be changed to process only part of the data at a time. For example, you could read the file line by line, store the data in a database, and have the processing code work with the data from the database. You could also move the application to a machine with more memory and processing power, but this would be the last and least recommended option. – Chetan Jun 21 '19 at 03:26
  • Out of memory is a game over. There is nothing to optimize, except to not load the entire file at once. – vasily.sib Jun 21 '19 at 03:31
  • @vasily.sib - true, but the OP wants to prevent OOM, not recover from it. – H H Jun 21 '19 at 06:10
  • You're not going to improve anything at this point - the `DataSet` does need all the data to be present to be useful, and it doesn't support streaming the data or anything. You'll need to use something else, but what options you have depends on what you're doing with that data set. Maybe processing the data sequentially with an `XmlReader` will work; maybe you can filter out data you don't need from the input before you load the XML into the data set. But there's no quick "fix out of memory issues" solution :) – Luaan Jun 21 '19 at 06:23
  • @luaan - the error comes _before_ the DS is filled. Replace() needs 2 copies of the string. – H H Jun 21 '19 at 07:14
  • @HenkHolterman That doesn't matter much; it's a solution just like expanding the machine's memory - in the end, you're limited by having to process the whole data set at once, and you'll always need enough memory to store the whole thing. Whether you need two or three times that minimum is a completely different magnitude than streaming the data (if possible). – Luaan Jun 21 '19 at 07:16
  • @Luaan Actually this problem can't be fixed by expanding the memory. The issue is that the Large Object Heap is being fragmented, but it isn't being compacted. As a result, given enough time, the process will always OOM. The solution is to either avoid creating Ephemeral Large Objects or call the GC to compact the LOH, which is silly slow. – Aron Jun 21 '19 at 08:12
  • @Aron Yeah, you're right, this is almost a textbook example of large object heap fragmentation, with cyclical allocations of one big thing and one slightly smaller thing. But LOH compaction shouldn't be too slow in this case, unless there's work being done in parallel. It all really depends on what the OP is doing with the data set, though replacing the replace with something nicer would definitely help. – Luaan Jun 21 '19 at 10:30
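
The Large Object Heap compaction mentioned in the last two comments can be requested explicitly. A minimal sketch, assuming .NET Framework 4.5.1 or later (where `GCSettings.LargeObjectHeapCompactionMode` is available); the class and method names are hypothetical, and whether this helps at all depends on the fragmentation diagnosis being right for the OP's application:

```csharp
using System;
using System.Runtime;

static class LohCompactionSketch
{
    // Hypothetical helper: asks the runtime to compact the Large Object Heap
    // during the next blocking full collection, then triggers that collection.
    public static void CompactLargeObjectHeapOnce()
    {
        // CompactOnce applies to a single blocking gen 2 collection only.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // Force a full blocking collection now so the compaction happens
        // here rather than at some later, unpredictable point.
        GC.Collect();
    }
}
```

A call such as `LohCompactionSketch.CompactLargeObjectHeapOnce();` could be placed after the `DataSet` has been filled and the large temporary strings are no longer referenced. It pauses the application while it runs, so it is a workaround rather than a substitute for avoiding the large temporary allocations.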

0 Answers