I have C# code that removes non-ASCII characters from an incoming text file and writes the result to a .NonAsciiChars text file. Because the incoming file is in XML format and its line endings could be LF only or CRLF, I am not doing the replacement line by line (I am using StreamReader.ReadToEnd()).

The problem is that when the incoming file is huge (around 2 GB), I get the error below. Is there a better way to remove the non-ASCII characters in my case? Incoming files will also arrive at around 4 GB, and I am afraid that at that size the reading part will also throw an OutOfMemoryException.

Thanks a lot.

DateTime:2014-08-04 12:55:26,035 Thread ID:[1] Log Level:ERROR Logger Property:OS_fileParser.Program property:[(null)] - Message:System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Text.StringBuilder.ExpandByABlock(Int32 minBlockCharCount)
   at System.Text.StringBuilder.Append(Char* value, Int32 valueCount)
   at System.Text.StringBuilder.Append(Char[] value, Int32 startIndex, Int32 charCount)
   at System.IO.StreamReader.ReadToEnd()
   at OS_fileParser.MyProgram.FormatXmlFile(String inFile) in D:\Test\myProgram.cs:line 530
   at OS_fileParser.MyProgram.Run() in D:\Test\myProgram.cs:line 336

myProgram.cs line 530: content = Regex.Replace(content, pattern, "");

myProgram.cs line 336: the point that calls the following method

                const string pattern = @"[^\x20-\x7E]";

                string content;
                using (var reader = new StreamReader(inFile))
                {
                    content = reader.ReadToEnd();
                    reader.Close();
                }

                content = Regex.Replace(content, pattern, "");

                using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
                {
                    writer.Write(content);
                    writer.Close();
                }

                using (var myXmlReader = XmlReader.Create(inFile + ".NonAsciiChars", myXmlReaderSettings))
                {
                    try
                    {
                        while (myXmlReader.Read())
                        {
                        }
                    }
                    catch (XmlException ex)
                    {
                        Logger.Error("Validation error: " + ex);
                    }
                }
user3724711
  • It seems that your code currently works, and you are looking to improve it. Generally these questions are too opinionated for this site, but you might find better luck at [CodeReview.SE](http://codereview.stackexchange.com/tour). Remember to read [their requirements](http://codereview.stackexchange.com/help/on-topic) as they are a bit more strict than this site. – gunr2171 Aug 04 '14 at 15:06
  • @gunr2171 No, his current code throws an error `when the incoming file is huge (around 2 GB) size`. So, it does not work, and if it does not work it is off-topic for CodeReview and belongs here. – ANeves Aug 05 '14 at 15:13
  • @ANeves, agreed. My main assumption was that the code worked for smaller files, but because of lack of optimization, it would choke on larger files. It's fine here. – gunr2171 Aug 05 '14 at 15:26

1 Answer

You are getting an OutOfMemoryException. To conserve memory, you can process the file in portions: either line by line, or in bytes using a buffer (reading 1 byte at a time is slow).

In the simplest case it looks like this:

string line;    
using (var reader = new StreamReader(inFile))
    using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
        while ((line = reader.ReadLine()) != null)
        {
            ... // code to process line
            writer.Write(line);
        }
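
If the file has only a few very long lines, ReadLine() can still run out of memory, so a fixed-size buffer may be the safer route. Below is a minimal sketch of that idea, not a tested drop-in replacement: it reads characters in chunks and keeps only those in the \x20-\x7E range, which is the same set the question's pattern keeps. The class name AsciiFilter, the method names Filter and FilterFile, and the 64 KB buffer size are assumptions made up for illustration.

```csharp
using System;
using System.IO;

public static class AsciiFilter
{
    // Copies text from reader to writer in fixed-size chunks, keeping only
    // characters in the range \x20-\x7E (what the pattern [^\x20-\x7E] leaves).
    // Memory use is bounded by bufferSize, regardless of file or line length.
    public static void Filter(TextReader reader, TextWriter writer, int bufferSize = 64 * 1024)
    {
        var buffer = new char[bufferSize];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the kept characters to the front of the same buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                char c = buffer[i];
                if (c >= '\u0020' && c <= '\u007E')
                {
                    buffer[kept++] = c;
                }
            }
            writer.Write(buffer, 0, kept);
        }
    }

    // Convenience wrapper for files (hypothetical helper, mirrors the question's naming).
    public static void FilterFile(string inFile)
    {
        using (var reader = new StreamReader(inFile))
        using (var writer = new StreamWriter(inFile + ".NonAsciiChars"))
        {
            Filter(reader, writer);
        }
    }
}
```

Calling AsciiFilter.FilterFile(inFile) would then take the place of the ReadToEnd / Regex.Replace / Write steps. Note that this range also drops tab, CR and LF, just as the question's pattern does; if line breaks should survive, add them to the condition.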
Sinatr
  • Thanks for your suggestion. However, the incoming file sometimes has only TWO lines, so reading it line by line already throws the OutOfMemory error. I will try your suggestion of using bytes :) Thank you – user3724711 Aug 04 '14 at 16:04