
In my code I read the file, extract the text, manipulate it, and write the modified string back to the file. I have had no problems until now, but the file I had to process today is over 2 GB, with more than 1 million lines.

public static void ModifyFile(string directory, string filename)
{
    string input = string.Empty;

    using (StreamReader reader = new StreamReader(directory + filename))
    {
        input = reader.ReadToEnd();
    }

    string output = Manipulate(input);

    File.WriteAllText($"{directory}{filename}", String.Empty);
    WriteFile(directory, filename, output);
}

private static void WriteFile(string directory, string filename, string output)
{
    using (StreamWriter writer = new StreamWriter(directory + filename, true))
    {
        writer.Write(output);
    }
}

private static string Manipulate(string input)
{
    var counter = 1;
    StringBuilder output = new StringBuilder();
    string[] subs = input.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);

    foreach (var x in subs)
    {
        if (subs[subs.Length - 1] != x && subs[subs.Length - 2] != x)
        {
            var column = x.Substring(121, 2);
            if (column.Equals("NA"))
            {
                output.Append(ManipulateStringElement(x, counter, 22)
                    .Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "\r\n"));
                output.Append("\n");
                counter++;
            }
        }
        else if (subs[subs.Length - 2] == x)
        {
            output.Append(ManipulateStringElement(x, counter, 22)
                .Replace("\r\n", "\n").Replace("\r", "\n").Replace("\n", "\r\n"));
        }
    }

    return output.ToString();
}

private static string ManipulateStringElement(string item, int counter, int start)
{
    return item.Replace(item.Substring(start, 9), GenerateProgressive(counter));
}

private static string GenerateProgressive(int counter)
{
    return $"{counter}".PadLeft(9, '0');
}

But while `reader.ReadToEnd()` runs I get an `OutOfMemoryException`, which makes me think the file is too big. The application targets .NET Framework 4.6.1 and the operating system is 64-bit (I had read that this could make a difference).

Marduk
  • Using `ReadToEnd`, you're loading the entire file into memory. Instead, process it line by line with `ReadLine`, or in custom chunks with `Read`. – Magnetron Jul 12 '22 at 19:14
  • The same applies to writing. The way you're currently doing it, you're not using any stream features; the code could be simplified to `File.ReadAllText` and `File.WriteAllText`, which are fine for small files but, as you discovered, have issues with big files. – Magnetron Jul 12 '22 at 19:25
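The lazy counterparts of the methods mentioned in that comment are `File.ReadLines`, which enumerates a file one line at a time, and the `File.WriteAllLines(string, IEnumerable<string>)` overload, which consumes such a sequence as it goes. A minimal sketch, assuming the per-line transformation is passed in as a delegate; the class name `LazyFileRewrite`, the `Rewrite` method and the `.tmp` suffix are hypothetical:

```csharp
using System;
using System.IO;
using System.Linq;

static class LazyFileRewrite
{
    // Hypothetical helper. File.ReadLines yields lines lazily, and
    // File.WriteAllLines(string, IEnumerable<string>) writes each one as it
    // is produced, so only the current line needs to be held in memory.
    public static void Rewrite(string path, Func<string, string> transform)
    {
        // The source is still open for reading while we write,
        // so the output must go to a separate (hypothetical) temp file.
        string tempPath = path + ".tmp";
        File.WriteAllLines(tempPath, File.ReadLines(path).Select(transform));

        // Swap the new file into place; File.Replace exists on .NET Framework.
        File.Replace(tempPath, path, null);
    }
}
```

Note that `WriteAllLines` terminates every line, including the last one, with `Environment.NewLine`, which may differ from the custom terminators the question's code produces.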

1 Answer


You need to do this in a streaming fashion in order to reduce memory consumption.

Open the input and output files at the same time, and write out the result of Manipulate() for each line as soon as it is produced. Ensure each output line ends with your custom newline character, because ReadLine() strips the original line terminator.

Finally, replace the original file with the new one.

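public static void ModifyFile(string directory, string filename)
{
    string inputFile = Path.Combine(directory, filename);
    string outputFile = Path.Combine(directory, filename + ".new");

    using (var reader = new StreamReader(inputFile))
    using (var writer = new StreamWriter(outputFile, false)) // overwrite any leftover .new file
    {
        string input;
        while ((input = reader.ReadLine()) != null)
        {
            // Manipulate() must now handle a single line and return it
            // with its own line terminator (ReadLine strips the original one).
            string output = Manipulate(input);
            writer.Write(output);
        }
    }

    // File.Move(source, dest, overwrite) only exists on .NET Core 3.0+;
    // on .NET Framework 4.6.1, File.Replace swaps the files instead.
    File.Replace(outputFile, inputFile, null);
}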
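The loop above calls `Manipulate()` once per line, so the question's whole-file `Manipulate` has to become a per-line function whose counter survives between calls. A minimal sketch, assuming fixed-width lines with the `"NA"` flag at column 121 and the 9-digit field at column 22 as in the question; `LineProcessor` and `ManipulateLine` are hypothetical names, and the special handling of the last two lines is deliberately left out:

```csharp
using System;

static class LineProcessor
{
    // Hypothetical per-line version of the question's Manipulate().
    // Returns the rewritten line (with its terminator), or null to drop it.
    // The counter is passed by ref so it persists across calls.
    // Assumption: columns 121 ("NA" flag) and 22 (9-digit field) as in the question.
    public static string ManipulateLine(string line, ref int counter)
    {
        // Drop short or non-"NA" lines instead of throwing ArgumentOutOfRangeException.
        if (line.Length < 123 || line.Substring(121, 2) != "NA")
            return null;

        // Replace only the 9 characters at column 22 with the zero-padded counter.
        string result = line.Substring(0, 22)
                      + counter.ToString("D9")
                      + line.Substring(31);
        counter++;

        // The question's code terminates each processed line with "\n".
        return result + "\n";
    }
}
```

Two deliberate differences from the original: short lines are dropped instead of throwing, and only the field at column 22 is replaced rather than every occurrence of its value in the line. The caller would start with `int counter = 1;` and skip writing when the result is `null`.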

You may also want to do this using async code, which could improve responsiveness.
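A sketch of the same loop using the asynchronous `StreamReader.ReadLineAsync` and `StreamWriter.WriteAsync` methods (available since .NET Framework 4.5). The `transform` delegate stands in for a hypothetical per-line `Manipulate()` that returns the line with its terminator, or `null` to drop it; `AsyncRewrite` and `ModifyFileAsync` are illustrative names:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class AsyncRewrite
{
    // Hypothetical async variant of the answer's ModifyFile.
    public static async Task ModifyFileAsync(string inputFile, Func<string, string> transform)
    {
        string outputFile = inputFile + ".new";

        using (var reader = new StreamReader(inputFile))
        using (var writer = new StreamWriter(outputFile, false))
        {
            string line;
            // ReadLineAsync returns null at end of stream, just like ReadLine.
            while ((line = await reader.ReadLineAsync()) != null)
            {
                string output = transform(line);
                if (output != null)
                    await writer.WriteAsync(output);
            }
        }

        // Swap the files only after both streams have been closed.
        File.Replace(outputFile, inputFile, null);
    }
}
```

This mainly helps responsiveness (the calling thread is not blocked on I/O); the memory savings come from streaming, not from async.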

I note that you also treat the last two lines of the file specially. I suggest you handle those separately, for example using this answer.
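One way to handle the last two lines without a separate pass (an alternative technique, not necessarily the one in the linked answer) is a two-line lookahead buffer: each line is processed only once two further lines have been read, so at end of file the buffer holds exactly the tail. A minimal sketch; `TailAware`, `Process` and the three callback parameters are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TailAware
{
    // Hypothetical helper: routes each line to one of three callbacks,
    // using a queue that always holds at most the two most recent unrouted lines.
    public static void Process(TextReader reader,
                               Action<string> onBody,
                               Action<string> onSecondToLast,
                               Action<string> onLast)
    {
        var pending = new Queue<string>(3);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            pending.Enqueue(line);
            if (pending.Count > 2)
                onBody(pending.Dequeue()); // at least two lines follow this one
        }

        // End of input: what remains is the tail (fewer than two lines for tiny files).
        if (pending.Count == 2)
            onSecondToLast(pending.Dequeue());
        if (pending.Count == 1)
            onLast(pending.Dequeue());
    }
}
```

One caveat when mapping this onto the question's code: `Split(new string[] { Environment.NewLine }, StringSplitOptions.None)` on a file that ends with a newline produces a trailing empty element, whereas `ReadLine()` returns no such line. The question's "last line" may therefore be that empty string, in which case its second-to-last line corresponds to `onLast` here.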

There are also other performance improvements you can make. For example:

private static string GenerateProgressive(int counter)
{
    return counter.ToString("D9");
}

as well as:

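private static string ManipulateStringElement(string item, int counter, int start)
{
    // Splice the 9-digit counter in at position `start`, keeping the rest of the line.
    return item.Substring(0, start) + GenerateProgressive(counter) + item.Substring(start + 9);
}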
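Replacing the field by position instead of with `string.Replace` changes behavior as well as speed: `Replace` rewrites every occurrence of the old 9-character value in the line, not just the one at `start`. A small illustration with hypothetical names:

```csharp
using System;

static class ReplaceVsSplice
{
    // The question's approach: replaces EVERY occurrence of the
    // 9-character value found at `start`.
    public static string ViaReplace(string item, int counter, int start)
    {
        return item.Replace(item.Substring(start, 9), counter.ToString("D9"));
    }

    // Positional splice: only the 9 characters at `start` change.
    public static string ViaSplice(string item, int counter, int start)
    {
        return item.Substring(0, start) + counter.ToString("D9") + item.Substring(start + 9);
    }
}
```

With `item = "000000123|000000123"`, `counter = 7` and `start = 0`, `ViaReplace` returns `"000000007|000000007"` while `ViaSplice` returns `"000000007|000000123"`.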
Charlieface