I needed to import a large number of text files and found little material that addressed my particular problem, so I decided to post my solution here. I believe it will help someone else.

My files have 3,000,000 records or more. I tried reading them line by line with StreamReader.ReadLine(), but it was impractical. Moreover, the files are too large to load entirely into memory.

The solution was to load the files into memory in blocks (buffers) using StreamReader.ReadBlock().

The difficulty was that ReadBlock() reads a fixed number of characters and ignores line boundaries, so the last line of a buffer can be cut in half, which leaves the first line of the next buffer incomplete. To fix this, I keep the leftover fragment in a string (resto) and concatenate it with the first line (primeiraLinha) of the next buffer.
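
As a minimal sketch of this effect, the snippet below reads a small in-memory text in blocks of 10 characters. The sample text, the block size, and the ReadChunks helper are made up for illustration; a StringReader stands in for the StreamReader so that no file is needed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadBlockDemo
{
    // Hypothetical helper: reads the whole reader in fixed-size blocks
    // and returns each block as a string.
    public static List<string> ReadChunks(TextReader reader, int size)
    {
        var chunks = new List<string>();
        var buffer = new char[size];
        int read;
        // ReadBlock returns the number of characters read, or 0 at the end of the input.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            // Convert only the characters that were actually read.
            chunks.Add(new string(buffer, 0, read));
        }
        return chunks;
    }

    static void Main()
    {
        // Sample data (an assumption): three short lines with Windows line endings.
        var reader = new StringReader("alpha\r\nbravo\r\ncharlie\r\n");
        foreach (var chunk in ReadChunks(reader, 10))
        {
            // Show the line breaks explicitly so the cut points are visible.
            Console.WriteLine("[" + chunk.Replace("\r", "\\r").Replace("\n", "\\n") + "]");
        }
        // Prints:
        // [alpha\r\nbra]
        // [vo\r\ncharli]
        // [e\r\n]
    }
}
```

The first block ends in the middle of "bravo" and the second in the middle of "charlie", which is exactly the fragment that has to be carried over to the next buffer.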

Another important detail concerns Split. Most examples call Trim() on each piece right after splitting to remove whitespace. I do not do that here: Trim() also removes the trailing '\r' that the code uses to recognize a complete line, and with it the first and second lines of the buffer ended up joined.
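
A small sketch of why Trim() gets in the way, assuming Windows line endings and assuming that a trailing '\r' is what separates a complete line from a cut-off fragment, as the '\r' check in the program does. The IsCompleteLine helper and the sample text are hypothetical.

```csharp
using System;

static class TrimDemo
{
    // Hypothetical helper: a piece counts as a complete line
    // only while it still ends with '\r'.
    public static bool IsCompleteLine(string piece)
    {
        return piece.EndsWith("\r", StringComparison.Ordinal);
    }

    static void Main()
    {
        // Two complete lines followed by a fragment cut off by the buffer.
        string[] pieces = "linha 1\r\nlinha 2\r\nlin".Split('\n');
        foreach (var piece in pieces)
        {
            // Left: check on the raw piece. Right: check after Trim().
            Console.WriteLine(IsCompleteLine(piece) + " / " + IsCompleteLine(piece.Trim()));
        }
        // Prints:
        // True / False
        // True / False
        // False / False
    }
}
```

After Trim() every piece looks like a fragment, so a complete first line would be held back and glued onto the second one.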

using System;
using System.IO;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main()
        {
            const string arquivo = "Arquivo1.txt";
            const int deslocamento = 1000; // buffer size, in characters
            using (var streamReader = new StreamReader(arquivo))
            {
                var buffer = new char[deslocamento];
                string resto = ""; // incomplete line left over from the previous buffer
                int lidos; // number of characters actually read into the buffer
                // ReadBlock returns 0 once the end of the file has been reached.
                while ((lidos = streamReader.ReadBlock(buffer, 0, buffer.Length)) > 0)
                {
                    // Convert only the characters that were read, not the whole buffer.
                    var bufferString = new string(buffer, 0, lidos);
                    string[] bufferSplit = bufferString.Split('\n');
                    foreach (var bs in bufferSplit)
                    {
                        if (bs == "")
                            continue;
                        // Prepend the pending fragment (empty when nothing was cut off).
                        string primeiraLinha = resto + bs;
                        // Assumes Windows line endings (\r\n): a trailing '\r' marks a complete line.
                        if (primeiraLinha.EndsWith("\r", StringComparison.Ordinal))
                        {
                            Console.WriteLine(primeiraLinha.TrimEnd('\r'));
                            resto = "";
                        }
                        else
                        {
                            // The line is still incomplete; keep it for the next piece or buffer.
                            resto = primeiraLinha;
                        }
                    }
                }
                // The last line of the file may not end with a line break.
                if (resto != "")
                    Console.WriteLine(resto);
            }
        }
    }
}

My fellow trainee, Gabriel Gustaf, was a great help in solving this problem.

If anyone has suggestions to further improve the performance, or any other comments, feel free to share them.

1 Answer

C# has a class designed for working with large files: MemoryMappedFile. It is simple to use, and I think it could help you.
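
A minimal sketch of how that might look, assuming a plain text file that a StreamReader can decode. The MappedFileLines class and its ReadLines method are hypothetical names, and whether this is actually faster than ReadBlock for a given file would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.MemoryMappedFiles;

static class MappedFileLines
{
    // Hypothetical helper: lazily yields the lines of a text file
    // read through a read-only memory-mapped view.
    public static IEnumerable<string> ReadLines(string path)
    {
        long length = new FileInfo(path).Length;
        if (length == 0)
            yield break; // an empty file cannot be mapped with a default capacity

        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        // Passing the exact file length keeps the padding that rounds a
        // full view up to the page size out of the stream.
        using (var stream = mmf.CreateViewStream(0, length, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    static void Main()
    {
        // "Arquivo1.txt" is the file name used in the question; replace it as needed.
        foreach (var line in ReadLines("Arquivo1.txt"))
            Console.WriteLine(line);
    }
}
```

Because the lines are yielded one at a time, the caller never holds more than the current line in managed memory; the operating system pages the file in and out behind the view.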

Conrado Costa