
I have code that reads a text file and populates a .NET DataTable. The code works fine when it reads a smaller text file with 100,000 lines of data (see the snippet below). When I try to read a larger text file, about 200 MB with 3.6 million lines of data, it throws a System.OutOfMemoryException. I would like to ask for an efficient way of reading large data in chunks.

        using (var stream = File.Open(filePath, FileMode.Open))
        {
            var content = new StreamContent(stream);
            var fileStream = content.ReadAsStreamAsync().Result;

            if (fileStream == null) throw new ArgumentException(Constants.FileEmptyErrorMessage);

            using (var bs = new BufferedStream(fileStream))
            {
                using (var reader = new StreamReader(bs, Encoding.GetEncoding(Constants.IsoEncoding)))
                {
                    while (!reader.EndOfStream)
                    {
                        var line = reader.ReadLine();
                        if (!String.IsNullOrEmpty(line))
                        {
                            string[] rows = line.Trim().Split(new char[] { ';' }, StringSplitOptions.None);

                            DataRow dr = Table.NewRow();
                            dr[Constants.Percepcion] = rows[0];
                            dr[Constants.StartDate] = DateTime.ParseExact(rows[2].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.EndDate] = DateTime.ParseExact(rows[3].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.CID] = rows[4];
                            dr[Constants.Rate] = rows[8];

                            Table.Rows.Add(dr);
                        }
                    }
                }
            }
        }
PooThePanda
  • What do you do with `Table` after adding the rows to it? – Bassem Apr 06 '16 at 19:32
  • You could try amending the buffer size by using a different constructor for your BufferedStream e.g. `new BufferedStream(fileStream, 1024)`. – ManoDestra Apr 06 '16 at 19:38
  • Possible duplicate of [How to read a large (1 GB) txt file in .NET?](http://stackoverflow.com/questions/4273699/how-to-read-a-large-1-gb-txt-file-in-net) – bluetoothfx Apr 06 '16 at 19:39
  • Read the file line by line: `foreach (var line in File.ReadLines(filename))` – Eser Apr 06 '16 at 19:41
  • @BassemAkl: After I populate the DataTable, I have to filter it by its CID – PooThePanda Apr 06 '16 at 19:49
  • @ManoDestra: I have a question: if I specify a buffer size in the BufferedStream, will it read the file continuously? – PooThePanda Apr 06 '16 at 21:24
  • @PooThePanda It will still do exactly what it does now, as it defaults its buffer size value when you instantiate it without explicitly telling it the buffer size to use. It simply means that you can specify a lower buffer size. It may or may not help you to do so, but it's something worth experimenting with here. TBH a FileStream may be all that you require here rather than a BufferedStream (see [here](https://blogs.msdn.microsoft.com/brada/2004/04/15/filestream-and-bufferedstream/)). – ManoDestra Apr 06 '16 at 21:31
  • @ManoDestra: Thank you so much! You saved my day. I am now able to read the 3 million records and populate the DataTable. – PooThePanda Apr 06 '16 at 21:33
  • Posted that as an answer for you. – ManoDestra Apr 06 '16 at 21:35

3 Answers


I can see that the memory problem is not caused by reading the whole file at once, since you already read it line by line with `var line = reader.ReadLine();`. I think it comes from the size of the DataTable `Table`, which ends up holding the data of the entire file.
I suggest one of these options:
1. If you are performing aggregation functions on the rows of the DataTable, just compute them on the fly (for example a running integer counter or a `double max_columnX`) without keeping the whole rows.
2. If you really need to keep all the rows, create a database (MSSQL, MySQL, or any other) and insert the data into it as you read the file line by line, just as you do now; then query the database with your criteria. A sketch of a batched load follows the BULK INSERT example below.
3. You can bulk insert the whole file into a database without processing it through your C# application at all. Here is a SQL Server example:

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
     (
        FIELDTERMINATOR =';',
        ROWTERMINATOR = '\n',
        FIRE_TRIGGERS
      );
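
For option 2, a minimal sketch of a batched load into SQL Server with `SqlBulkCopy` could look like the following; the connection string, the destination table name `dbo.Percepciones`, the column layout (taken from the field indexes used in the question), and the batch size of 10,000 rows are all placeholder assumptions:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

class BatchedLoader
{
    static void Main()
    {
        const string connectionString = "Server=.;Database=MyDb;Integrated Security=true;"; // placeholder
        const string filePath = @"f:\orders\lineitem.tbl";                                  // placeholder
        const int batchSize = 10000;

        // Small DataTable used only as a reusable buffer for one batch at a time.
        var batch = new DataTable();
        batch.Columns.Add("Percepcion", typeof(string));
        batch.Columns.Add("StartDate", typeof(DateTime));
        batch.Columns.Add("EndDate", typeof(DateTime));
        batch.Columns.Add("CID", typeof(string));
        batch.Columns.Add("Rate", typeof(string));

        using (var connection = new SqlConnection(connectionString))
        using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "dbo.Percepciones" })
        {
            connection.Open();

            // File.ReadLines streams the file lazily, so only the current batch is held in memory.
            foreach (var line in File.ReadLines(filePath))
            {
                if (string.IsNullOrEmpty(line)) continue;

                var fields = line.Trim().Split(';');
                batch.Rows.Add(
                    fields[0],
                    DateTime.ParseExact(fields[2], "ddMMyyyy", CultureInfo.InvariantCulture),
                    DateTime.ParseExact(fields[3], "ddMMyyyy", CultureInfo.InvariantCulture),
                    fields[4],
                    fields[8]);

                if (batch.Rows.Count >= batchSize)
                {
                    bulkCopy.WriteToServer(batch); // flush this batch to the database
                    batch.Clear();                 // release the rows that were just written
                }
            }

            if (batch.Rows.Count > 0)
                bulkCopy.WriteToServer(batch);     // flush the final partial batch
        }
    }
}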

Edit: You can attach a memory profiler to find out exactly what is consuming the memory and add the findings to the question. That will help you get better answers.

Bassem

If you alter the default buffer size of your BufferedStream, then it should load the larger files for you with greater efficiency. E.g.

using (var bs = new BufferedStream(fileStream, 1024))
{
    // Code here.
}

You may be able to get away with simply using a FileStream, specifying a buffer size as well, rather than a BufferedStream. See [this MSDN blog post](https://blogs.msdn.microsoft.com/brada/2004/04/15/filestream-and-bufferedstream/) for further details.
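
For example, a minimal sketch of that approach, reusing `filePath` and `Constants.IsoEncoding` from the question (the 4096-byte buffer size is just a starting value to experiment with):

using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096))
using (var reader = new StreamReader(fileStream, Encoding.GetEncoding(Constants.IsoEncoding)))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // Split the line and build the DataTable rows here, as in the original loop.
    }
}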

ManoDestra

Here's what I did to read the big text file. There is no need to use a BufferedStream.

var filteredTextFileData = (from textFileData in File.ReadAllLines(_filePathList[0]).Skip(1).Where(line => !string.IsNullOrEmpty(line))
                    let textline = textFileData.Split(';')
                    let startDate = DateTime.ParseExact(textline[2].ToString(), Constants.DayMonthYearFormat, CultureInfo.InvariantCulture)
                    let endDate = !string.IsNullOrEmpty(textline[3]) ? DateTime.ParseExact(textline[3], Constants.DayMonthYearFormat, CultureInfo.InvariantCulture) : (DateTime?)null
                    let taxId = textline[0]
                    join accountList in _accounts.AsEnumerable()
                    on taxId equals accountList.Field<string>(Constants.Comments)
                    where endDate == null || endDate.Value.Year > DateTime.Now.Year || (endDate.Value.Year == DateTime.Now.Year && endDate.Value.Month >= DateTime.Now.Month)
                    select new RecordItem()
                    {
                        Type = Constants.Regular,
                        CustomerTaxId = taxId,
                        BillingAccountNumber = accountList.Field<Int64>(Constants.AccountNo).ToString(),
                        BillingAccountName = accountList.Field<string>(Constants.BillCompany),
                        StartDate = DateTime.Compare(startDate, accountList.Field<DateTime>(Constants.DateActive)) < 1 ? accountList.Field<DateTime>(Constants.DateActive) : startDate,
                        EndDate = endDate,
                        OverrideRate = 0,
                        Result = Constants.NotStarted,
                        TaxCode = _taxCode,
                        ImpliedDecimal = 4
                    }).ToList();
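
Note that `File.ReadAllLines` still materializes every line of the file in memory before the query runs. If memory becomes a problem again, `File.ReadLines` (as Eser suggested in the comments) enumerates the lines lazily and can replace it without changing the rest of the query. A rough self-contained sketch of the streaming pattern, with a hypothetical semicolon-delimited file path passed on the command line:

using System;
using System.IO;
using System.Linq;

class ReadLinesDemo
{
    static void Main(string[] args)
    {
        // ReadLines yields one line at a time, so the whole file is never held in memory at once.
        var nonEmptyLines = File.ReadLines(args[0])
                                .Skip(1)
                                .Where(line => !string.IsNullOrEmpty(line));

        foreach (var line in nonEmptyLines)
        {
            var fields = line.Split(';');
            Console.WriteLine(fields[0]); // process each record here
        }
    }
}
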
PooThePanda