I am trying to read a text file that has multiple records. Below is the code I am using, which works fine. The problem is that if the text file contains millions of records, it might cause an out-of-memory exception. How can I modify this code to handle millions of records from the text file?

        public List<string> ExtractDataFromFile(string filePath)
        {
            // Every field of every line is kept in this list.
            var rsltData = new List<string>();
            foreach (var line in File.ReadLines(filePath))
            {
                if (!string.IsNullOrWhiteSpace(line))
                {
                    var data = line.Split('|');
                    foreach (var item in data)
                    {
                        rsltData.Add(item);
                    }
                }
            }
            return rsltData;
        }
Daniel A. White
user243724
  • This may help: https://stackoverflow.com/a/9643111/453348 – tttony Feb 26 '23 at 03:02
  • To read only a range of lines, e.g. lines 10 through 14: var lines = File.ReadLines(filepath).Skip(9).Take(5); – Ihdina Feb 26 '23 at 03:42
  • You should explain which OOM you are afraid of, because you are already reading iteratively; if there is a memory problem, it is because you keep all the data you read in memory. If my guess is correct, you should write your data out after splitting it (or, more generally, after transforming it). The storage could be a regular database. – greenoldman Feb 26 '23 at 04:47

1 Answer

You will have to read the text file line by line. Something like:

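// Requires: using System.Collections.Generic; using System.IO;
public IEnumerable<string> ExtractDataFromFile(string path) {
    using (StreamReader reader = new StreamReader(path)) {
        string line;
        // ReadLine returns null once the end of the file is reached.
        while ((line = reader.ReadLine()) != null) {
            yield return line;
        }
    }
}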

Then, per the comments, do something with each line:

foreach(string s in ExtractDataFromFile("C:\\BigFile.csv")) {
    ...
}

This way, you consume only about as much memory as the StreamReader's buffer plus the current line, rather than the entire file.
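
To make the memory behaviour concrete for the pipe-delimited file in the question, here is a minimal sketch that streams the individual fields instead of collecting them in a list. The names `FieldStreamer` and `ReadFields` and the default `'|'` delimiter are illustrative assumptions, not part of the original code. It is built on `File.ReadLines`, which is itself lazy.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FieldStreamer
{
    // Hypothetical helper: lazily yields every field of every non-blank line.
    // Nothing is accumulated here, so memory use stays near one line at a time.
    public static IEnumerable<string> ReadFields(string filePath, char delimiter = '|')
    {
        foreach (string line in File.ReadLines(filePath))
        {
            if (string.IsNullOrWhiteSpace(line))
            {
                continue; // skip blank lines, as the question's code does
            }

            foreach (string field in line.Split(delimiter))
            {
                yield return field;
            }
        }
    }
}
```

The saving only holds if the caller consumes the fields one at a time, for example by writing each one to a database or another file. Calling `.ToList()` on the result pulls the whole file back into memory.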

One caveat: if your text file contains quoted newline characters, you may have to use a delimited text file parser.
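
As one possible illustration of such a parser, .NET includes `TextFieldParser` in the `Microsoft.VisualBasic.FileIO` namespace, and it can be called from C#. The sketch below assumes that assembly is available to your project and that the file uses the `|` delimiter from the question; `QuotedFieldReader` and `ReadRecords` are hypothetical names. Verify its behaviour on your own data, particularly for quoted fields that span several lines.

```csharp
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO;

public static class QuotedFieldReader
{
    // Hypothetical helper: lazily yields one record (an array of fields) at a time.
    public static IEnumerable<string[]> ReadRecords(string filePath)
    {
        using (var parser = new TextFieldParser(filePath))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("|");               // delimiter taken from the question
            parser.HasFieldsEnclosedInQuotes = true; // a quoted field is read as one value

            while (!parser.EndOfData)
            {
                yield return parser.ReadFields();
            }
        }
    }
}
```

Because this is also an iterator, records are parsed only as the caller's `foreach` asks for them.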

  • Not everyone is going to know that a method declared to return an `IEnumerable` and containing a `yield return` statement is actually an iterator block. You should really explain how to use this and what advantages it has. – Flydog57 Feb 26 '23 at 03:19
  • @Flydog57 Well, it was kind of a weird question, followed by broken code. I just need enough points to comment to another answer. –  Feb 26 '23 at 04:21
  • And how is your answer supposed to change anything? See: https://learn.microsoft.com/en-us/dotnet/api/system.io.file.readlines?view=net-7.0 – greenoldman Feb 26 '23 at 04:42
  • I would like to thank everyone for their critique, especially since my answer did not even iterate through the file and nobody even noticed. Smart guys. LOL –  Feb 26 '23 at 20:22