5

I have a CSV file whose data grows over time. What I need to do is read only the last 30,000 lines.

Code :

    string[] lines = File.ReadAllLines(Filename).Where(r => r != "").ToArray();

    int count = lines.Length;

    int loopCount = count > 30000 ? count - 30000 : 0;

    for (int i = loopCount; i < count; i++)
    {
        string[] columns = lines[i].Split(',');
        orderList.Add(columns[2]);
    }

It is working fine, but the problem is that

File.ReadAllLines(Filename)

reads the complete file, which causes a performance hit. I want something that reads only the last 30,000 lines, without iterating through the complete file.

PS: I am using .NET 3.5. File.ReadLines() does not exist in .NET 3.5.

Wasif Hossain
shujaat siddiqui
    http://stackoverflow.com/questions/4619735/how-to-read-last-n-lines-of-log-file http://stackoverflow.com/questions/398378/get-last-10-lines-of-very-large-text-file-10gb-c-sharp/398512#398512 – Ofiris Feb 19 '14 at 07:20

4 Answers

4

You can use the File.ReadLines() method instead of File.ReadAllLines().

From MSDN: File.ReadLines()

The ReadLines and ReadAllLines methods differ as follows:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings to be returned before you can access the array.

Therefore, when you are working with very large files, ReadLines can be more efficient.
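For example, the lazy behaviour lets a LINQ query stop reading the file as soon as it has what it needs (a minimal sketch, assuming a .NET 4.0+ project and a hypothetical log.txt):

```csharp
using System;
using System.IO;
using System.Linq;

class Demo
{
    static void Main()
    {
        // Lazy: ReadLines stops pulling lines from disk once FirstOrDefault
        // finds a match; ReadAllLines would have loaded the whole file first.
        string firstError = File.ReadLines("log.txt")
                                .FirstOrDefault(l => l.Contains("ERROR"));
        Console.WriteLine(firstError ?? "no errors");
    }
}
```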

Solution 1:

        string[] lines = File.ReadAllLines(FileName).Where(r => r != "").ToArray();

        int count = lines.Length;
        List<string> orderList = new List<string>();
        int startIndex = count > 30000 ? count - 30000 : 0;

        // Walk backwards over the last 30,000 lines
        // (note: orderList ends up in newest-first order).
        for (int i = count - 1; i >= startIndex; i--)
        {
            string[] columns = lines[i].Split(',');
            orderList.Add(columns[2]);
        }

Solution 2: If you are using .NET Framework 3.5, as you said in the comments below, you cannot use the File.ReadLines() method, as it has only been available since .NET 4.0.

You can use StreamReader as below:

        List<string> lines = new List<string>();
        List<string> orderList = new List<string>();
        string line;
        using (StreamReader reader = new StreamReader("c:\\Bethlehem-Deployment.txt"))
        {
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }

        int count = lines.Count;
        int startIndex = count > 30000 ? count - 30000 : 0;

        // Walk backwards over the last 30,000 lines
        // (note: orderList ends up in newest-first order).
        for (int i = count - 1; i >= startIndex; i--)
        {
            string[] columns = lines[i].Split(',');
            orderList.Add(columns[2]);
        }
Sudhakar Tillapudi
2

You can use File.ReadLines, which lets you start enumerating the collection of strings before the whole collection is returned.

After that you can use LINQ to make things a lot easier. Reverse reverses the order of the collection and Take takes the first n items. Applying Reverse again returns the last n lines in their original order.

var lines = File.ReadLines(Filename).Reverse().Take(30000).Reverse();

If you are using .NET 3.5 or earlier, you can create your own method that works the same way as File.ReadLines. Here is the code for the method, originally written by @Jon:

public IEnumerable<string> ReadLines(string file)
{
   using (TextReader reader = File.OpenText(file))
   {
      string line;
      while ((line = reader.ReadLine()) != null)
      {
         yield return line;
      }
   }
}

Now you can use LINQ over this function as well, just like the statement above:

var lines = ReadLines(Filename).Reverse().Take(30000).Reverse();
Sachin
1

The problem is that you do not know where to start reading the file to get the last 30,000 lines. Unless you want to maintain a separate index of line offsets, you can either read the file from the start, counting lines and retaining only the last 30,000, or you can start from the end, counting lines backwards. The latter approach can be efficient if the file is very large and you only want a few lines. However, 30,000 does not seem like "a few lines", so here is an approach that reads the file from the start and uses a queue to keep the last 30,000 lines:

var fileName = @" ... ";
var linesToRead = 30000;
var queue = new Queue<String>();
using (var streamReader = File.OpenText(fileName)) {
  while (!streamReader.EndOfStream) {
    queue.Enqueue(streamReader.ReadLine());
    if (queue.Count > linesToRead)
      queue.Dequeue();
  }
}

Now you can access the lines stored in queue. This class implements IEnumerable<String>, allowing you to use foreach to iterate the lines. However, if you want random access you will have to use the ToArray method to convert the queue into an array, which adds some overhead to the computation.
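For instance, the third CSV column from the question could be pulled out of the buffered lines like this (a sketch reusing the queue filled above; orderList is a hypothetical result list):

```csharp
// Iterate the queue oldest-first and collect the third CSV column.
var orderList = new List<string>();
foreach (var line in queue)
{
    string[] columns = line.Split(',');
    orderList.Add(columns[2]);
}

// For random access by index, materialize the queue into an array.
string[] lastLines = queue.ToArray();
```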

This solution is efficient in terms of memory because at most 30,000 lines have to be kept in memory, and the garbage collector can free any extra lines when required. Using File.ReadAllLines pulls all the lines into memory at once, possibly increasing the memory required by the process.

Martin Liversage
0

Or I have a different idea for this.

Try splitting the CSV into categories like A-D, E-G, ... and access whichever first character you need.

Or you can split the data by entity count. Every file would contain, for example, 15,000 entities, plus a text file containing a small index of the entities and their locations, like:

Txt File:

entityID | inWhich.csv
....
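A minimal sketch of looking up an entity in such an index file (the file names, the `|` separator, and the helper method are all hypothetical):

```csharp
using System;
using System.IO;

class IndexLookup
{
    // Returns the CSV file that holds the given entity, or null if not indexed.
    // Assumes each index line looks like "12345|chunk-02.csv".
    static string FindCsvForEntity(string indexFile, string entityId)
    {
        foreach (var line in File.ReadAllLines(indexFile))
        {
            var parts = line.Split('|');
            if (parts.Length == 2 && parts[0].Trim() == entityId)
                return parts[1].Trim();
        }
        return null;
    }
}
```

Only the small index file is scanned; the matching CSV chunk can then be read on its own.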
TC Alper Tokcan