234

Is there an easy way to programmatically determine the number of lines within a text file?

H.B.
TK.

12 Answers

442

Seriously belated edit: If you're using .NET 4.0 or later

The File class has a new ReadLines method which lazily enumerates lines rather than greedily reading them all into an array like ReadAllLines. So now you can have both efficiency and conciseness with:

var lineCount = File.ReadLines(@"C:\file.txt").Count();

Original Answer

If you're not too bothered about efficiency, you can simply write:

var lineCount = File.ReadAllLines(@"C:\file.txt").Length;

For a more efficient method you could do:

var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

Edit: In response to questions about efficiency

The reason I said the second was more efficient was regarding memory usage, not necessarily speed. The first one loads the entire contents of the file into an array, which means it must allocate at least as much memory as the size of the file. The second merely loops one line at a time, so it never has to allocate more than one line's worth of memory at a time. This isn't that important for small files, but for larger files it could be an issue (if you try to find the number of lines in a 4GB file on a 32-bit system, for example, where there simply isn't enough user-mode address space to allocate an array this large).

In terms of speed I wouldn't expect there to be a lot in it. It's possible that ReadAllLines has some internal optimisations, but on the other hand it may have to allocate a massive chunk of memory. I'd guess that ReadAllLines might be faster for small files, but significantly slower for large files; though the only way to tell would be to measure it with a Stopwatch or code profiler.

Dante May Code
Greg Beech
  • Why is the second method less performant than the first? It is not apparent from the code. – Sklivvz Sep 23 '08 at 07:39
  • My gut feel would be the first may be faster, but that's just a guess – johnc Sep 23 '08 at 07:46
  • 2
    Small note: because String is a reference type the array would be the size of the number of lines x the size of a pointer, but you're correct that it still needs to store the text, each line as a single String object. – Mike Dimmick Sep 23 '08 at 11:08
  • Mike - indeed, thanks for clarifying that. That was what I meant but looking again I realise it could potentially read badly. – Greg Beech Sep 23 '08 at 15:03
  • is this .net 4.0 most efficient way ? or any better way with 4.5 ? – Furkan Gözükara Mar 08 '13 at 15:51
  • 16
    FYI: In order to do the `ReadLines().Count()` you will need to add a `using System.Linq` to your includes. It seemed fairly non-intuitive to require that addition, so that's why I mention it. If you are using Visual Studio it's likely this addition is done for you automatically. – Owen Allen May 23 '13 at 16:59
  • 2
    I've tested both approaches, "File.ReadLines.Count()" vs "reader.ReadLine()", and "reader.ReadLine()" is slightly faster, but only by a very small margin. "ReadAllLines" is the loser, taking double the time and eating a lot of memory. This is because "File.ReadLines.Count()" and "reader.ReadLine()" both enumerate the file line by line and don't load the whole file into memory. – Yogee Mar 11 '14 at 18:04
  • Who the hell is going to be checking a 4 GB txt file? That is ridiculously huge. Using UTF-8 we are looking at billions of characters. I find the claims on efficiency unconvincing because in typical practice no one will ever be analyzing a txt file of that size. Not to mention the file could easily be one line with no line breaks, causing a crash regardless. – JSON May 11 '16 at 14:01
  • 13
    Yeah, nobody ever works with files 4GB+. We certainly never deal with log files that large. Oh, wait. – Greg Beech May 11 '16 at 22:51
  • 4
    If you want to see the insides of File.ReadLines() go here: [System.IO.File.cs](http://referencesource.microsoft.com/#mscorlib/System/io/file.cs) When you drill down through the overloads it takes you here: [ReadLinesIterator.cs](http://referencesource.microsoft.com/#mscorlib/System/io/ReadLinesIterator.cs) – Steve Kinyon May 12 '16 at 17:53
  • What if I want to count the total lines first, then while reading line by line, show progress(percentage) of reading? – Lei Yang Jun 12 '16 at 07:01
  • "Who the hell is going to checking a 4 gb txt file?" - Nooo! We never process csv files 32GB long – MajesticRa Dec 07 '17 at 16:06
  • [This](https://www.nimaara.com/counting-lines-of-a-text-file/) was a good read about optimization of a line counting operation. – Josh Gust May 21 '21 at 18:40
13

The easiest:

int lines = File.ReadAllLines("myfile").Length;
leppie
8

This would use less memory, but would probably take longer:

int count = 0;
using (TextReader reader = new StreamReader("file.txt"))
{
    while (reader.ReadLine() != null)
    {
        count++;
    }
}
benPearce
6

Reading a file takes time in and of itself, and garbage-collecting the result is another problem, since you read the whole file just to count the newline character(s).

At some point, someone has to read the characters in the file, whether that is the framework or your own code. This means the file has to be opened and read into memory; if the file is large, that is potentially a problem, as the memory needs to be garbage collected afterwards.

Nima Ara made a nice analysis that you might take into consideration.

Here is the proposed solution: it reads four bytes at a time, counts the detected line-ending character, and re-uses the same variable for the next character comparison.

private const char CR = '\r';  
private const char LF = '\n';  
private const char NULL = (char)0;

public static long CountLinesMaybe(Stream stream)  
{
    Ensure.NotNull(stream, nameof(stream));

    var lineCount = 0L;

    var byteBuffer = new byte[1024 * 1024];
    const int BytesAtTheTime = 4;
    var detectedEOL = NULL;
    var currentChar = NULL;

    int bytesRead;
    while ((bytesRead = stream.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
    {
        var i = 0;
        for (; i <= bytesRead - BytesAtTheTime; i += BytesAtTheTime)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 1];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 2];
                if (currentChar == detectedEOL) { lineCount++; }

                currentChar = (char)byteBuffer[i + 3];
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
                i -= BytesAtTheTime - 1;
            }
        }

        for (; i < bytesRead; i++)
        {
            currentChar = (char)byteBuffer[i];

            if (detectedEOL != NULL)
            {
                if (currentChar == detectedEOL) { lineCount++; }
            }
            else
            {
                if (currentChar == LF || currentChar == CR)
                {
                    detectedEOL = currentChar;
                    lineCount++;
                }
            }
        }
    }

    if (currentChar != LF && currentChar != CR && currentChar != NULL)
    {
        lineCount++;
    }
    return lineCount;
}

Note that the underlying framework likewise reads the file one character at a time, as you need to inspect every character to find the line feeds.

If you profile it, as Nima did, you will see that this is a rather fast and efficient way of doing it.

Antonín Lejsek
Walter Verhoeven
5

If by easy you mean lines of code that are easy to decipher, but perhaps inefficient?

string[] lines = System.IO.File.ReadAllLines(filename);
int cnt = lines.Length;

That's probably the quickest way to know how many lines.

You could also do (depending on whether you are buffering it in):

// for large files
while (...reads into buffer)
{
    string[] lines = Regex.Split(buffer, System.Environment.NewLine);
}

There are other numerous ways but one of the above is probably what you'll go with.

Jason Plank
user8456
  • 4
    I argue that this method is very inefficient; because, you're reading the entire file into memory, and into a string array, no less. You don't have to copy the buffer, when using ReadLine. See the answer from @GregBeech. Sorry to rain on your parade. – Mike Christian May 31 '12 at 16:55
2

You could quickly read it in, and increment a counter, just use a loop to increment, doing nothing with the text.

Mitchel Sellers
1

Count the carriage returns/line feeds. I believe in Unicode they are still 0x000D and 0x000A respectively. That way you can be as efficient or as inefficient as you want, and decide whether you have to deal with both characters or not.
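A minimal sketch of this idea (my illustration, not the poster's code): read the file as raw bytes and count the 0x0A (LF) bytes, which covers both Unix "\n" and Windows "\r\n" endings for ASCII/UTF-8 files.

```csharp
using System.IO;

static long CountLineFeeds(string path)
{
    long count = 0;
    using (var stream = File.OpenRead(path))
    {
        var buffer = new byte[64 * 1024];
        int read;
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Count LF (0x0A) bytes; "\r\n" still contributes exactly one LF.
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == 0x0A) count++;
            }
        }
    }
    return count;
}
```

Note the caveats: a classic-Mac file using bare "\r" would count zero, and for UTF-16 files you would need to decode rather than scan raw bytes, since 0x0A can occur inside other code units.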

geocoin
1

A viable option, and one that I have personally used, would be to add your own header to the first line of the file. I did this for a custom model format for my game. Basically, I have a tool that optimizes my .obj files, getting rid of the crap I don't need, converts them to a better layout, and then writes the total number of lines, faces, normals, vertices, and texture UVs on the very first line. That data is then used by various array buffers when the model is loaded.

This is also useful because you only need to loop through the file once to load it in, instead of once to count the lines, and again to read the data into your created buffers.
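A hypothetical sketch of the header idea (the helper names and single-count header are my simplification of the multi-count header described above): write the line count on the first line when exporting, then trust it when loading, so the file is only traversed once.

```csharp
using System.IO;

// Hypothetical helpers: prepend a count header on export, use it on import.
static void WriteWithHeader(string path, string[] lines)
{
    using (var writer = new StreamWriter(path))
    {
        writer.WriteLine(lines.Length);   // header: total line count
        foreach (var line in lines)
            writer.WriteLine(line);
    }
}

static string[] ReadWithHeader(string path)
{
    using (var reader = new StreamReader(path))
    {
        int count = int.Parse(reader.ReadLine());  // read the header first
        var lines = new string[count];             // size the buffer up front
        for (int i = 0; i < count; i++)
            lines[i] = reader.ReadLine();
        return lines;
    }
}
```

The trade-off is that the file format is no longer a plain text file, and the header must be kept in sync if the file is edited by other tools.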

Krythic
0

Use this:

    int CountLines(string file)
    {
        var lineCount = 0;
        using (var reader = new StreamReader(file))
        {
            while (reader.ReadLine() != null)
            {
                lineCount++;
            }
        }
        return lineCount;
    }
Khalil Youssefi
0

The selected answer is OK for me, but I needed to change var to long for huge text files, so the code looks like this:

long lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
    while (reader.ReadLine() != null)
    {
        lineCount++;
    }
}

Otherwise an int would wrap around to negative values, which spoils the count.

I am also thinking about a solution that counts the number of line feeds (LF) in the file, reading it in binary chunks of 1 MB or 100 MB (depending on memory) rather than line by line with the C# functions.

EDIT:

I have written this code:

int lf = 0;
using (var sr = new StreamReader(file))
{
    int rb = 100 * 1024 * 1024;
    char[] buf = new char[rb];
    int taken = 0;
    while ((taken = sr.ReadBlock(buf, 0, rb)) != 0)
    {
        lf += buf.Take(taken).Count(x => x == '\x0a');
    }
}

Seems like it is not faster...

pbies
-1
try {
    string path = args[0];
    using (var fh = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        int i;
        var sb = new StringBuilder();
        while ((i = fh.ReadByte()) != -1)
            sb.Append((char)i);
        string s = sb.ToString();

        // count the number of line breaks (paragraphs)
        int count = 0;
        for (int j = 0; j < s.Length - 1; j++) {
            if (s[j] == '\n')
                count++;
        }

        Console.WriteLine("The total searches were: " + count);
    }
} catch (Exception ex) {
    Console.WriteLine(ex.Message);
}
Tiago Almeida
-4

You can launch the "wc.exe" executable (it comes with UnixUtils and needs no installation) as an external process. It supports different line-count conventions (Unix vs. Mac vs. Windows line endings).
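A sketch of how that might look (my illustration; it assumes a `wc` binary is on the PATH and that it prints the usual "<count> <filename>" output for `-l`):

```csharp
using System.Diagnostics;

static int CountLinesWithWc(string path)
{
    var psi = new ProcessStartInfo
    {
        FileName = "wc",                       // assumes wc is on the PATH
        Arguments = "-l \"" + path + "\"",
        RedirectStandardOutput = true,
        UseShellExecute = false
    };
    using (var process = Process.Start(psi))
    {
        string output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        // wc -l prints "<count> <filename>"; take the leading number.
        return int.Parse(output.TrimStart().Split(' ')[0]);
    }
}
```

Note that `wc -l` counts newline characters, so a final line without a trailing newline is not counted, and the process-startup overhead makes this unattractive if you call it often.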

Sklivvz
  • 1
    There is no way this would be fast enough to be useful. The overhead of just calling the executable would be twice as much(obvious exaggeration is obvious) as a single incrementing loop. – Krythic May 20 '16 at 00:37