25

Obviously one could loop through a file using fgetl or similar function and increment a counter, but is there a way to determine the number of lines in a file without doing such a loop?

Amro
robguinness

6 Answers

35

I like to use the following code for exactly this task

fid = fopen('someTextFile.txt', 'rb');
%# Get file size.
fseek(fid, 0, 'eof');
fileSize = ftell(fid);
frewind(fid);
%# Read the whole file.
data = fread(fid, fileSize, 'uint8');
%# Count number of line-feeds and increase by one.
numLines = sum(data == 10) + 1;
fclose(fid);

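It is pretty fast if you have enough memory to read the whole file at once. It works for both Windows- and Linux-style line endings, because both CRLF and LF end in a line-feed byte (10). Note that the `+ 1` assumes the last line is not terminated by a line feed; for a file that ends with a line feed, the result is one higher than what `wc -l` reports.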
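A minimal sketch of the same whole-file idea wrapped in a function, assuming that a trailing line feed should not count as the start of an extra line. The function name `countLinesWholeFile` is hypothetical and not part of the answer above.

```matlab
function numLines = countLinesWholeFile(fname)
%COUNTLINESWHOLEFILE Count lines by reading the whole file as bytes.
%   Hypothetical helper: a trailing line feed does not start a new line.
    fid = fopen(fname, 'rb');
    assert(fid ~= -1, 'Could not read: %s', fname);
    closer = onCleanup(@() fclose(fid));   % close the file even on error
    data = fread(fid, Inf, '*uint8');      % whole file as a uint8 column
    numLines = sum(data == 10);            % one line per line feed
    if ~isempty(data) && data(end) ~= 10
        numLines = numLines + 1;           % last line has no terminator
    end
end
```

Like the original snippet, this needs enough memory to hold the whole file; reading with `'*uint8'` keeps the data at one byte per character instead of converting it to double.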

Edit: I measured the performance of the answers provided so far. Here is the result for determining the number of lines of a text file containing 1 million double values (one value per line). Average of 10 tries.

 Author           Mean time +- standard deviation (s)
------------------------------------------------------
 Rody Oldenhuis      0.3189 +- 0.0314
 Edric (2)           0.3282 +- 0.0248
 Mehrwolf            0.4075 +- 0.0178
 Jonas               1.0813 +- 0.0665
 Edric (1)          26.8825 +- 0.6790

So the fastest approaches are the one using Perl and the ones reading the file as binary data. I would not be surprised if Perl internally also reads large blocks of the file at once instead of looping through it line by line (just a guess; I do not know anything about Perl).

A simple fgetl() loop is slower than the other approaches by a factor of roughly 25 to 85 (26.88 s versus 0.32 to 1.08 s).

Edit 2: Included Edric's 2nd approach, which is much faster and on-par with the Perl solution, I'd say.
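The measurement procedure described above (mean and standard deviation over 10 tries) could be sketched roughly as follows. This is an assumed harness, not the code that produced the table; `timeLineCounter` is a hypothetical name, and `counter` stands for any function handle that takes a file name.

```matlab
function [meanTime, stdTime] = timeLineCounter(counter, fname, numTries)
%TIMELINECOUNTER Time a line-counting function over repeated runs.
%   Hypothetical helper; counter is a handle such as @countLines.
    if nargin < 3
        numTries = 10;                 % the answer averaged 10 tries
    end
    elapsed = zeros(numTries, 1);
    for k = 1:numTries
        t0 = tic;
        counter(fname);                % result is discarded; only time matters
        elapsed(k) = toc(t0);
    end
    meanTime = mean(elapsed);          % mean wall-clock time in seconds
    stdTime  = std(elapsed);           % sample standard deviation in seconds
end
```

A call might look like `[m, s] = timeLineCounter(@countLines, 'someTextFile.txt');`. Operating-system file caching can make the first run slower than the rest, so the numbers are only comparable when every method is measured the same way.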

Mehrwolf
  • Thanks! Although all were good answers, I'm picking Mehrwolf's as the accepted one, since he compares all the other answers. I will probably actually use Edric's 2nd answer because I prefer to keep everything inside Matlab. – robguinness Aug 30 '12 at 06:11
  • 1
    It should be noted that `Edric (2)` will be off by one in the event that the final line is not terminated with `\n`. For example, `countLines('countLines.m')` returns 8 when there are 9 lines in the file. While your increase by 1 accounts for this in most cases, it returns a result inconsistent with the system command (on Windows, at least, can't test Linux) when the final line is blank. See [this gist](https://gist.github.com/sco1/986643b64afd3ed78a43fcc5a9decf44) for a MCVE of both cases. – sco1 Jul 25 '16 at 17:41
16

I think a loop is in fact the best - all other options so far suggested either rely on external programs (need to error-check; need str2num; harder to debug / run cross-platform etc.) or read the whole file in one go. Loops aren't so bad. Here's my variant

function count = countLines(fname)
  fh = fopen(fname, 'rt');
  assert(fh ~= -1, 'Could not read: %s', fname);
  x = onCleanup(@() fclose(fh));
  count = 0;
  while ischar(fgetl(fh))
    count = count + 1;
  end
end

EDIT: Jonas rightly points out that the above loop is really slow. Here's a faster version.

function count = countLines(fname)
  fh = fopen(fname, 'rt');
  assert(fh ~= -1, 'Could not read: %s', fname);
  x = onCleanup(@() fclose(fh));
  count = 0;
  %# Read the file in 16 KiB chunks and count the line feeds in each one.
  while ~feof(fh)
    count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
  end
end

It's still not as fast as wc -l, but it's not a disaster either.

Edric
  • 1
    The problem with the loop is that you need to access the file at each iteration. File access is notoriously slow in Matlab; doing it many times in a loop is going to hurt. – Jonas Aug 29 '12 at 13:15
  • 1
    It should be noted that the second method will be off by one in the event that the final line is not terminated with `\n`. For example, `countLines('countLines.m')` returns 8 when there are 9 lines in the file. – sco1 Jul 25 '16 at 17:35
  • See [this gist](https://gist.github.com/sco1/986643b64afd3ed78a43fcc5a9decf44) for a MCVE – sco1 Jul 25 '16 at 17:49
12

I found a nice trick here:

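if (isunix) %# Linux, mac
    %# '<' feeds the file to wc on stdin, so only the count is returned
    %# (plain 'wc -l your_file' also prints the file name, which str2num cannot parse).
    [status, result] = system( ['wc -l < ', 'your_file'] );
    numlines = str2num(result);

elseif (ispc) %# Windows
    numlines = str2num( perl('countlines.pl', 'your_file') );

else
    error('...');

end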

where 'countlines.pl' is a Perl script containing

while (<>) {};
print $.,"\n";
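For the Unix branch, a slightly more defensive sketch might check the exit status and parse the output with `sscanf`. The function name `countLinesWc` is hypothetical, and the simple double-quoting assumes the file name itself contains no double quotes or other shell metacharacters.

```matlab
function numLines = countLinesWc(fname)
%COUNTLINESWC Count lines with wc on Unix-like systems (hypothetical helper).
    assert(isunix, 'countLinesWc requires a Unix-like system.');
    % Quote the name so that spaces in the path survive the shell.
    % Assumption: fname contains no double quotes or shell metacharacters.
    [status, result] = system(sprintf('wc -l < "%s"', fname));
    assert(status == 0, 'wc failed: %s', result);
    numLines = sscanf(result, '%d', 1);   % %d skips leading whitespace
end
```

Keep in mind that `wc -l` counts line-feed characters, so a final line without a terminating line feed is not included in the result.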
Rody Oldenhuis
  • 2
    `system( ['wc -l ', 'your_file'] )` will also output the filename into `result`. This can be avoided by using `system( ['wc -l <', 'your_file'] )`. – Robairto Dec 02 '16 at 17:40
4

You can read the entire file at once, and then count how many lines you've read.

fid = fopen('yourFile.ext');
allText = textscan(fid,'%s','delimiter','\n');
numberOfLines = length(allText{1});
fclose(fid);
Jonas
  • This could give memory issues for large files, since `allText` will have to contain, well, *all text* in the file. – Rody Oldenhuis Aug 29 '12 at 11:17
  • @RodyOldenhuis: Yes, memory is certainly an issue. How much memory does your Perl solution require? Does it read the file line by line, in chunks, or as a whole? – Mehrwolf Aug 29 '12 at 13:07
0

I would recommend using an external tool for this. For example an app called cloc, which you can download here for free.

On Linux you then simply type cloc <repository path> and get

YourPC$ cloc <directory_path>
      87 text files.
      81 unique files.                              
      23 files ignored.

http://cloc.sourceforge.net v 1.60  T=0.19 s (311.7 files/s, 51946.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
MATLAB                          59           1009           1074           4993
HTML                             1              0              0             23
-------------------------------------------------------------------------------
SUM:                            60           1009           1074           5016
-------------------------------------------------------------------------------

They also claim it should work on Windows.

Ufos
0

The issue with the miscounting of lines in Edric’s answer can be solved with this.

function count = countlines(fname)
    fid = fopen(fname, 'r');
    assert(fid ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fid));
    count = 0;
    % Replaced chunked version (miscounts an unterminated last line):
    % while ~feof(fid)
    %     count = count + sum( fread( fid, 16384, 'char' ) == char(10) );
    % end
    while ~feof(fid)
        [~] = fgetl(fid);
        count = count + 1;
    end
end
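An alternative sketch keeps the chunked `fread` approach from Edric's faster version and only corrects the final count, so it avoids going back to a per-line `fgetl` loop. It assumes a last line without a trailing line feed should still be counted; the name `countLinesChunked` and the chunk size are illustrative choices.

```matlab
function count = countLinesChunked(fname)
%COUNTLINESCHUNKED Count lines in fixed-size chunks (hypothetical helper).
    fid = fopen(fname, 'r');
    assert(fid ~= -1, 'Could not read: %s', fname);
    closer = onCleanup(@() fclose(fid));     % close the file even on error
    count = 0;
    lastByte = uint8(10);                    % so that an empty file gives 0
    while true
        chunk = fread(fid, 16384, '*uint8'); % at most 16 KiB per read
        if isempty(chunk)
            break;                           % end of file reached
        end
        count = count + sum(chunk == 10);    % line feeds in this chunk
        lastByte = chunk(end);               % remember how the data ended
    end
    if lastByte ~= 10
        count = count + 1;                   % unterminated final line
    end
end
```

Memory use stays bounded by the chunk size regardless of file size, which is the main reason to prefer this over reading the whole file at once.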
Michael