1

My C++ program needs to know how many lines are in a certain text file. I could do it with getline() and a while-loop, but is there a better way?

neuromancer
  • 53,769
  • 78
  • 166
  • 223
  • 2
    i think `getline()` is the way to go – knittl May 10 '10 at 07:42
  • You cannot avoid reading the entire file. There are some non-portable optimizations possible, depending on platform, but `getline()` is fine. – peterchen May 10 '10 at 08:38
  • What would be interesting would be to compare I guess, notably I wonder about the buffering strategy used by `ifstream`: I would suppose that less disk access is better and thus large chunks would be the way to go; but I have no idea of how large the buffer gets or even if it's possible to parameterize it. – Matthieu M. May 10 '10 at 09:07

6 Answers

4

No.

Not unless your operating system's filesystem keeps track of the number of lines, which your system almost certainly doesn't as it's been a looong time since I've seen that.

msw
  • 42,753
  • 9
  • 87
  • 112
2

By "another way", do you mean a faster way? No matter what, you'll need to read in the entire contents of the file. Reading in different-sized chunks shouldn't matter much since the OS or the underlying file libraries (or both) are buffering the file contents.

getline could be problematic if there are only a few lines in a very large file (high transient memory usage), so you might want to read in fixed-size 4KB chunks and process them one-by-one.
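A minimal sketch of the fixed-size-chunk approach described above. The 4 KB buffer size and the function name `count_lines_chunked` are my choices, not part of the answer; it returns -1 if the file can't be opened:

```cpp
#include <algorithm>  // std::count
#include <fstream>

long count_lines_chunked(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in) return -1;

    char buf[4096];  // fixed-size chunk; transient memory stays bounded
    long count = 0;
    for (;;) {
        in.read(buf, sizeof buf);
        std::streamsize got = in.gcount();  // bytes actually read (may be < 4096 at EOF)
        if (got == 0) break;
        count += std::count(buf, buf + got, '\n');
    }
    return count;
}
```

Note this counts `\n` characters, so a final line without a trailing newline isn't counted; adjust to taste.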

Chris Schmich
  • 29,128
  • 5
  • 77
  • 94
1

Iterate the file char-by-char with get(), and for each newline (\n) increment line number by one.
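A minimal sketch of that `get()`-per-character loop (the function name is mine):

```cpp
#include <fstream>

long count_lines_get(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    long count = 0;
    char c;
    while (in.get(c))   // extract one character at a time until EOF
        if (c == '\n')
            ++count;
    return count;
}
```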

reko_t
  • 55,302
  • 10
  • 87
  • 77
  • That method is worse than the one I was talking about. I'm trying to avoid reading the whole file in. – neuromancer May 10 '10 at 07:41
  • @knittl: how do you know? Ever heard of premature optimisation? – Paul R May 10 '10 at 07:45
  • @Phenom: no - the char-by-char method and getline method do exactly the same thing - they read the entire file looking for end of line characters – Paul R May 10 '10 at 07:46
  • 2
    It'll be faster than `getline()`. The fastest way would be to `mmap()` the file and then count `\n`s. – Andrew McGregor May 10 '10 at 07:47
  • @Phenom RE: "avoid reading the whole file in" - unless you've preprocessed or otherwise have some index of these files, you will have to read all of the contents of the file. You won't necessarily have to have the entire file in memory, but at some point you will read every byte of the file. – Chris Schmich May 10 '10 at 07:54
  • @knittl: note that `getline()` may fail if you have excessively long lines - you will need extra code to handle this case, so the getchar approach may actually be more readable – Paul R May 11 '10 at 08:09
1

The fastest, but OS-dependent way would be to map the whole file to memory (if not possible to map the whole file at once - map it in chunks sequentially) and call std::count(mem_map_begin,mem_map_end,'\n')
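Since the answer names `mmap()` and `std::count`, here is one way that might look on POSIX (the function name and the minimal error handling are mine; Windows would use `CreateFileMapping`/`MapViewOfFile` instead):

```cpp
#include <algorithm>   // std::count
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close, write

long count_lines_mmap(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    if (st.st_size == 0) { close(fd); return 0; }  // mmap with length 0 fails

    void* mem = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after closing the descriptor
    if (mem == MAP_FAILED) return -1;

    const char* p = static_cast<const char*>(mem);
    long count = std::count(p, p + st.st_size, '\n');
    munmap(mem, st.st_size);
    return count;
}
```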

catwalk
  • 6,340
  • 25
  • 16
  • links for most common: unix: http://linux.die.net/man/2/mmap windows: http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx – catwalk May 10 '10 at 08:25
  • Why do you think this would be faster than `getline`? – ChrisW May 10 '10 at 08:40
  • @ChrisW: `mmap` is faster than `getline` because it usually avoids extra buffering on standard library level and data movement between kernel and user levels. `getline` could be made more efficient but I think that its designers went for more generic and portable approach rather than for pure speed. – catwalk May 10 '10 at 13:01
0

I don't know whether getline() is best: its read size varies, and in the worst case (a long run of \n characters) it returns after every single byte.

I think it would be better to read the file in chunks of a predetermined size and then scan each chunk for newline sequences. There is one risk I don't know how to resolve: file encodings other than ASCII. If getline() handled those it would be easiest, but I don't think it does.

Some URLs:

Why does wide file-stream in C++ narrow written data by default?

http://en.wikipedia.org/wiki/Newline

Community
  • 1
  • 1
XAder
  • 676
  • 4
  • 12
0

Possibly the fastest way is to use the low-level read() and scan the buffer for '\n':

#include <fcntl.h>   /* open, O_RDONLY */
#include <stdio.h>   /* BUFSIZ */
#include <string.h>  /* memchr */
#include <unistd.h>  /* read, close */

int clines(const char* fname)
{
    int nfd, nLen;
    int count = 0;
    char buf[BUFSIZ+1];

    if((nfd = open(fname, O_RDONLY)) < 0) {
        return -1;
    }

    while( (nLen = read(nfd, buf, BUFSIZ)) > 0 )
    {
        char *p = buf;
        int n = nLen;
        while( n && (p = memchr(p, '\n', n)) ) { /* next newline in remainder */
            p++;
            n = nLen - (p - buf);
            count++;
        }
    }
    close(nfd);
    return count;
}
Oleg Razgulyaev
  • 5,757
  • 4
  • 28
  • 28