
Usually I use wc -l to count the lines of a file. However, for a file with 5*10^7 lines, I get only 10^7 as the answer. I've tried everything proposed here: How to count lines in a document? But it all takes much more time than wc -l.

Is there any other option?

ziulfer
  • Where did you get the file? Is there a DOS/UNIX line break problem? – Kent Jun 11 '14 at 09:47
  • `awk 'END {print NR}'` –  Jun 11 '14 at 09:55
  • @Jidder I think awk has some limit. – Avinash Raj Jun 11 '14 at 09:56
  • @Avinash Raj Is there? I know there are limits for NF and characters per record, but I've never had a problem with NR. I just checked a million-line file and it worked. I've never had a 50-million-line one before, though. –  Jun 11 '14 at 10:00
  • @Kent The file was created during an MD (molecular dynamics) simulation. I'm almost sure that there is no problem with the file. – ziulfer Jun 11 '14 at 10:01
  • Possible duplicate: [Count how many lines in large files](http://stackoverflow.com/q/12716570/1983854) (not marking it as such, because that would close this automatically). I've done some checking, and `wc -l` is way faster than the alternatives. With 10^7 lines it took `0m0.089s`, while `awk`'s `END{}` was 2nd with `0m0.404s`, then `sed` with `0m1.101s`, and finally `nl` with more than 3s. – fedorqui Jun 11 '14 at 10:24
  • You should file a bug with GNU Coreutils. – Francisco Jun 11 '14 at 11:17

5 Answers


Anyone serious about line-counting speed can create their own implementation:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>  /* for read() and close() */

#define BUFFER_SIZE (1024 * 16)
char BUFFER[BUFFER_SIZE];

int main(int argc, char** argv) {
    unsigned int lines = 0;
    int fd, r;

    if (argc > 1) {
        char* file = argv[1];
        if ((fd = open(file, O_RDONLY)) == -1) {
            fprintf(stderr, "Unable to open file \"%s\".\n", file);
            return 1;
        }
    } else {
        fd = fileno(stdin);
    }

    /* Read in fixed-size chunks and count '\n' bytes with memchr(),
       which most libcs implement with SIMD instructions. */
    while ((r = read(fd, BUFFER, BUFFER_SIZE)) > 0) {
        char* p = BUFFER;
        while ((p = memchr(p, '\n', (BUFFER + r) - p))) {
            ++p;
            ++lines;
        }
    }

    close(fd);

    if (r == -1) {
        fprintf(stderr, "Read error.\n");
        return 1;
    }

    printf("%u\n", lines);

    return 0;
}

Usage (a is the compiled binary):

a < input
... | a
a file

Example:

# time ./wc temp.txt
10000000

real    0m0.115s
user    0m0.102s
sys     0m0.014s

# time wc -l temp.txt
10000000 temp.txt

real    0m0.120s
user    0m0.103s
sys     0m0.016s

  *   Code compiled with -O3 natively on a system with AVX and SSE4.2 using GCC 4.8.2.
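For reference, a build along those lines could look like this; the source name count_lines.c and the output name wc are placeholders, and -march=native is an assumption matching the "compiled natively" note above:

gcc -O3 -march=native -o wc count_lines.c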

konsolebox

You could try sed:

sed -n '$=' file

The = prints the current line number, the $ restricts it to the last line, and -n suppresses the default printing of every line.
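For instance, on a small hypothetical three-line file:

$ printf 'a\nb\nc\n' > three.txt
$ sed -n '$=' three.txt
3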

Or here's a way in Perl. Save this as wc.pl and do chmod +x wc.pl:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift @ARGV or die "Usage: $0 file\n";
my $lines = 0;
my $buffer;
open(my $fh, '<', $filename) or die "ERROR: Can not open file: $!";
# Read 64 kB at a time; tr/// returns the number of characters it
# matched, i.e. the newline count in this chunk.
while (sysread $fh, $buffer, 65536) {
    $lines += ($buffer =~ tr/\n//);
}
close $fh;
print "$lines\n";

Run it like this:

./wc.pl yourfile

Basically it reads your file in chunks of 64 kB at a time, then takes advantage of the fact that tr returns the number of substitutions it has made when asked to delete all newlines.
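To see the tr counting trick in isolation (the three-line string here is just an illustration):

perl -e 'my $s = "a\nb\nc\n"; my $n = ($s =~ tr/\n//); print "$n\n";'   # prints 3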

Mark Setchell

Try nl and see what happens; the number it prints for the last line is the line count (use -ba so that blank lines are numbered too).
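A sketch of turning that into a single number (file is a placeholder):

nl -ba file | tail -n 1 | awk '{print $1}'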

Vishkey

You can get the line count using awk as well, like below:

awk 'END {print NR}' names.txt

Or use a bash while ... do ... done loop construct, like:

CNT=0; while read -r LINE; do (( CNT++ )); done < names.txt; echo $CNT
Rahul

It depends on how you open the file, but reading it from stdin instead may fix it:

wc -l < file
konsolebox