
Usually I use wc -l to count the lines of a file. However, for a file with 5*10^7 lines, I get only 10^7 as the answer. I've tried everything proposed here: How to count lines in a document? But it all takes much more time than wc -l.

Is there any other option?

ziulfer
  • Where did you get the file? Is there a DOS/UNIX line break problem? – Kent Jun 11 '14 at 09:47
  • `awk 'END {print NR}'` –  Jun 11 '14 at 09:55
  • @Jidder I think awk has some limit. – Avinash Raj Jun 11 '14 at 09:56
  • @Avinash Raj Is there? I know there are limits for NF and characters per record, but I've never had a problem with NR. I just checked a million-line file and it worked. I've never had a 50-million-line one before, though. –  Jun 11 '14 at 10:00
  • @Kent The file was created during an MD (molecular dynamics) simulation. I'm almost sure that there is no problem with the file. – ziulfer Jun 11 '14 at 10:01
  • Possible duplicate: [Count how many lines in large files](http://stackoverflow.com/q/12716570/1983854) (not marking it as such, because that would close this automatically). I've done some checking, and `wc -l` is way faster than the alternatives. With 10^7 lines it took `0m0.089s`, while `awk`'s `END{}` was 2nd with `0m0.404s`, then `sed` with `0m1.101s`, and finally `nl` with more than 3s. – fedorqui Jun 11 '14 at 10:24
  • You should file a bug with GNU Coreutils. – Francisco Jun 11 '14 at 11:17

5 Answers


Anyone serious about line-counting speed can create their own implementation:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>  /* for read() and close() */

#define BUFFER_SIZE (1024 * 16)
char BUFFER[BUFFER_SIZE];

int main(int argc, char** argv) {
    unsigned int lines = 0;
    int fd, r;

    if (argc > 1) {
        char* file = argv[1];
        if ((fd = open(file, O_RDONLY)) == -1) {
            fprintf(stderr, "Unable to open file \"%s\".\n", file);
            return 1;
        }
    } else {
        fd = fileno(stdin);
    }

    /* Read in fixed-size chunks and count '\n' bytes with memchr(),
       which most libcs implement with SIMD instructions. */
    while ((r = read(fd, BUFFER, BUFFER_SIZE)) > 0) {
        char* p = BUFFER;
        while ((p = memchr(p, '\n', (BUFFER + r) - p))) {
            ++p;
            ++lines;
        }
    }

    close(fd);

    if (r == -1) {
        fprintf(stderr, "Read error.\n");
        return 1;
    }

    printf("%u\n", lines);

    return 0;
}

Usage (a is the compiled binary):

a < input
... | a
a file

Example:

# time ./wc temp.txt
10000000

real    0m0.115s
user    0m0.102s
sys     0m0.014s

# time wc -l temp.txt
10000000 temp.txt

real    0m0.120s
user    0m0.103s
sys     0m0.016s

  *   Code compiled with -O3 natively on a system with AVX and SSE4.2 using GCC 4.8.2.
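For reference, a build along those lines could look like this; the source name count_lines.c and the output name wc are placeholders, and -march=native is an assumption matching the "compiled natively" note above:

gcc -O3 -march=native -o wc count_lines.c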

konsolebox

You could try sed:

sed -n '$=' file

The = prints the current line number, the $ restricts it to the last line, and -n suppresses the default printing of every line.
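For instance, on a small hypothetical three-line file:

$ printf 'a\nb\nc\n' > three.txt
$ sed -n '$=' three.txt
3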

Or here's a way in Perl. Save this as wc.pl and do chmod +x wc.pl:

#!/usr/bin/perl
use strict;
use warnings;

my $filename = shift @ARGV or die "Usage: $0 file\n";
my $lines = 0;
my $buffer;
open(my $fh, '<', $filename) or die "ERROR: Can not open file: $!";
# Read 64 kB at a time; tr/// returns the number of characters it
# matched, i.e. the newline count in this chunk.
while (sysread $fh, $buffer, 65536) {
    $lines += ($buffer =~ tr/\n//);
}
close $fh;
print "$lines\n";

Run it like this:

./wc.pl yourfile

Basically it reads your file in chunks of 64 kB at a time, then takes advantage of the fact that tr returns the number of substitutions it has made when asked to delete all newlines.
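To see the tr counting trick in isolation (the three-line string here is just an illustration):

perl -e 'my $s = "a\nb\nc\n"; my $n = ($s =~ tr/\n//); print "$n\n";'   # prints 3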

Mark Setchell

Try nl and see what happens; the number it prints for the last line is the line count (use -ba so that blank lines are numbered too).
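A sketch of turning that into a single number (file is a placeholder):

nl -ba file | tail -n 1 | awk '{print $1}'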

Vishkey

You can get the line count using awk as well, like below:

awk 'END {print NR}' names.txt

Or use a bash while ... do ... done loop construct, like:

CNT=0; while read -r LINE; do (( CNT++ )); done < names.txt; echo $CNT
Rahul

It depends on how you open the file, but reading it from stdin instead may fix it:

wc -l < file
konsolebox