
I have a directory with thousands of files (100K for now). When I use `wc -l ./*`, I get:

 c1            ./test1.txt
 c2            ./test2.txt
 ...
 cn            ./testn.txt
 c1+c2+...+cn  total

Because there are a lot of files in the directory, I just want to see the total count and not the details. Is there any way to do so?

I tried several ways, and I got the following error:
Argument list too long

Shannon
  • The quick and simple solution would probably be `ls -l | wc -l`. – Guest Oct 31 '17 at 01:34
  • @Guest, thanks for your response. I tried it, it shows n instead of `c1+c2+...+cn`. – Shannon Oct 31 '17 at 01:45
  • `cat * | wc -l`, maybe? I'm pretty sure this is a duplicate, though. – Benjamin W. Oct 31 '17 at 02:12
  • @BenjaminW. It works for a small number of files in a directory. I have a lot of files in my directory, so I get an error of "Argument list too long" – Shannon Oct 31 '17 at 04:29
  • Does this answer your question? [How can I count all the lines of code in a directory recursively?](https://stackoverflow.com/questions/1358540/how-can-i-count-all-the-lines-of-code-in-a-directory-recursively) – Josh Correia Oct 13 '20 at 19:52

7 Answers


If what you want is the total number of lines and nothing else, then I would suggest the following command:

cat * | wc -l

This concatenates the contents of all of the files in the current working directory and pipes the resulting blob of text through `wc -l`.

I find this to be quite elegant. Note that the command produces no extraneous output.

UPDATE:

I didn't realize your directory contained so many files. In light of this information, you should try this command:

for file in *; do cat "$file"; done | wc -l

Most people don't know that you can pipe the output of a for loop directly into another command.

Beware that this could be very slow. If you have 100,000 or so files, my guess would be around 10 minutes. This is a wild guess because it depends on several parameters that I'm not able to check.
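
If you want a real number instead of a guess, you can time the loop on your own data first. This is just a measurement sketch, not a different method; bash's `time` keyword works on a brace group:

time { for file in *; do cat "$file"; done | wc -l; }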

If you need something faster, you should write your own utility in C. You could make it surprisingly fast if you use pthreads.
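
Before going all the way to C, though, you could try parallelizing with existing tools. Here is a sketch that assumes your `xargs` supports `-0` and `-P` (the GNU and BSD versions both do): `printf` is a shell builtin, so it sidesteps the "Argument list too long" limit, and each batch of files is counted in parallel and summed at the end:

printf '%s\0' * | xargs -0 -n 1000 -P 4 sh -c 'cat "$@" | wc -l' sh | awk '{sum += $1} END {print sum}'

Each `sh -c` invocation prints a single short line, so the parallel counts should arrive intact and awk can sum them safely.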

Hope that helps.

LAST NOTE:

If you're interested in building a custom utility, I could help you code one up. It would be a good exercise, and others might find it useful.

lifecrisis
  • The only *nit* is `cat *` will not capture hidden (dot) files. A `cat * .[^.]*` will get both. – David C. Rankin Oct 31 '17 at 03:53
  • @lifecrisis it works for a small number of files in a directory. I have a lot of files in my directory, so I get an error of "Argument list too long" – Shannon Oct 31 '17 at 04:22
  • @DavidC.Rankin, this would be useful if the question had asked for the inclusion of dotfiles. Note that you should add the pattern `..?*` to your command. As it stands, your patterns will not match files such as `..file`. – lifecrisis Oct 31 '17 at 16:25
  • When you answer questions on S.O., you step into the role of the teacher. Better answers will explain the nuances of the answer and the potential shortcomings of one method compared to another. This facilitates learning. While the question did not explicitly ask for dotfiles, it did not explicitly exclude them either. As written, your answer is a partial answer to how to sum lines in all files within a directory. – David C. Rankin Oct 31 '17 at 16:28
  • @lifecrisis thanks for the complete explanations and the update. The system is down for a few days. I will try it when it starts working and will let you know. – Shannon Nov 01 '17 at 19:36
  • @Shabnam, did you find that it was especially slow? – lifecrisis Nov 04 '17 at 15:16

Credit: this builds on @lifecrisis's answer, and extends it to handle large numbers of files:

find . -maxdepth 1 -type f -exec cat {} + | wc -l

`find` will find all of the files in the current directory, break them into groups as large as can be passed as arguments, and run `cat` on each group.
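
To see the batching for yourself, you can make each batch report its size instead of running cat on it (the `sh -c` wrapper below is only for illustration):

find . -maxdepth 1 -type f -exec sh -c 'echo "one batch of $# files"' sh {} +

Each line of output corresponds to one group of arguments, i.e. one `cat` invocation in the command above.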

Gordon Davisson
  • Davisson, can you please explain it briefly? What is "."? Does it mean searching in the current directory? – Shannon Oct 31 '17 at 08:14
  • @Shabnam Yes, "." refers to the current directory. See ["Regarding the Single and the Double Dot within Directories"](https://stackoverflow.com/questions/3479744/regarding-the-single-and-the-double-dot-within-directories). – Gordon Davisson Oct 31 '17 at 14:30
  • I give this one a thumbs-up. It's quite fast and handles the load better than my `for` loop suggested above. I also didn't realize that `find` would break files into groups for you. This is a good thing to know! – lifecrisis Oct 31 '17 at 16:50
  • @lifecrisis Yep, it's a handy feature of `find`. Note that `-exec cmd {} +` will run files in batches, while `-exec cmd \;` will run them one at a time. The `+` behavior is very similar to `xargs`. – Gordon Davisson Nov 01 '17 at 00:59
  • @GordonDavisson thanks for the explanation. The system is down for a few days. I will try it when it starts working and will let you know – Shannon Nov 01 '17 at 19:38

awk 'END {print NR" total"}' ./*

It would be an interesting comparison to find out how many lines don't end with a newline.
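
The reason the counts can differ: wc -l counts newline characters, while awk's NR also counts a final line that lacks a trailing newline. A quick demonstration:

$ printf 'one\ntwo' | wc -l
1
$ printf 'one\ntwo' | awk 'END {print NR}'
2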

Combining the awk approach with Gordon's find solution, while avoiding hidden ("dot") files:

find ./* -maxdepth 0 -type f -exec awk 'END {print NR}' {} +

I have no idea whether this is better or worse, but it does give a more accurate count (for me) and does not count lines in hidden files. Using `./*` is just a guess that appears to work.

A depth option is still needed, and using `./*` as the starting points requires a depth of 0, hence `-maxdepth 0`.

I did get the same result with the `cat` and `awk` solutions (using the same `find`), since `cat *` takes care of the newline issue. I don't have a directory with enough files to measure the time. Interesting; I'm liking the `cat` solution.

JDQ
  • Many ways to do this... first thought is to pipe a `tail -1` or a `grep total` on your wc; second thought is that awk would be more accurate, since wc only counts lines ending in a newline. – JDQ Oct 31 '17 at 03:01
  • It’s a duplicate in many places. There are many different ways to do this. I found I had six more lines with the awk solution than with any of the wc solutions in my desktop directory. – JDQ Oct 31 '17 at 03:22
  • @Gordon Davisson: I tried it, but I get the following error: "Argument list too long" – Shannon Oct 31 '17 at 04:27
  • Sounds like you will need a script to loop through all the files. There’s only so much you can do with a command line. How many files? – JDQ Oct 31 '17 at 04:40
  • 100K for now, but it may be more – Shannon Oct 31 '17 at 05:18
  • These are the wrong quotes around your awk command, `’` instead of `'`. – Benjamin W. Oct 31 '17 at 13:35
  • Sorry, on a phone with fat fingers and bad eyes. – JDQ Oct 31 '17 at 13:39

This will give you the total count for all the files (including hidden files) in your current directory:

$ find . -maxdepth 1 -type f  | xargs wc -l  | grep total
 1052 total

To count lines while excluding hidden files, use:

find . -maxdepth 1 -type f  -not -path "*/\.*"  | xargs wc -l  | grep total
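
One caveat worth knowing about for directories this large: when the file list exceeds the argument-length limit, xargs runs wc more than once, and each run prints its own total line, so grep total can match several partial totals rather than one grand total. A variant (a sketch, using -print0/-0 to also survive whitespace in names) that sums the per-file counts instead:

find . -maxdepth 1 -type f -print0 | xargs -0 wc -l | awk '$2 != "total" {sum += $1} END {print sum}'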
Rahul Verma

(Apologies for adding this as an answer, but I do not have enough reputation to comment.)

A comment on @lifecrisis's answer. Perhaps `cat` is slowing things down a bit. We could replace `cat` with `wc -l` and then use `awk` to add the numbers. (This could be faster, since much less data needs to go through the pipe.)

That is

for file in *; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'

instead of

for file in *; do cat "$file"; done | wc -l

(Disclaimer: I am not incorporating many of the improvements in other answers, but I thought the point was valid enough to write down.)

Here are my results for comparison (I ran the newer version first so that any cache effects would go against the newer candidate).

$ time for f in `seq 1 1500`; do head -c 5M </dev/urandom >myfile-$f; done

real    0m50.360s
user    0m4.040s
sys 0m49.489s

$ time for file in myfile-*; do wc -l "$file"; done | awk '{sum += $1} END {print sum}'
30714902

real    0m3.455s
user    0m2.093s
sys 0m1.515s

$ time for file in myfile-*; do cat "$file"; done | wc -l
30714902

real    0m4.481s
user    0m2.544s
sys 0m4.312s
Tássio

The command below will give the total count of lines from all files in the path:

for i in `ls -ltr | awk '$1~"^-rw"{print $9}'`; do wc -l "$i" | awk '{print $1}'; done >> /var/tmp/filelinescount.txt
cat /var/tmp/filelinescount.txt | sed -r "s/\s+//g" | tr "\n" "+" | sed "s:+$::g" | sed 's/^/"/g' | sed 's/$/"/g' | awk '{print "echo " $0 " | bc"}' | sh
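
For what it's worth, the temporary file and the echo/bc plumbing can be collapsed into a single awk sum; a simpler sketch of the same idea:

for i in *; do [ -f "$i" ] && wc -l < "$i"; done | awk '{sum += $1} END {print sum}'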
user8854776

If you want to know only the number of files in the directory, excluding `ls`'s `total` line:

ls -ltr | sed -n '/total/!p' | awk 'END {print NR}'

The previous answer will give the total count of lines, which covers only the lines inside the files.

user8854776