
Suppose I want to count the lines of code in a project. If all of the files are in the same directory, I can execute:

cat * | wc -l

However, if there are sub-directories, this doesn't work. For this to work, cat would need a recursive mode. I suspect this might be a job for xargs, but I wonder whether there is a more elegant solution?

madth3
speciousfool

11 Answers


First, you do not need to use cat to count lines. This is an antipattern called Useless Use of Cat (UUoC). To count lines in the files in the current directory, use wc:

wc -l * 

Then, to recurse into the sub-directories, use the find command:

find . -name "*.c" -exec wc -l {} \;
  • . is the name of the top directory to start searching from
  • -name "*.c" is the pattern of the files you're interested in
  • -exec gives a command to be executed
  • {} is the result of the find command to be passed to the command (here wc -l)
  • \; indicates the end of the command

This command produces a list of all files found with their line counts. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass that list as arguments to wc -l.

find . -name "*.c" -print | xargs wc -l 

EDIT, to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, together with xargs -0, so that the list of file names is exchanged as null-terminated strings.

find . -name "*.c" -print0 | xargs -0 wc -l

The Unix philosophy is to have tools that do one thing only, and do it well.

Dan Dascalescu
philant
  • Seconded. Wanted to point out the UUoC (Useless Use of Cat), but didn't. – ayaz Nov 25 '08 at 07:48
  • I think that the particular challenge is to get the *total* line count for an entire tree of files. Is there a way to do that simply using the find command? – chromakode Nov 25 '08 at 07:49
  • Your xargs example is almost identical to what I originally came up with, but it doesn't handle filenames with spaces in them. – Robert Gamble Nov 25 '08 at 08:00
  • +1, Very nice, and even shorter in zsh: "wc -l **/*.c" – orip Sep 24 '09 at 15:34
  • The "find ... -print0 | xargs -0 ..." trick is worth committing to memory. – detly May 17 '10 at 15:53
  • I've also had success with `xargs -d\\n` for files with spaces. That's particularly helpful when there are steps between find and xargs, or when running xargs on the result of an entirely different command. However, it still fails against files with newlines... – eswald Dec 01 '10 at 21:40
  • Not only is the OP's use of cat not useless, but pipelines like `cat * | wc -l` are **exactly** what `cat` (short for con-*cat*-enate) is designed to do. "Useless use of cat" is when `cat` is used to read a *single* file and pipe it to a program, instead of using an input redirection. If you doubt that `cat` is useful here, just try `wc -l *` and `cat * | wc -l`, and observe that their output is different (a short illustration follows these comments). – user4815162342 Apr 28 '13 at 11:50
  • I propose a second anti-pattern: UUoX... `find . -name "*.c" -exec wc -l {} +` will correctly cater for whitespace-containing filenames and pass *all* filenames to `wc -l`, giving you per-file sum and grand total at the end. – sanmiguel May 27 '14 at 14:02
  • xargs proves more useful if the command you want to invoke is parallelizable, because xargs has a '-P' parameter, not available to find, which spawns N jobs in parallel. It's also worth noting that 'xargs' is one of the few binaries other than 'find' in the 'findutils' package, and their design is clearly such that they're intended to be used together. "The xargs program builds and executes command lines by gathering together arguments it reads on the standard input. Most often, these arguments are lists of file names generated by find." -- gnu findutils – Kent Fredric May 28 '14 at 01:40
  • I'm surprised, with all the talk of anti-patterns, that nobody has mentioned the anti-pattern of using a count of lines of code as a specious pseudo-metric.... – twalberg May 28 '14 at 19:33
  • This solution may be inaccurate on large code bases as it will chunk files together and count the lines in each chunk without summarising at the end. @aaron-digulla provides an answer using `cat` that avoids this problem. – Rangi Keen May 06 '15 at 15:46
  • UUoC! Excellent! :) – IT_puppet_master Jan 19 '17 at 19:41
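To make user4815162342's point concrete, here is a quick editor's illustration with two hypothetical throwaway files (not from the original thread):

printf 'a\nb\n' > one.txt    # two lines
printf 'c\n' > two.txt       # one line
wc -l *                      # prints a count per file plus a "total" line
cat * | wc -l                # prints a single number only

The totals agree, but the shapes of the output differ: wc -l * gives you the per-file breakdown as well.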

If you want a code-golfing answer:

grep '' -R . | wc -l 

The problem with just using wc -l on its own is that it can't descend into sub-directories, and the one-liners using

find . -exec wc -l {} \;

won't give you a total line count, because it runs wc once for every file, and

find . -exec wc -l {} + 

will get confused as soon as find hits the ~200k-character argument limit for parameters (see the footnote on limits below) and will instead call wc multiple times, each time giving you only a partial summary.
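If the grand total is all you need, one way to sidestep the chunking (an editor's sketch, not from the original answer) is to concatenate rather than count per chunk: even when find splits the file list across several cat invocations, all of their output flows through a single pipe, so wc runs exactly once.

find . -name "*.c" -exec cat {} + | wc -l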

Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.

For the cost of 1 extra command character, you can ignore binary files completely:

grep '' -IR . | wc -l

If you want to run line counts on binary files too:

grep '' -aR . | wc -l

Footnote on limits:

The docs are a bit vague as to whether it's a string-size limit or a number-of-tokens limit.

cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} + 
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435

This implies it's going to chunk very easily.
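As an editor's aside (not part of the original answer), the system-wide ceiling behind this chunking can be inspected directly:

getconf ARG_MAX    # maximum length of exec() arguments, environment included, in bytes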

Kent Fredric
  • TIL 32000 file argument limit to `-exec cmd {} +` – sanmiguel May 27 '14 at 14:07
  • There's a limit around there for all commands. OS Level restriction. http://www.cyberciti.biz/faq/linux-unix-arg_max-maximum-length-of-arguments/ – Kent Fredric May 28 '14 at 17:51
  • @sanmiguel, just updated the post; it seems that limit is *MUCH* lower in terms of arguments than I thought. I have **git** repos with more than enough files to trip that limit. – Kent Fredric May 28 '14 at 18:10
  • `cat **/* | wc -l` is a few characters shorter :) And it also ignores hidden files and folders (e.g. files in `.git`) which might be beneficial. – psmith Dec 09 '16 at 02:23
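Following up on orip's and psmith's ** comments, a small editor's note: in zsh the recursive glob works out of the box, but in bash 4 and later it has to be enabled first.

shopt -s globstar    # bash >= 4; zsh needs no such step
wc -l **/*.c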

I think you're probably stuck with xargs:

find -name '*php' | xargs cat | wc -l

chromakode's method gives the same result but is much, much slower. If you use xargs, your cat-ing and wc-ing can start as soon as find starts finding.
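To check the speed claim yourself, a rough editor's timing sketch (absolute numbers will vary with tree size and disk cache):

time sh -c 'find . -type f -exec cat {} \; | wc -l'
time sh -c 'find . -type f | xargs cat | wc -l'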

Good explanation at Linux: xargs vs. exec {}

Ken
  • but unfortunately, you won't get multi-threading goodness there, because the pipe makes them all share the same pipeline. – Kent Fredric Nov 25 '08 at 08:04
  • Oh, and FYI, that article is bunk. `-exec cmd {} +` bundles filenames. xargs has the "-1" parameter as well to emulate find's other behaviour. – Kent Fredric Nov 25 '08 at 08:15
  • Thanks Kent. Can you point me at any documentation on "-1"? I'm using GNU xargs version 4.2.32 and can see nothing in the man page. – Ken Nov 25 '08 at 08:21
  • sorry, "-l" which is the limiter. -l[max-lines] ( minor brainfart ) – Kent Fredric Nov 25 '08 at 08:31
  • I did some rigorous tests; xargs is still faster on all fronts. I posted this to that page: http://rafb.net/p/HLOs3385.html – Kent Fredric Nov 25 '08 at 08:54

Try using the find command, which recurses directories by default:

find . -type f -execdir cat {} \; | wc -l

chromakode
  • Much faster just to pipe it through xargs – Ken Nov 25 '08 at 07:53
  • I believe it :) I try to do as little shell scripting as possible, so the more clever `xargs` approach escaped me. Thanks for teaching me something! – chromakode Nov 25 '08 at 07:55

The correct way is:

find . -name "*.c" -print0 | xargs -0 cat | wc -l

You must use -print0 because there are only two invalid characters in Unix filenames: the null byte and "/" (slash). So, for example, "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them, though. Without -print0, the commands above would count each word as a separate file.

You might also want to use "-type f" instead of -name to limit the search to regular files.
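To see why -print0 matters, a short editor's demonstration (the file name is hypothetical, borrowed from the example above):

touch "$(printf 'xxx\npasswd')"                  # a perfectly legal file name containing a newline
find . -name 'xxx*' -print | xargs wc -l         # xargs splits the name in two, and wc fails
find . -name 'xxx*' -print0 | xargs -0 wc -l     # the name survives intact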

Aaron Digulla
  • No, that's not right. xargs could execute wc several times, resulting in more than one result for different sets of files. You should use cat, and at the end pipe into one wc, like Ken showed. – Johannes Schaub - litb Nov 25 '08 at 12:31
  • You're right. I made xargs call cat (instead of wc) and then pipe the result through wc. – Aaron Digulla Nov 25 '08 at 16:13

Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:

wc -l --files0-from=<(find . -name \*.c -print0)

This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
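If you'd rather avoid the Bash-only <(...) process substitution, GNU wc can also take the null-separated list on standard input (an equivalent sketch):

find . -name '*.c' -print0 | wc -l --files0-from=-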

Idelic
wc -cl `find . -name "*.php" -type f`
pascalhein
PMD

I like to use find and head together for a "recursive cat" of all the files in a project directory, for example:

find . -name "*rb" -print0 | xargs -0 head -10000

The advantage is that head will add the filename and path as a header:

==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"

package "mysql-server-5.5"
...

==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...

==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH 

For the complete example, please see my blog post:

http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/

Note I used 'head -10000'; clearly, if there are files with over 10,000 lines this is going to truncate the output ... however, I could use 'head -100000', but for "informal project/directory browsing" this approach works very well for me.

Dave Pitts
  • I appreciate the original question was for a UUoC (Useless Use of Cat) around wc. But I want to share this "recursive use of cat with headers" as I think it is a handy trick/tip. – Dave Pitts Apr 28 '13 at 11:13
  • You can also do `head -n-0`, which will emit *all* lines. `-n, --lines=[-]K print the first K lines instead of the first 10; with the leading '-', print all but the last K lines of each file` – Kent Fredric May 28 '14 at 01:45

If you want to generate only a total line count, and not a line count for each file, something like this:

find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'

works well. This saves you the need to do further text filtering in a script.
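If the per-file startup cost of wc matters on a big tree, a batched variant is possible (an editor's sketch; it assumes GNU wc's per-chunk "total" lines and that no file name ends in the word "total"):

find . -type f -exec wc -l {} + | awk '$NF != "total" {total += $1} END {print total}'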

abalmos

Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single-line comments that use "//".

# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"

countLines(){
  # $total is the total lines of code counted
  total=0
  # -mindepth excludes the current directory (".")
  for file in `find . -mindepth 1 -name "*.*" | grep -v "$excluded"`; do
    # First sed: drop any line containing a // comment
    # Second sed: drop blank lines
    # $numLines is the lines of code
    numLines=`cat "$file" | sed '/\/\//d' | sed '/^\s*$/d' | wc -l`
    total=$(($total + $numLines))
    echo "  " $numLines $file
  done
  echo "  " $total in total
}

echo Source code files:
countLines
echo Unit tests:
cd spec
countLines

Here's what the output looks like for my project:

Source code files:
   2 ./buildDocs.sh
   24 ./countLines.sh
   15 ./css/dashboard.css
   53 ./data/un_population/provenance/preprocess.js
   19 ./index.html
   5 ./server/server.js
   2 ./server/startServer.sh
   24 ./SpecRunner.html
   34 ./src/computeLayout.js
   60 ./src/configDiff.js
   18 ./src/dashboardMirror.js
   37 ./src/dashboardScaffold.js
   14 ./src/data.js
   68 ./src/dummyVis.js
   27 ./src/layout.js
   28 ./src/links.js
   5 ./src/main.js
   52 ./src/processActions.js
   86 ./src/timeline.js
   73 ./src/udc.js
   18 ./src/wire.js
   664 in total
Unit tests:
   230 ./ComputeLayoutSpec.js
   134 ./ConfigDiffSpec.js
   134 ./ProcessActionsSpec.js
   84 ./UDCSpec.js
   149 ./WireSpec.js
   731 in total

Enjoy! --Curran

curran
find . -name "*.h" -print | xargs wc -l
SD.
  • Some comments as to how this code is a solution to the problem would help. – pdobb May 27 '14 at 13:22
  • @pdobb: find recursively searches for all .h files from the current path, and the list of files found is given as input to xargs via the '|' (pipe). xargs reads those file names and converts each line into space-separated arguments to the command wc, which actually performs the counting of lines in each file. – SD. May 29 '14 at 07:24