1002

How would I count the total number of lines present in all the files in a git repository?

git ls-files gives me a list of files tracked by git.

I'm looking for a command to cat all those files. Something like

git ls-files | [cat all these files] | wc -l
Kas Elvirov
Dogbert

17 Answers

1543

xargs will let you cat all the files together before passing them to wc, like you asked:

git ls-files | xargs cat | wc -l

But skipping the intermediate cat gives you more information and is probably better:

git ls-files | xargs wc -l
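
Several comments ask for a breakdown by file type. As a rough sketch, the helper below (the name `lines_by_ext0` is made up for illustration) reads NUL-delimited paths, as produced by `git ls-files -z`, and sums the `wc -l` counts per extension. It assumes an `xargs` that supports `-0` (GNU and BSD both do) and paths without newlines. Running `wc` once per file is slow on large repositories, but it avoids the per-batch "total" lines.

```shell
# lines_by_ext0: hypothetical helper, not a git or coreutils command.
# Reads NUL-delimited paths on stdin, prints "<lines> <extension>" rows.
lines_by_ext0() {
  xargs -0 -n 1 wc -l -- |
    awk '
      {
        n = $1
        path = $0
        sub(/^[ \t]*[0-9]+[ \t]+/, "", path)   # drop the count column
        name = path
        sub(/.*\//, "", name)                  # keep only the basename
        if (match(name, /\.[^.]+$/) && RSTART > 1) {
          ext = substr(name, RSTART)
        } else {
          ext = "(none)"                       # no extension, or a dotfile
        }
        sum[ext] += n
      }
      END { for (e in sum) printf "%d %s\n", sum[e], e }
    ' | sort -rn
}

# Usage, inside a repository:
#   git ls-files -z | lines_by_ext0
```

Binary files are still counted here, so the rows for image or archive extensions are best ignored.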
Rory O'Kane
Carl Norum
  • This double-counts when you have symbolic links in your repository. Maybe that's not a concern, though. – ephemient Jan 27 '11 at 22:53
  • 16
    I guess trivial; How about include only source code files (eg *.cpp). We have some bin files committed :) – Daniel Sep 05 '12 at 14:25
  • 61
    Stick `grep cpp |` in there before the `xargs`, then. – Carl Norum Sep 05 '12 at 15:18
  • I'd like to mention that the latter (`git ls-files | xargs wc -l`) works in the GitHub install of Git within Windows PowerShell. – user1816847 Feb 16 '13 at 07:17
  • 47
    Use `git ls-files -z | xargs -0 wc -l` if you have files with spaces in the name. – mpontillo Nov 19 '13 at 04:33
  • 1
    This will also include images. One JPEG image in my repository apparently has 15176 lines of text. – Adam Elsodaney May 06 '14 at 18:40
  • 1
    For future use you can place it in your `~/.gitconfig` as an alias: `count = ! git ls-files | xargs wc -l`. You can then call it via `git count`. – dotcs Jun 05 '14 at 12:41
  • 2
    For what it's worth, the `-l` is a lowercase L, not the number one. – Kevin Jurkowski Jul 09 '14 at 05:35
  • 56
    For including/excluding certain files use: ``git ls-files | grep -P ".*(hpp|cpp)" | xargs wc -l`` where the grep part is any perl regex you want! – Gabriel Nov 19 '14 at 14:41
  • 39
    If you were interested in just .java files you can use `git ls-files | grep "\.java$" | xargs wc -l` – dseibert Dec 09 '14 at 15:27
  • Counts "lines" in bin files (png/gif/etc)... :( – Budda Jun 19 '15 at 16:37
  • `'xargs' is not recognized as an internal or external command, operable program or batch file.` – CodyBugstein Oct 13 '15 at 06:19
  • 1
    @Imray That error is from a Windows command prompt, this question was tagged as `bash`, which is a *nix environment. Try using Cygwin, or check out cloc: http://sourceforge.net/projects/cloc/ – Bryan Way Oct 21 '15 at 18:19
  • 1
    Tried this command on Mac and got "xargs: wc: Argument list too long" error. Is it because the git repo is too big? – Shi Sep 09 '16 at 15:48
  • @shi, that could be, yes. Check the `xargs` man page to limit the number of arguments passed. – Carl Norum Sep 09 '16 at 23:12
  • 4
    The command is `git ls-files | grep -e ".*py" | xargs wc -l` on Macs if you want to find the lines of code of Python files. Don't use `-P`; for patterns it is `-e`. – Dhruv Ghulati Sep 12 '16 at 15:37
  • @CarlNorum Does this calculation show the total line count across all branches? If so, how do we get only the number of lines on a specific branch, say `master`? – Kasun Siyambalapitiya Dec 08 '16 at 09:15
  • 2
    `git ls-files | grep -vE "(png|jpg|ico)" | xargs wc -l` -- there's an example of _excluding_ various file types you don't want; we are counting lines after all. This was tested on mac and ubuntu. – Purplejacket Feb 24 '17 at 20:10
  • 1
    `git ls-files | sed 's/ /\\ /g' | grep -E "\.*(swift$|mm$)" | xargs wc -l` Using `sed` to escape files or paths that have spaces in them. – bleeckerj Jun 18 '17 at 21:17
  • doesn't work when there are single quotes in a file name – nurettin Dec 14 '17 at 10:46
  • I'm pretty sure that this is wrong. Anybody who knows more about this can correct me, but surely this lists the file names in the repository and then counts the lines in the checked-out version of those files. So if the files have changed size, the total will be wrong. – Alan Feb 28 '18 at 08:42
  • 1
    I used `git ls-files | grep -v "json" | xargs wc -l` to ignore json files – Nicolai Weitkemper Jun 06 '19 at 20:27
  • This should give you the total for java files: `git ls-files | grep "\.java$" | xargs cat | wc -l` – Pedro Madrid Oct 11 '19 at 18:30
  • Surprised no one mentioned that skipping the cat makes it prone to exceeding the maximum number of command-line parameters. In the cat case, xargs can simply execute cat again for the remainder. For wc -l, it will give erroneous output. – Johannes Schaub - litb Aug 01 '21 at 20:44
  • Doesn't work on Windows. I'm getting a `'xargs' is not recognized as an internal or external command, operable program or batch file.` on Windows. – KulaGGin Dec 23 '21 at 11:29
  • 1
    `git ls-files '**.h' '**.cpp' | ...`. Taken from https://stackoverflow.com/a/5681657/72178. – ks1322 Mar 15 '22 at 13:21
441

If you want this count because you want to get an idea of the project’s scope, you may prefer the output of CLOC (“Count Lines of Code”), which gives you a breakdown of significant and insignificant lines of code by language.

cloc $(git ls-files)

(This line is equivalent to git ls-files | xargs cloc. It uses sh’s $() command substitution feature.)

Sample output:

      20 text files.
      20 unique files.                              
       6 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.22 s (62.5 files/s, 2771.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                       2             13            111            309
JSON                             3              0              0             58
HTML                             2              7             12             50
Handlebars                       2              0              0             37
CoffeeScript                     4              1              4             12
SASS                             1              1              1              5
-------------------------------------------------------------------------------
SUM:                            14             22            128            471
-------------------------------------------------------------------------------

You will have to install CLOC first. You can probably install cloc with your package manager – for example, brew install cloc with Homebrew.

cloc $(git ls-files) is often an improvement over running cloc . on the whole directory. For example, the above sample output with git ls-files reports 471 lines of code. For the same project, cloc . reports a whopping 456,279 lines (and takes six minutes to run), because it searches the dependencies in the Git-ignored node_modules folder.

Rory O'Kane
  • 4
    CLOC ignores some languages, such as TypeScript. – Marcelo Camargo Oct 02 '15 at 14:31
  • 10
    @MarceloCamargo at this moment TypeScript is supported – Alex Jun 09 '16 at 09:39
  • 1
    For the beginner, better to execute "cloc DIRECTORY_WHERE_YOUR_GIT_IN" to calculate lines. – Shi Sep 09 '16 at 15:58
  • The full description is here : https://github.com/AlDanial/cloc and the binaries are here : https://github.com/AlDanial/cloc/releases/tag/v1.70 – Peter Szanto Nov 08 '16 at 10:00
  • @RoryO'Kane How do we know which files were ignored in the process? Could some code files be among them? – Kasun Siyambalapitiya Dec 08 '16 at 09:19
  • @KasunSiyambalapitiya You can find the answers to such questions in [CLOC’s documentation](https://github.com/AlDanial/cloc/blob/master/README.md). As CLOC’s README says, passing `--ignored=FILE` will “save names of ignored files and the reason they were ignored to FILE”. – Rory O'Kane Dec 30 '16 at 19:44
  • Just a side note, this doesn't count all the lines, it excludes empty lines and lines consisting of only comments. – Loovjo Jan 13 '17 at 18:03
  • 43
    You can just use `cloc --vcs git` these days, which avoids some edge cases with badly named files (or too many of them). – seanf Jan 24 '17 at 03:08
  • @Loovjo It's written right there if you read carefully, `blank`, `comment` and `code`. – Nearoo Feb 18 '17 at 17:43
  • 2
    Does this leak the code? I mean the GitHub credentials and all. – Madhu Nair Feb 18 '19 at 12:19
  • 4
    @MadhuNair Of course not. `cloc` counts lines of files in a local directory, without ever accessing the network. It doesn’t even know whether the code came from GitHub or not. – Rory O'Kane Feb 18 '19 at 22:06
  • Thanks, this is a helpful tool! Beware, though, that `cloc` does not exclude auto-generated files like JavaScript's `package-lock.json`. These will have to be subtracted if you want an estimate of how much work went into a piece of software. – Elias Strehle Apr 26 '22 at 08:27
418
git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

This shows the differences from the empty tree to your current working tree, which happens to count all lines in your current working tree.

To get the numbers in your current working tree, do this:

git diff --shortstat `git hash-object -t tree /dev/null`

It will give you a string like 1770 files changed, 166776 insertions(+).
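
If only the number is needed, for instance in a script, the insertion count can be pulled out of the `--shortstat` line. This is a sketch rather than part of the original answer: `shortstat_insertions` and `lines_at_rev` are hypothetical helper names. Passing a revision after the empty tree makes `git diff` compare two trees, so `lines_at_rev master` should count the committed state of `master` instead of the working tree.

```shell
# shortstat_insertions: hypothetical helper. Reads `git diff --shortstat`
# output on stdin and prints the insertion count (0 if there is none).
shortstat_insertions() {
  awk '
    { for (i = 2; i <= NF; i++) if ($i ~ /^insertion/) n = $(i - 1) }
    END { print n + 0 }
  '
}

# lines_at_rev: hypothetical helper. Counts text lines in a committed
# revision (default HEAD) by diffing it against the empty tree.
lines_at_rev() {
  git diff --shortstat "$(git hash-object -t tree /dev/null)" "${1:-HEAD}" |
    shortstat_insertions
}

# Usage, inside a repository:
#   lines_at_rev master
```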

Borek Bernard
ephemient
  • 47
    BTW, you can get that hash by running `git hash-object -t tree /dev/null`. – ephemient Jan 27 '11 at 23:00
  • 87
    And even more succinct: `git diff --stat \`git hash-object -t tree /dev/null\`` – rpetrich Jul 08 '12 at 21:40
  • 11
    This is the better solution, since it does not count binary files like archives or images, which are counted in the version above! – BrainStone Jul 20 '13 at 22:02
  • 33
    +1 I like this solution better as binaries don't get counted. Also we are really just interested in the last line of the git diff output: ``git diff --stat `git hash-object -t tree /dev/null` | tail -1`` – Gabriele Petronella Oct 16 '13 at 20:07
  • 2
    Is there any way of not counting lines just containing whitespace? – Cameron Martin Apr 22 '14 at 14:37
  • 5
    @CameronMartin `git diff -w` – ephemient Jul 17 '14 at 22:21
  • 36
    Instead, use `git diff --shortstat \`git hash-object -t tree /dev/null\`` to get just the last line; `tail` isn't needed. – Jim Wolff Oct 16 '14 at 11:38
  • 2
    @ChandlerLee It is the object ID of the empty tree, `git hash-object -t tree /dev/null`. Even if the empty tree never appears in a commit in your repository's history, Git is hard-coded to recognize it; look for `EMPTY_TREE_SHA1` in the source code. – ephemient Dec 31 '14 at 23:37
  • @ephemient : What does git diff -w do? I mean what is -w for ? – Zack Feb 03 '15 at 03:14
  • @ephemient I only found `EMPTY_TREE_SHA1_HEX` – johnchen902 Feb 19 '15 at 13:56
  • just to remember the hash ;-) use SHA1("tree 0\0") = 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (\0 is NUL character) – Thomas Feb 03 '16 at 23:34
  • @Zack `-w` means `Ignore whitespace when comparing lines. This ignores differences even if one line has whitespace where the other line has none.` see the doc [https://git-scm.com/docs/git-diff] – Kasun Siyambalapitiya Dec 08 '16 at 09:33
  • @rpetrich `git diff --stat `git hash-object -t tree /dev/null` I can understand that ` ` is used to run git commands inside git commands, but can you guide me to a resource to learn about that kind of other commands, as I can't find any by searching – Kasun Siyambalapitiya Dec 08 '16 at 11:21
  • @ephemient Does the above command count the lines of code in all the branches that exist in the repo? If so, what is the option to get only the lines of code in the master branch? – Kasun Siyambalapitiya Dec 08 '16 at 11:43
75

I've encountered batching problems with git ls-files | xargs wc -l when dealing with large numbers of files, where the line counts will get chunked out into multiple total lines.

Taking a tip from question Why does the wc utility generate multiple lines with "total"?, I've found the following command to bypass the issue:

wc -l $(git ls-files)

Or if you want to only examine some files, e.g. code:

wc -l $(git ls-files | grep '\.cs$')
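
Command substitution still breaks on paths containing spaces and can hit the argument-length limit on very large repositories. One possible workaround, sketched below, keeps `xargs` but does the summing in `awk`, so it does not matter how many batches `xargs` creates. The helper name `per_file_counts0` is hypothetical, and the sketch assumes paths without newlines.

```shell
# per_file_counts0: hypothetical helper. Reads NUL-delimited paths on stdin,
# prints "<lines> <path>" per regular file and one final "<sum> total" line.
per_file_counts0() {
  xargs -0 sh -c '
    for f in "$@"; do
      [ -f "$f" ] || continue   # skip submodules and files missing on disk
      printf "%d %s\n" "$(( $(wc -l < "$f") ))" "$f"
    done
  ' sh |
    awk '{ sum += $1; print } END { printf "%d total\n", sum }'
}

# Usage, inside a repository:
#   git ls-files -z | per_file_counts0
```

Because each `wc` reads its file from standard input, no per-batch "total" rows are produced; the only total is the one `awk` prints at the end.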

Community
Justin Aquadro
  • This is great but it seems to fail for paths which contain white spaces. Is there a way to solve that? – Lea Hayes Jun 08 '14 at 22:48
  • 1
    Had trouble with grep '.*\.m' picking up binary files like .mp3, .mp4. Had more success with using the find command to list code files `wc -l $(git ls-files | find *.m *.h)` – Tico Ballagas Oct 13 '14 at 21:04
  • 3
    @LeaHayes this is one way: `wc -l --files0-from=<(git ls-files -z)`. The `<(COMMAND)` syntax returns the name of a file whose contents are the result of `COMMAND`. – buck Nov 21 '14 at 02:59
  • @buck Thanks, but I am getting an error when I try that command 'cannot make pipe for process substitution: Function not implemented wc: unrecognized option --files0-from='. Any ideas? – Lea Hayes Nov 21 '14 at 14:02
  • @LeaHayes What OS / terminal are you using? More importantly, what version of `wc` are you using? GNU `wc` works for me. You could try downloading that to get this working. – buck Nov 21 '14 at 18:43
  • @buck the version which is included with the bash shell that is distributed with SourceTree for Windows. "wc (GNU textutils) 2.0". – Lea Hayes Nov 22 '14 at 16:38
  • 1
    @LeaHayes I came up with this script which I think would work for you: ``` #!/bin/bash results=$(git ls-files | xargs -d '\n' wc -l) let grand_total=0 for x in $(echo "$results" | egrep '[[:digit:]]+ total$'); do let grand_total+=$(echo "$x" | awk '{print $1}') done echo "${results}" echo "grand total: ${grand_total}" ``` – buck Nov 23 '14 at 00:54
  • 1
    the `-n` switch with `xargs` can be used to increase the maximum number of lines within a chunk – Anthony Dec 29 '14 at 12:48
73

The best solution, to me anyway, is buried in the comments of @ephemient's answer. I am just pulling it up here so that it doesn't go unnoticed. The credit for this should go to @FRoZeN (and @ephemient).

git diff --shortstat `git hash-object -t tree /dev/null`

returns the total number of files and lines in the working directory of a repo, without any additional noise. As a bonus, only text files are counted: binary files are excluded from the tally.

The command above works on Linux and OS X. The cross-platform version of it is

git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

That works on Windows, too.
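
The fixed ID is not arbitrary. As one commenter on the original answer notes, it is the SHA-1 of an empty tree object, that is, of the header `tree 0` followed by a NUL byte. A quick way to check this without Git, assuming `sha1sum` (coreutils) or `shasum` (shipped with Perl and macOS) is installed; `empty_tree_sha1` is just an illustrative name:

```shell
# empty_tree_sha1: hypothetical helper. Hashes the bytes "tree 0\0", which is
# how Git serializes an empty tree object before hashing it.
empty_tree_sha1() {
  if command -v sha1sum >/dev/null 2>&1; then
    printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1
  else
    printf 'tree 0\0' | shasum -a 1 | cut -d ' ' -f 1
  fi
}

empty_tree_sha1   # prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

A repository created with the SHA-256 object format would have a different empty-tree ID, which is one reason to prefer `git hash-object -t tree /dev/null` where a POSIX shell is available.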

For the record, the options for excluding blank lines,

  • -w/--ignore-all-space,
  • -b/--ignore-space-change,
  • --ignore-blank-lines,
  • --ignore-space-at-eol

don't have any effect when used with --shortstat. Blank lines are counted.

hashchange
29

This works as of cloc 1.68:

cloc --vcs=git

kes
  • 1
    `--vcs` didn't work for me, maybe it was removed. `cloc .` while at the git repo did work, OTOH. – acdcjunior Jul 10 '19 at 09:29
  • 1
    `--vcs=git` worked for me on version v1.90 =) But yes I ran it at the root, it's just an option to tell cloc what it can ignore – Henry Blyth Aug 10 '21 at 10:38
24

I use the following:

git grep ^ | wc -l

This searches all files versioned by git for the regex ^, which represents the beginning of a line, so this command gives the total number of lines!
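
A possible variation, sketched here as an assumption rather than something the answer proposes: `git grep -c` prints one `path:count` row per matching file, so the counts can be summed without streaming every line through the pipe, and `-I` leaves binary files out. `sum_grep_counts` is a made-up helper name.

```shell
# sum_grep_counts: hypothetical helper. Reads "path:count" rows on stdin and
# prints the sum of the counts. Using the last field tolerates colons in paths.
sum_grep_counts() {
  awk -F: '{ sum += $NF } END { print sum + 0 }'
}

# Usage, inside a repository:
#   git grep -I -c '^' | sum_grep_counts
```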

Christopher Shroba
  • This is concise and doesn't require any new software, and gives a fast count of _textual_ lines (which is all the question really asks for). But it isn't a precise measure of executable code. It counts blank lines and comment lines, which are ignored by most of the purpose-built tools. (As an experiment I ran this on a small repo of utility code. `git grep` method: 5322; `sloccount`: 2942; `cloc`: 3251) – Paul Bissex Oct 12 '22 at 20:38
  • @PaulBissex very true! Total lines is often what I want, but I've seen others modify this to `git grep . | wc -l` to only match lines containing at least one character – Christopher Shroba May 02 '23 at 17:41
14

I was playing around with cmder (http://gooseberrycreative.com/cmder/) and I wanted to count the lines of HTML, CSS, Java and JavaScript. While some of the answers above worked, the "or" pattern in grep didn't. I found here (https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns) that I had to escape it.

So this is what I use now:

git ls-files | grep "\.\(html\|css\|js\|java\)$" | xargs wc -l
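
For comparison, a small sketch of the same filter written with `grep -E`, where alternation needs no backslashes. The dot is escaped so that a name such as `xjs` does not slip through. `filter_web_sources` is a hypothetical name, and the extension list is simply the one from this answer.

```shell
# filter_web_sources: hypothetical helper. Keeps only paths that end in one
# of the listed extensions; reads one path per line on stdin.
filter_web_sources() {
  grep -E '\.(html|css|js|java)$'
}

# Usage, inside a repository:
#   git ls-files | filter_web_sources | xargs wc -l
```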

Community
Michail Michailidis
  • 3
    This seemed to respond with chunks for me. Using your grep in combination with Justin Aquadro's solution resulted well for me. wc -l $(git ls-files | grep "\(.html\|.css\|.js\|.php\|.json\|.sh\)$") – PeterM Sep 16 '16 at 16:21
5

I did this:

git ls-files | xargs file | grep "ASCII" | cut -d : -f 1 | xargs wc -l

This works if you count all text files in the repository as the files of interest. If some are considered documentation, etc., an exclusion filter can be added.
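
Matching on the word "ASCII" leaves out files that `file` reports as UTF-8 text. A hedged alternative is to let `grep -I` decide what counts as binary. The sketch below assumes GNU or BSD `grep` (for `-I` and `--null`) and an `xargs` with `-0`; `text_files0` is a hypothetical helper name.

```shell
# text_files0: hypothetical helper. Reads NUL-delimited paths on stdin and
# writes back, NUL-delimited, only those that grep treats as text.
#   -I      treat binary files as non-matching
#   -l      print file names instead of matching lines
#   --null  terminate each printed name with a NUL byte
text_files0() {
  xargs -0 grep -Il --null -e '' --
}

# Usage, inside a repository:
#   git ls-files -z | text_files0 | xargs -0 cat | wc -l
```

Empty files are dropped as well, since the empty pattern has no line to match in them; that does not change the total.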

Sasha Pachev
5

Try:

find . -type f -name '*.*' -exec wc -l {} + 

on the directory/directories in question

Milo
Theos
5

If you want to get the number of lines from a certain author, try the following code:

git ls-files "*.java" | xargs -I{} git blame {} | grep "$your_name" | wc -l
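
Grepping plain `git blame` output also matches the name when it appears in the code itself. As a sketch of a stricter approach, `git blame --line-porcelain` emits an `author NAME` header for every line and prefixes the code with a tab, so the headers can be counted directly. `count_blame_authors` is a hypothetical helper name.

```shell
# count_blame_authors: hypothetical helper. Reads `git blame --line-porcelain`
# output on stdin and prints "<lines> <author>" rows, largest first.
count_blame_authors() {
  awk '
    /^author / { sub(/^author /, ""); n[$0]++ }
    END { for (a in n) printf "%d %s\n", n[a], a }
  ' | sort -rn
}

# Usage, inside a repository:
#   git ls-files -z -- '*.java' |
#     xargs -0 -n 1 git blame --line-porcelain -- |
#     count_blame_authors
```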
Wang Zhong
3

This tool on GitHub, https://github.com/flosse/sloc, can give the output in a more descriptive way. It creates stats of your source code:

  • physical lines
  • lines of code (source)
  • lines with comments
  • single-line comments
  • lines with block comments
  • lines mixed up with source and comments
  • empty lines
love
3

Depending on whether or not you want to include binary files, there are two solutions.

  1. git grep --cached -al '' | xargs -P 4 cat | wc -l
  2. git grep --cached -Il '' | xargs -P 4 cat | wc -l

    "xargs -P 4" means it can read the files using four parallel processes. This can be really helpful if you are scanning very large repositories. Depending on the capacity of the machine, you may increase the number of processes.

    -a, process binary files as text (include binary files)
    -l, show only filenames instead of matching lines; the empty pattern '' matches every line, so only non-empty files are listed
    -I, don't match patterns in binary files (exclude binary files)
    --cached, search in the index instead of in the work tree (includes staged but uncommitted files)

bharath
3

The answer by Carl Norum assumes that no file names contain spaces. Space is one of the characters of IFS, the others being tab and newline. The solution is to terminate each file name with a NUL byte.

 git ls-files -z | xargs -0 cat | wc -l
Thân LƯƠNG Đình
2
: | git mktree | git diff --shortstat --stdin

Or:

git ls-tree @ | sed '1i\\' | git mktree --batch | xargs | git diff-tree --shortstat --stdin
2

If you want to find the total number of non-empty lines, you could use AWK:

git ls-files | xargs cat | awk '/[^[:space:]]/{x++} END{print "Total number of non-empty lines:", x+0}'

This uses regex to count the lines containing a non-whitespace character.
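
A slightly extended sketch along the same lines reports total, non-blank and blank lines in one pass. It relies on the standard awk behaviour that `NF` is zero for a line holding only spaces or tabs; `line_summary` is a hypothetical helper name.

```shell
# line_summary: hypothetical helper. Reads text on stdin and prints total,
# non-blank and blank line counts. NF is 0 for empty or whitespace-only lines.
line_summary() {
  awk '
    NF { nonblank++ }
    END { printf "total=%d nonblank=%d blank=%d\n", NR, nonblank, NR - nonblank }
  '
}

# Usage, inside a repository:
#   git ls-files -z | xargs -0 cat | line_summary
```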

Daniel Giger
0

From a Windows 11 terminal:

wsl.exe /bin/bash -c "git ls-files .| xargs wc -mwl"

where . is the path to your Git repository.

Output:

Lines count | Word count | Character count

Hoopou