
(Edit: see the Proper Usage section at the bottom.)

Main Question

How do you get cloc to use its --exclude-list-file=<file> option? Essentially, I'm trying to feed it a .clocignore file.

Expected Behavior

cloc documentation says the following:

--exclude-list-file=<file>  Ignore files and/or directories whose names
                          appear in <file>.  <file> should have one entry
                          per line.  Relative path names will be resolved
                          starting from the directory where cloc is
                          invoked.  See also --list-file.

Attempts

The following command works as expected:

cloc --exclude-dir=node_modules .

But this command doesn't exclude anything:

cloc --exclude-list-file=myignorefile .

This is the contents of myignorefile:

node_modules
node_modules/
node_modules/*
node_modules/**
./node_modules
./node_modules/
./node_modules/*
./node_modules/**
/full/path/to/current/directory/node_modules
/full/path/to/current/directory/node_modules/
/full/path/to/current/directory/node_modules/*
/full/path/to/current/directory/node_modules/**

cloc does not error if myignorefile doesn't exist, so I have no feedback on what it's doing.

(I'm running OS X and installed cloc v1.60 via Homebrew.)



Proper Usage

tl;dr -- The method specified in @Raman's answer both requires less to be specified in .clocignore and runs considerably faster.


Spurred on by @Raman's answer, I investigated the source code: cloc does in fact respect --exclude-list-file but processes it differently than --exclude-dir in two important ways.

Exact filename versus 'part of the path'

First, while --exclude-dir will ignore any files whose paths contain the specified strings, --exclude-list-file will only exclude the exact files or directories specified in .clocignore.

If you have a directory structure like this:

.clocignore
node_modules/foo/first.js
app/node_modules/bar/second.js

And the contents of .clocignore are just

node_modules

Then cloc --exclude-list-file=.clocignore . will successfully ignore first.js but count second.js, whereas cloc --exclude-dir=node_modules . will ignore both.

To deal with this, .clocignore needs to contain this:

node_modules
app/node_modules
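
Listing every such directory by hand gets tedious once node_modules shows up in several places. A minimal sketch of generating the entries instead, assuming a POSIX shell with find, sed and sort available; list_dirs_named is a hypothetical helper name, not part of cloc:

```shell
# Hypothetical helper: print every directory called $1 under root $2
# (default: current directory), one per line, relative to that root.
list_dirs_named() {
  (
    cd "${2:-.}" || exit 1
    # -prune stops find from descending into a match, so a node_modules
    # nested inside another node_modules is not listed a second time.
    find . -type d -name "$1" -prune -print |
      sed 's|^\./||' |   # drop the leading "./" so entries look like app/node_modules
      sort
  )
}
```

Run from the project root, list_dirs_named node_modules > .clocignore would rewrite the file with one relative path per line, which is the form the documentation quoted above describes.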

Performance

Second, the source code for cloc appears to add the directories specified in --exclude-dir to a list which is consulted before counting the files, whereas the list of directories discovered by --exclude-list-file is consulted after counting the files.

This means --exclude-list-file still processes the files, which can be slow, before ignoring their results in the final report. This is borne out by experiment: in an example codebase, it took half a second to run cloc with --exclude-dir, and 11 seconds to run with an equivalent --exclude-list-file.

Cœur
Venning

4 Answers


The best workaround I've found is to feed the contents of .clocignore directly to --exclude-dir. For example, if you are using bash and have tr available:

cloc --exclude-dir=$(tr '\n' ',' < .clocignore) .
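
The one-liner above leaves a trailing comma and passes blank lines or trailing slashes through unchanged. A slightly more defensive sketch, assuming a POSIX shell with grep, sed and paste; exclude_dirs_from_file is a hypothetical helper name, and the file is assumed to hold plain directory names, one per line:

```shell
# Hypothetical helper: turn an ignore file into a comma-separated list
# suitable for --exclude-dir. $1 defaults to .clocignore.
exclude_dirs_from_file() {
  grep -v '^[[:space:]]*$' "${1:-.clocignore}" |  # skip blank lines
    sed 's|/*$||' |                               # strip trailing slashes
    paste -sd, -                                  # join lines with commas
}
```

It would be used as cloc --exclude-dir="$(exclude_dirs_from_file .clocignore)" . in place of the tr pipeline.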
Raman
  • `exclude-list-file` wasn't working for me in the same way as `exclude-dir`, but I didn't spend any time figuring out why. You could take a look at the source here: http://sourceforge.net/p/cloc/code/HEAD/tree/trunk/cloc – Raman Oct 31 '14 at 16:48
  • Thanks. I expanded on why it doesn't appear to work in the Question. Hopefully, it makes sense now. – Venning Oct 31 '14 at 18:04
  • Great, thanks for the investigation! It's too bad `--exclude-list-file` doesn't work as one would expect. – Raman Oct 31 '14 at 18:17

The accepted answer didn't work for me, since I wanted to specify sub-directories as well, which is only possible by using the --not-match-d="" regex argument. So I created a PHP script that generates the whole cloc command from the .clocignore file (example output):

$ php cloc.php

cloc --fullpath --not-match-d="(node_modules|App/ios|App/android)" --not-match-f="(yarn\.lock|package\.json|package\-lock\.json)" .

The script basically implodes the directory paths into a single regex string and outputs the full cloc command for copying convenience. I put it up on gist if anyone finds it useful :)

https://gist.github.com/Lukakva/a2ef7626724a809ff2859e7203accf53
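
For readers without PHP, the same idea can be sketched in shell. This is not the gist's code, only an illustration of joining the .clocignore entries into one alternation. It assumes a POSIX shell with grep, sed and paste; not_match_d_regex is a hypothetical helper name, and only dots are escaped, so entries containing other regex metacharacters would need more care:

```shell
# Hypothetical helper: build a "(a|b|c)" regex from an ignore file.
# $1 defaults to .clocignore.
not_match_d_regex() {
  joined=$(grep -v '^[[:space:]]*$' "${1:-.clocignore}" |  # skip blank lines
    sed -e 's|/*$||' -e 's/\./\\./g' |                     # strip trailing slashes, escape dots
    paste -sd'|' -)                                        # join lines with |
  printf '(%s)\n' "$joined"
}
```

The result could then be passed along as cloc --fullpath --not-match-d="$(not_match_d_regex .clocignore)" .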

Luka Kvavilashvili

--not-match-d and --not-match-f may also meet your need.

   --not-match-d=REGEX
       Count all files except in directories matching the Perl regex.  Only the trailing directory name is compared, for example, when counting in
       "/usr/local/lib", only "lib" is compared to the regex.  Add --fullpath to compare parent directories to the regex.  Do not include file path
       separators at the beginning or end of the regex.

  --match-f=REGEX
       Only count files whose basenames match the Perl regex. For example this only counts files that start with Widget or widget:

           --match-f='^[Ww]idget'

       Add --fullpath to include parent directories in the regex instead of just the basename.

  --not-match-f=REGEX
       Count all files except those whose basenames match the Perl regex.  Add --fullpath to include parent directories in the regex instead of just the
       basename.
Yihe

This is how it works for my project.

I installed cloc and added scripts for it, like so:

   "cloc-src": "cloc --exclude-dir=node_modules,dist,mongo-data-4.4,yarn.lock,package.json,package-lock.json .",
   "cloc-dist": "cloc --match-d=/dist/ .",
Paul Alexeev