I've already read the related SO threads here and here, as well as the GitHub Linguist documentation on manual overrides, but I cannot seem to exclude some top-level directories from the language statistics.

In its latest version, this repo shows a predominance of HTML code. Clicking through to the HTML details lists two HTML files:

  • packages/NUnit.2.5.7.10213/NUnitFitTests.html
    Last indexed on 30 Dec 2016.

  • packages/NUnit.2.5.7.10213/Tools/NUnitFitTests.html
    Last indexed on 30 Dec 2016.

but those should be part of excluded paths within .gitattributes:

.nuget/* linguist-vendored
libs/* linguist-vendored
NUnit.Runners.2.6.4/* linguist-vendored
packages/* linguist-vendored             <--- this one in particular
RubyInstallationFiles/* linguist-vendored

Yet on the same details page, the ranking at the bottom left clearly shows HTML in a lower position, with C# at the top.

What am I doing wrong?

Side question: among the many changes, I also removed the comments from the .gitattributes file, as I could not find any reference saying whether they are allowed. Does anyone know if you can have comments in there, and in which format? Thanks.

superjos

2 Answers

You can check the attributes with git-check-attr and verify they're set the way you think they are.

$ git check-attr --all -- packages/NUnit.2.5.7.10213/NUnitFitTests.html
$

Seems it has no attributes. The problem appears to be that packages/* is not recursive.

$ git check-attr --all -- packages/NUnit.2.5.7.10213/
packages/NUnit.2.5.7.10213/: linguist-vendored: set

So what are the rules for patterns? Same as for gitignore.

The rules how the pattern matches paths are the same as in .gitignore files; see gitignore(5). Unlike .gitignore, negative patterns are forbidden.

What you're looking for is /**.

A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

Putting that fix in...

$ cat .gitattributes 
.nuget/** linguist-vendored
libs/** linguist-vendored
NUnit.Runners.2.6.4/** linguist-vendored
packages/** linguist-vendored
RubyInstallationFiles/** linguist-vendored

And now we're good.

$ git check-attr --all packages/NUnit.2.5.7.10213/NUnitFitTests.html
packages/NUnit.2.5.7.10213/NUnitFitTests.html: linguist-vendored: set
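If you want to see the difference without touching your real repository, something like the following should reproduce it. This is only a sketch: it assumes `git` and `mktemp` are available, uses a throwaway directory, and relies on the fact that `git check-attr` does not need the file to actually exist.

```shell
#!/bin/sh
# Compare a single-star and a double-star pattern in a scratch repository.
# The path below mirrors the one from the question; nothing is created under it.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

file=packages/NUnit.2.5.7.10213/NUnitFitTests.html

# Single star: only direct children of packages/ match.
printf 'packages/* linguist-vendored\n' > .gitattributes
single=$(git check-attr linguist-vendored -- "$file")
echo "$single"

# Double star: everything below packages/ matches, at any depth.
printf 'packages/** linguist-vendored\n' > .gitattributes
double=$(git check-attr linguist-vendored -- "$file")
echo "$double"
```

The first line printed should report the attribute as `unspecified` and the second as `set`.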

That also answers your question about comments...

A line starting with # serves as a comment. Put a backslash ("\") in front of the first hash for patterns that begin with a hash.
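As a rough illustration of that rule, here is a scratch setup with a commented `.gitattributes`. The comment text and the `#notes.html` file name are made up for the example, and only whole-line comments are shown, since that is all the quoted documentation describes.

```shell
#!/bin/sh
# Write a .gitattributes with comment lines and check that the patterns still apply.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

cat > .gitattributes <<'EOF'
# Third-party packages: keep them out of the language statistics.
packages/** linguist-vendored

# A pattern that itself begins with a hash needs a leading backslash.
\#notes.html linguist-documentation
EOF

vendored=$(git check-attr linguist-vendored -- packages/NUnit.2.5.7.10213/NUnitFitTests.html)
hashed=$(git check-attr linguist-documentation -- '#notes.html')
echo "$vendored"
echo "$hashed"
```

Both lines should end in `set`, which suggests the comment lines are skipped and the escaped hash is treated as part of the pattern.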

Schwern
  • I see. Wasn't aware that it shared same glob syntax as `.gitignore`. I found it quite confusing now, that in all examples the docs only showed a single `*`. Also, I didn't know about `git check-attr` command, thanks. – superjos Mar 02 '17 at 16:36
  • Double asterisks aren't actually needed at the end of paths for Linguist. – pchaigno Aug 12 '17 at 21:03
  • @pchaigno It's not Linguist that makes that decision, it's Git. Git applies the attributes. Linguist asks Git what attributes a file has. The attributes aren't applied without the double asterisks. [The Linguist docs say](https://github.com/github/linguist#using-gitattributes) "*Add a .gitattributes file to your project and use standard git-style path matchers for the files you want to override to set linguist-documentation, linguist-language, linguist-vendored, and linguist-generated.*" – Schwern Aug 13 '17 at 23:03
  • I have the same interpretation as you of the documentation, but after testing with both `.gitignore` and Linguist-specific attributes, it looks like a single `*` at the end of a path is interpreted exactly the same way as a `**`. Am I missing something? Is the documentation incorrect (in particular, "*note the /\* - without the slash, the wildcard would also exclude everything within foo/bar*")? – pchaigno Aug 14 '17 at 09:45
  • @pchaigno I tried `packages/* linguist-vendored` and `packages/ linguist-vendored` in `.gitattributes` and `git check-attr --all -- packages/NUnit.2.5.7.10213/NUnitFitTests.html` did not return anything with `git` 2.13.1. I didn't run Linguist. Note that `.gitignore` is already ignoring `**/packages/*`. – Schwern Aug 14 '17 at 15:12
  • I can confirm the `git check-attr` output. It doesn't seem to recognize `*` for me either. However, both Linguist and .gitignore recognize it at the end of paths. For instance, if I have `test1/*` in my `.gitignore`, the file `test1/test2/test.html` is ignored. – pchaigno Aug 14 '17 at 15:17
  • Chiming in 1000 years later, `directory/* linguist-vendored` didn't work for me, but `directory/** linguist-vendored` did. – Seth Lutske Apr 07 '21 at 22:25
1

Several things can be happening:

Language statistics weren't updated yet. The language detection job runs as a low-priority background job. Language statistics may take some time to update (up to a day).

You've missed some HTML file(s). Search results showing files for each language are cached and not always up-to-date. Therefore, there may be some HTML files in your repository that you forgot to vendor.


How to debug? Your best option is to run Linguist locally. If you have a working Ruby environment, this is as simple as:

gem install github-linguist
linguist /path/to/your/repository --breakdown

This command will output Linguist results with the files detected for each language and the computed percentages.
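If a Ruby environment is not at hand, a rougher check is possible with Git alone. The sketch below does not run Linguist; it only lists the tracked HTML files together with the `linguist-vendored` attribute Git reports for each. The scratch repository and the `src/report.html` file are hypothetical, there only to make the example self-contained.

```shell
#!/bin/sh
# List every tracked HTML file with its linguist-vendored state.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical layout: one vendored HTML file, one that is the project's own.
mkdir -p packages/NUnit.2.5.7.10213 src
: > packages/NUnit.2.5.7.10213/NUnitFitTests.html
: > src/report.html
printf 'packages/** linguist-vendored\n' > .gitattributes
git add .

# ls-files feeds the paths to check-attr, which reads them from stdin.
audit=$(git ls-files -- '*.html' | git check-attr --stdin linguist-vendored)
echo "$audit"
```

In your own repository only the last pipeline is needed; any HTML file reported as `unspecified` is one that has not been marked as vendored.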


Note: Your .gitattributes syntax is correct; double asterisks are not needed at the end of a path for Linguist. However, you may need them to match several directories at the beginning of a wildcarded path, e.g.:

**/NSpec/Domain/Formatters/Templates/*

pchaigno