1

I really have a hard time understanding how the .gitignore file works...

This is what my file looks like:

custom/history
cache
*.log
custom/modules/*/Ext
upload
sugar-cron*
custom/application/Ext
custom/Extenstion/modules/*/Ext/Language
!custom/modules/*/Language/cs_CZ.*
!custom/modules/*/Language/en_us.*
custom/Extenstion/application/Ext/Language
!custom/Extenstion/application/Ext/Language/cs_CZ.*
!custom/Extenstion/application/Ext/Language/en_US.*
.htaccess
config.php
config_override.php
files.md5

This is what my git status looked like:

apache@cb772759c68a sugarcrm$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#    LOG.txt
#    deploy_backup/
nothing added to commit but untracked files present (use "git add" to track)

So now I wanted to get rid of the two untracked files, but to my surprise a whole bunch of other files were removed too.

apache@cb772759c68a sugarcrm$ git clean -fd
Removing Disabled/upload:/
Removing LOG.txt
Removing custom/Extension/modules/Bugs/Ext/Language/
Removing custom/Extension/modules/Cases/Ext/Language/
Removing custom/Extension/modules/EmailAddresses/
Removing custom/Extension/modules/EmailParticipants/
Removing custom/Extension/modules/ForecastManagerWorksheets/
Removing custom/Extension/modules/ForecastWorksheets/
Removing custom/Extension/modules/Forecasts/
Removing custom/Extension/modules/Meetings/Ext/Layoutdefs/
Removing custom/Extension/modules/Meetings/Ext/WirelessLayoutdefs/
Removing custom/Extension/modules/Meetings/Ext/clients/
Removing custom/Extension/modules/ModuleBuilder/
Removing custom/Extension/modules/OutboundEmail/
Removing custom/Extension/modules/PdfManager/
Removing custom/Extension/modules/ProjectTask/Ext/Language/
Removing custom/Extension/modules/Quotas/
Removing custom/Extension/modules/Quotes/Ext/Dependencies/
Removing custom/Extension/modules/Targets/
Removing custom/Extension/modules/Tasks/Ext/Language/
Removing custom/Extension/modules/TimePeriods/
Removing custom/application/
Removing custom/install/
Removing custom/modules/Administration/
Removing custom/modules/Bugs/
Removing custom/modules/Cases/
Removing custom/modules/Contracts/
Removing custom/modules/Emails/
Removing custom/modules/HHP_Products/
Removing custom/modules/KBContents/
Removing custom/modules/Project/
Removing custom/modules/ProjectTask/
Removing custom/modules/ProspectLists/
Removing custom/modules/Prospects/
Removing custom/modules/Quotas/
Removing custom/modules/Reports/
Removing custom/modules/RevenueLineItems/
Removing custom/modules/Schedulers/
Removing custom/modules/Tags/
Removing custom/modules/Teams/
Removing custom/modules/hhp_assignment_zip/
Removing custom/modules/hhp_zipcode/
Removing custom/working/modules/Calls/
Removing custom/working/modules/Leads/clients/
Removing deploy_backup/
Removing deploy_log/
Removing dist/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/
Removing vendor/sugarcrm/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/

First point - The removed files were not shown by git status, so obviously they were covered by the .gitignore "mask"... Can anyone explain how any of these files matches any of the patterns in .gitignore? For example vendor/sugarcrm/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/ ... Can anyone help me build a proper .gitignore?

Second point - I thought that .gitignore "protects" these unversioned files from git clean, that git literally does not take any action upon them. But obviously it does delete them... How can I avoid deleting unversioned files while using git clean?

EDIT: I confused git clean with git rm, I was talking about git clean the whole time

EDIT 2: it turned out, that the deleted directories which didn't match the .gitignore were "empty" after all. (they had subdirectories, but the directory tree was without any files...)

Charlestone
  • Were the removed directories empty? – melpomene Jan 28 '18 at 11:05
  • no, they were not – Charlestone Jan 28 '18 at 11:16
  • These two answers may be helpful: [How can I see list of ignored files in git?](https://stackoverflow.com/questions/20640678/how-can-i-see-list-of-ignored-files-in-git/48121129#48121129) and [How do I discard unstaged changes in Git? (comment)](https://stackoverflow.com/questions/52704/how-do-i-discard-unstaged-changes-in-git/12184274#comment83524035_36924148) – ErikMD Jan 28 '18 at 15:11
  • I always always always (even when I think I know everything I need to know) first use `--dry-run` before doing the 'real thing.' Just run `git clean -ffdx --dry-run`, read the output, then run it without `--dry-run`. My future selves have never fired the previous self that used `--dry-run`.... – Kay V Mar 10 '22 at 15:51

2 Answers

4

TL;DR

You've misinterpreted what git clean removes by default and with -d. (Note: I'm not a big fan of git clean myself; it's way too easy to have it remove precious files.)

Long

As phd notes, listing a file in .gitignore specifically disables, by default, having git clean clean it away. However, git clean is (significantly) more complicated than that. We'll get into this in a bit.

First, though, let's address one peculiarity of .gitignore entries. If you already know all this (but nobody seems to :-) ) you can skip down to the git clean-specific sections below.

  1. A file that is tracked (is in the index right now) is never ignored, so that matching a .gitignore or equivalent (e.g., .git/info/exclude) pattern is irrelevant.

    The phrase "is in the index right now" means just that. When you use git add or git rm --cached to add or remove a file, that changes its tracked-ness. You can also use git ls-files --stage to dump out a complete list of every file in the index along with its staging data (mode, hash, and stage-slot-number), or without --stage to get just the names.

  2. A file (not a directory) that has been found by Git, that is not in the index right now, is untracked. Git does not store directories so directories never appear in the index.[1] Tracked or untracked is purely a property of files.

  3. An untracked file can also be an ignored file. If so, git add won't add it, even if you name it explicitly on the command line (though you can both name it explicitly and use --force to add it).

    This means files (but not directories) fall into one of three categories: tracked, untracked (only), or untracked-and-ignored. This matters for git status, which only complains about untracked files (not untracked-and-ignored), but also in a moment for git clean as well.

  4. Last, when Git is doing a full directory-tree search / scan—as in git add . for instance—and encounters a directory that it might be able to skip (has no tracked files within it), Git will check whether the directory itself matches a .gitignore pattern, and if so, not look inside it. This speeds up git status and git add -A / git add . on such directories (sometimes enormously, if you can ignore an entire vendor tree or SDK for instance).
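
The three categories from point 3 can be observed directly with git ls-files. What follows is only a minimal sketch, assuming a throwaway repository created with mktemp; the file names tracked.txt, untracked.txt, and ignored.log are made up for illustration and do not come from the question.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched (assumption: git is on PATH).
repo=$(mktemp -d)
cd "$repo"
git init -q .

touch tracked.txt untracked.txt ignored.log
echo '*.log' > .gitignore
git add tracked.txt    # only this file enters the index

# Category 1: tracked (in the index right now).
tracked=$(git ls-files)
# Category 2: untracked only (not in the index, not matched by an ignore rule).
untracked=$(git ls-files --others --exclude-standard)
# Category 3: untracked-and-ignored.
ignored=$(git ls-files --others --ignored --exclude-standard)

printf 'tracked:\n%s\n\nuntracked:\n%s\n\nignored:\n%s\n' \
  "$tracked" "$untracked" "$ignored"
```

Note that .gitignore itself lands in the untracked list here, because the sketch never adds it to the index.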

Rule 4 is why, if you want to not ignore particular file paths that live underneath some directory path, you must instruct Git to specifically not-ignore the directory. If you ignore the directory, Git may never look inside the directory. This affects these three lines in particular:

custom/Extenstion/application/Ext/Language
!custom/Extenstion/application/Ext/Language/cs_CZ.*
!custom/Extenstion/application/Ext/Language/en_US.*

If you have ignored the entire directory custom/Extenstion/application/Ext/Language, Git won't look inside it and will never find any file matching custom/Extenstion/application/Ext/Language/cs_CZ.* to un-ignore it. It's therefore necessary to except the directory itself from ignored status: you should change the first line to read custom/Extenstion/application/Ext/Language/*, so that Git must look inside the directory. The subsequent lines ending with cs_CZ.* and en_US.* will override the ignored status for Czech and US-English files.
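
The difference between ignoring a directory and ignoring its contents can be checked in a scratch repository. This is a sketch under assumptions: the directory name lang and the two file names are hypothetical stand-ins for the much longer paths in the question.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p lang
touch lang/en_us.lang.php lang/de_DE.lang.php

# Variant 1: ignore the directory itself, then try to re-include one file.
# Git skips the directory, so the negated pattern is never consulted.
printf '%s\n' 'lang' '!lang/en_us.*' > .gitignore
dir_ignored=$(git ls-files --others --exclude-standard)

# Variant 2: ignore the directory's contents instead, so Git must look inside.
printf '%s\n' 'lang/*' '!lang/en_us.*' > .gitignore
contents_ignored=$(git ls-files --others --exclude-standard)

printf 'ignoring "lang":\n%s\n\nignoring "lang/*":\n%s\n' \
  "$dir_ignored" "$contents_ignored"
```

With the first variant, only .gitignore shows up as untracked; with the second, lang/en_us.lang.php shows up as well, which means git add would pick it up.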


[1] In fact, they can appear in the index, but only so as to be treated as special cases. git ls-files, which can show you the index contents, skips right over them.


Using git clean -d clearly modifies Rule 4

Git can only remove a directory if it's empty. This is a general OS-enforced rule: if a directory d contains some files d/f1, d/f2, and so on, and you were to remove d without removing the files first, you'd have a problem with the files. The system forces you to first remove the files within the directory. This applies to sub-directories as well: you can't remove d if d/sub exists even if d/sub is itself an empty directory. Only empty directories can be removed.
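
The empty-directory rule is easy to see outside Git. A small sketch, using the d and d/sub names from the paragraph above in a temporary working directory:

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
mkdir -p d/sub

# Removing d while d/sub still exists is refused by the OS.
if rmdir d 2>/dev/null; then first=removed; else first=refused; fi

# Emptying d first (here: removing the empty sub-directory) makes it removable.
rmdir d/sub
rmdir d
if [ -e d ]; then second=present; else second=removed; fi

echo "while non-empty: $first; after emptying: $second"
```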

Running git clean without -d not only leaves Rule 4 installed, but actually extends it. For instance, in the example we started with, Git notices that (1) custom/Extenstion/application/Ext/Language is a directory; (2) the directory matches an ignore pattern; so (3) provided there are no files in custom/Extenstion/application/Ext/Language that are already tracked, Git can and will skip the entire directory (and of course not remove it, since git clean is running without -d).

Suppose that there's another directory named xyzzy/ that has no files listed in the index. This directory might be completely empty. In that case, there are no untracked files within it, by definition; so git clean without -d should do nothing to it. Or it might have files; these files are by definition untracked (and hence may be untracked-and-ignored), but you said not to remove directories, so git clean still doesn't even bother to look inside. This is the slightly odd case: Git often doesn't bother to look inside unknown directories.[2] (You see this with git status as well: you have to use git status -uall to find the files inside a mystery directory. But git add -A or git add . has to look inside, unless the directory is ignored, which is why Rule 4 is a bit complicated in the general case.)

Running with -d, though, apparently throws Rule 4 out completely. Again, in order to remove a directory, Git must first remove all the files within the directory. To do that, Git has to enumerate the contents as well. So if you tell git clean to use -d, it seems appropriate to disable Rule 4 entirely. The directory-ness of a path name will force Git to scan the directory's contents. Either we already needed to look inside because there are tracked files, or we need to look inside to remove files in order to remove the directory.
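
The effect of -d can be previewed safely with -n. The sketch below is an illustration only: LOG.txt and deploy_backup/ echo names from the question, while dump.sql and the hollow/ tree (directories containing nothing but directories, as described in the question's second edit) are made up.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

touch LOG.txt
mkdir -p deploy_backup hollow/tree/only
touch deploy_backup/dump.sql

# git status never mentions hollow/, because it contains no files at all.
status=$(git status --porcelain)

# Dry runs: nothing is deleted by either command.
without_d=$(git clean -n)
with_d=$(git clean -n -d)

printf 'status:\n%s\n\nclean -n:\n%s\n\nclean -n -d:\n%s\n' \
  "$status" "$without_d" "$with_d"
```

The dry run with -d lists hollow/ for removal even though git status gave no hint that the directory existed, which matches the surprise described in the question.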


[2] Note that "unknown" is not the same as "untracked". It's not even a Git term; I've made it up here. However, as we'll see, it might be nice if Git did define the phrase "untracked directory".


What git clean removes

Running git clean -n will show you what it would remove. This showing uses some shorthand: removing a directory implies removing all the files within that directory, including (recursively) sub-directories with sub-files. This is safer than running with -f instead of -n, since -f shows you what it did remove, the same way -n shows you what it would remove.

By default, git clean removes files that are untracked, but not files that are untracked-and-ignored. That is, go back to point 3 above and look at the three classifications of files: git clean removes the middle classification (only). Adding -X (uppercase X) tells Git: don't remove untracked-only files; instead, remove untracked-and-ignored files.

Adding -x tells Git: don't read the usual ignore-directives files such as .gitignore. At this point, no files will be ignored, so that (regardless of which files are tracked) no files can be untracked-and-ignored. Combining this with -X would make no sense,[3] so git clean forbids you to use both -x and -X.
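
The three modes can be compared side by side with dry runs. This is a sketch assuming a scratch repository; notes.txt and build.log are hypothetical file names, and .gitignore is added to the index so that none of the variants would touch it.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

echo '*.log' > .gitignore
git add .gitignore          # tracked, so git clean leaves it alone
touch notes.txt build.log   # untracked-only and untracked-and-ignored

plain=$(git clean -n)           # untracked-only files
only_ignored=$(git clean -n -X) # untracked-and-ignored files
everything=$(git clean -n -x)   # ignore rules are not read at all

# Git refuses the combination of -x and -X.
if git clean -n -x -X >/dev/null 2>&1; then both=accepted; else both=rejected; fi

printf 'plain:\n%s\n\n-X:\n%s\n\n-x:\n%s\n\n-x with -X: %s\n' \
  "$plain" "$only_ignored" "$everything" "$both"
```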

Running git clean with -d adds empty-directory removal. Here, things get particularly squirrelly, though. It seems as though Git's tracked, untracked, and untracked-and-ignored classification breaks down a bit. The documentation says that -d will:

Remove untracked directories in addition to untracked files.

But Git has no definition of untracked directories. "Tracked-ness" is exclusively a property of files. We did see, in a footnote, that directories sneak into the index as invisible entities (for purposes of speeding up various Git operations), but that doesn't really mean that directories are tracked.

We can make one up: an "untracked directory" might be a directory that contains no tracked files. I think (but have not proven to my own satisfaction) that this definition works and explains git clean's behavior. It would help a lot if the Git documentation actually defined this properly, though.


[3] Combining -x and -X with -e could have some practical uses, but Git still forbids this, at least as of today.

torek
  • Hi, thanks a lot for your answer (or should I call it a wiki? :D) I was now able to set the .gitignore file correctly and everything works just fine! – Charlestone Jan 30 '18 at 09:10
1
  1. .gitignore keeps files from being added and committed. It doesn't protect them from being cleaned; exactly the opposite.

  2. Those cleaned files are related to .gitignore the following way:

    custom/Extension/modules/Bugs/Ext/Language/ custom/Extension/modules/Cases/Ext/Language/

match custom/modules/*/Ext rule.

LOG.txt
vendor/sugarcrm/identity-provider/tests/docker/saml-test/config/simplesamlphp/config/

Files were not added to the index so they are eligible for cleaning.

  3. To avoid cleaning unversioned files, don't run git clean. Remove unnecessary files manually.
phd
  • well, the files you mentioned are kind of an obvious match, but whole other bunch of them is not. How does e.g. `custom/install/` match anything? – Charlestone Jan 28 '18 at 18:35
  • It seems you missed point 1. `git clean` removes everything untracked. It doesn't matter if files are listed in `.gitignore` or not — `git clean` removes them if they are not in the index. – phd Jan 28 '18 at 18:47
  • but if they were not in the index, why didn't the file show up after git status command as untracked? – Charlestone Jan 30 '18 at 09:13
  • By definition: untracked files are the files that aren't under git control, i.e. those files that haven't been added to the index. But `git status` doesn't show ignored files. If you want to see all untracked files including ignored ones, use `git status -u` or `git status --ignored`. – phd Jan 30 '18 at 14:35