
I, and several other people, all have access to a repository that includes one file auto-generated by the IDE. This file is quite PC-specific, so it shouldn't be in source control, but currently is. I want to delete it and add it to .gitignore, but I don't want it to be deleted for other collaborators when they pull my changes. There are plenty of questions about deleting a file while keeping my local copy, but they don't cover other users: when they pull, they'll still lose their copy even though I keep mine:

Remove a file from a Git repository without deleting it from the local filesystem

How do I git rm a file without deleting it from disk?

There are also questions and solutions for not losing local files when pulling, so they can keep the files. However, these require explicit actions from those pulling, and I don't want to have to tell everyone exactly how to pull just this one time. I did find two questions that are duplicates of this one. The answers there said it can't be done, but both were from five years ago. Has anything changed in the interim?

Git remove tracked files, but keep local AND remote

Git ignore file, without deleting it

This is important because the file is auto-generated when you first import the whole project and contains information on local compiler/library versions. So deleting it will require re-importing. If it makes any difference, it's .idea/scala_compiler.xml and .idea/scala_settings.xml (really the whole .idea directory should be ignored). Basically I want to have Git set a file as no longer tracked, but not delete it for anyone.

halfer
Y_Less
  • Why not put it in .gitignore? No one will need to delete it when pulling. – Tomer Shetah Aug 09 '20 at 17:09
  • I personally don't agree with ignoring a file and yet continuing to track the version that happened to be current at the time you decided to stop tracking it. Why preserve for all time some random version of a file that is auto-generated? If it's auto-generated, it should be fully deleted from git and added to .gitignore. As for causing people to have to re-import.... if they are developers they need to know how this stuff works. It's a good kind of inconvenience. Send a quick email to give them a heads up and go do the right thing. – JoelFan Aug 09 '20 at 18:12
  • > it should be fully deleted from git and added to .gitignore. Yes, that's what I wanted, but maybe it wasn't very clear. I don't want to track the file; I want it to be as if the file had been ignored from the start. I just didn't want everyone to lose their locally untracked, and now unversioned, copies. – Y_Less Aug 10 '20 at 14:29
  • see this definitive question/answer - it can be done!: https://stackoverflow.com/questions/57418769/definitive-retroactive-gitignore-how-to-make-git-completely-retroactively-forg/ – goofology Aug 11 '20 at 19:36


You can't.

Hm, let me try this again: you can't, but they can. Well, you can, but only for you, and they can, but only for them. You, or they, must run git rm --cached at just the right time. Of course, that's the solution you don't want to use.

To put it more usefully (at the risk of duplicating the earlier questions): the only thing you can do about these files, in terms of Git commits, is to omit them from future Git commits. By not being in commits, they will not be transferred by push and fetch operations either.

Remember, each commit holds a full and complete snapshot of all the files that Git knows about. (We'll refine this a bit further in a moment.) If Git knows about .idea/*, Git will put them in new commits, and when you push those commits—you can't push files, only commits—those commits, complete with those files, will go around. When you fetch new commits—again, you get entire commits, not files—those commits will come with those files.

The fundamental problem then becomes this:

  • You, or they, are on a commit in which Git knows about .idea/*. Your current commit has the files.
  • You, or they, have fetched some new commit(s). These new commits don't contain these .idea/* files.
  • If you (or they) now ask your (or their) Git to switch you from the current commit, to a commit that lacks the files, your (or their) Git sees that you (or they) are explicitly telling your (their) Git to remove the files. So it will do so.

The solution to this problem is:

  • You (they) must tell your (their) Git to forget these files now, so that the work-tree copies of these files are untracked:

     git rm -r --cached .idea      # note the --cached
    
  • Now you (they) tell your Git: switch to the new commit. The untracked files aren't in Git's view at all, and aren't in the new commit either, so Git won't remove the work-tree copies of these files.
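Here is a sketch of both cases, the problem and the fix, using throwaway repositories. Several details are assumptions made for the sketch. The repository names me, plain and careful are hypothetical. The collaborators pull straight from "me" instead of from a shared server. The commit identity is a dummy. The careful collaborator also lists .idea/ in their own .git/info/exclude before pulling. That extra step is not part of the recipe above. It is there because the new .gitignore has not arrived yet at that moment, and Git may otherwise stop the pull with an "untracked working tree files would be removed by merge" error instead of proceeding.

```sh
#!/bin/sh
# Sketch: two collaborators pull the commit that stops tracking .idea.
# Repository names (me, plain, careful) are hypothetical; "me" plays the server.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# Your repository: the first commit tracks the IDE file.
git init -q me
git -C me symbolic-ref HEAD refs/heads/main
mkdir me/.idea
echo '<project/>' > me/.idea/scala_compiler.xml
git -C me add .idea
git -C me commit -q -m 'Import project, including .idea'

# Both collaborators clone while .idea is still tracked.
git clone -q me plain
git clone -q me careful

# You untrack .idea (keeping your copy) and ignore it from now on.
git -C me rm -q -r --cached .idea
echo '.idea/' > me/.gitignore
git -C me add .gitignore
git -C me commit -q -m 'Stop tracking .idea'

# Collaborator 1 just pulls: the file is tracked on their side, so Git removes it.
git -C plain pull -q --ff-only

# Collaborator 2 untracks first, then pulls. The local exclude entry is an
# extra assumption of this sketch (the new .gitignore hasn't arrived yet).
git -C careful rm -q -r --cached .idea
mkdir -p careful/.git/info
echo '.idea/' >> careful/.git/info/exclude
git -C careful pull -q --ff-only

for who in plain careful; do
  if [ -e "$who/.idea/scala_compiler.xml" ]; then
    echo "$who: kept"
  else
    echo "$who: removed"
  fi
done
```

The plain clone ends up without the file. The careful clone keeps it, now untracked.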

Note that if you ever switch back to an old commit that does contain these files, your Git will overwrite your work-tree files with the committed files. (Their Git will do the same to their work-tree files under the same conditions.) So be very careful when returning to historic commits that contain these files. See the long explanation below for further details.
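Here is a sketch of that hazard. It assumes .idea/ was added to .gitignore in the same commit that untracked it, which is what the question plans to do. The tag name before-untrack and the identity are made up. Because the local copy is ignored, Git overwrites it without complaint when visiting the old commit. It then deletes the file on the way back, since the old commit tracks it and the new one doesn't.

```sh
#!/bin/sh
# Sketch: visiting an old commit that still tracks .idea, then coming back.
# The tag name "before-untrack" is hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

# Old commit: .idea is tracked.
mkdir .idea
echo 'committed settings' > .idea/scala_compiler.xml
git add .idea
git commit -q -m 'Import project, including .idea'
git tag before-untrack

# New commit: .idea is untracked and ignored, but still on disk.
git rm -q -r --cached .idea
echo '.idea/' > .gitignore
git add .gitignore
git commit -q -m 'Stop tracking .idea'

# The IDE rewrites its (now untracked, ignored) file.
echo 'my local settings' > .idea/scala_compiler.xml

# Visiting the old commit silently overwrites the ignored local copy...
git checkout -q before-untrack
seen_on_old=$(cat .idea/scala_compiler.xml)

# ...and returning removes it: tracked in the old commit, absent in the new one.
git checkout -q main
echo "on old commit: $seen_on_old"
if [ -e .idea/scala_compiler.xml ]; then
  echo 'after return: present'
else
  echo 'after return: removed'
fi
```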

Long: what's going on here

As we just noted, each commit has a full and complete snapshot of every file. These snapshots are saved in a special, read-only, Git-only format. I like to call this format freeze-dried. The files in this form are automatically de-duplicated, so the fact that most commits mostly re-use most files from a previous commit means that the new commits take hardly any disk space.

It is safe for Git to re-use these freeze-dried files, because no part of any existing commit, including the saved files, can ever be altered. You can make new commits that are different from existing ones, but you cannot change the existing ones. Not even Git itself can do that.
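De-duplication and immutability are both visible from the command line. In this small sketch (throwaway repository, made-up file names and identity), an unchanged file has the same object ID in two commits. "Changing" a commit with --amend actually makes a new commit and leaves the original object in place.

```sh
#!/bin/sh
# Sketch: unchanged files are shared between commits; commits are never edited.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo 'hello' > README
git add README
git commit -q -m 'First commit'
echo 'more' > NOTES
git add NOTES
git commit -q -m 'Second commit'

# Both commits name README's frozen copy by the same hash: one stored object.
first=$(git rev-parse HEAD~1:README)
second=$(git rev-parse HEAD:README)
echo "README in commit 1: $first"
echo "README in commit 2: $second"

# --amend makes a new commit with a new ID; the old commit object still exists.
old=$(git rev-parse HEAD)
git commit -q --amend -m 'Second commit, reworded'
new=$(git rev-parse HEAD)
echo "old: $old ($(git cat-file -t "$old"))"
echo "new: $new"
```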

Because you literally can't use these files to do any actual work, Git has to extract a commit. This is what git checkout (or, since Git 2.23, git switch) does: it extracts the freeze-dried files from some commit, into a form that you can actually use (and change). The commit you choose to extract, and then work with and/or on, is your current commit.

This means there are literally two copies of every file taken from the current commit: the freeze-dried one stored with the commit itself, and the regular-format, rehydrated one you're using to do real work.

To make a new commit, any version control system that uses this kind of scheme—and most do, though internal details vary a great deal—must take your current work-tree versions and turn them back into the appropriate committed versions. This can take quite a while, in large repositories. To make it easier for itself, Git doesn't actually do this at all.

Instead, Git keeps a third copy—well, not really a copy, exactly, because it uses the freeze-dried, de-duplicated format—in what Git calls its index, or staging area, or (rarely these days) cache. This cached, freeze-dried-format, pre-de-duplicated copy of the file is ready to go into the next commit you will make.

Let's repeat that, because it's the key here: Git's index contains the files that will go into the next commit, in the freeze-dried format, ready to go. A git checkout or git switch operation fills Git's index and your work-tree from a commit, which is now the current commit. All three copies now match, except that the work-tree copy is actually usable, instead of being freeze-dried.

If you change the work-tree copy, you must run git add on it. The git add command tells Git: Make your index copy match my work-tree copy. Git will now read the work-tree copy and compress and de-duplicate it into the freeze-dried format, ready to go into the next commit. So the files in the index no longer match the files in the current commit. In other words, a key difference between the index and the commit is that you can change the index contents, by replacing files wholesale like this.

These index copies are, literally, the files that Git knows about. They are the files that will be in the next commit. To make sure the next commit doesn't have some file, you simply remove it from Git's index.
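Here is a sketch of the index in action, in a throwaway repository where the file contents v1 and v2 are made up. git show :path prints the index copy of a path, and git ls-files lists what the index holds, which is what the next commit would contain.

```sh
#!/bin/sh
# Sketch: the index, not the work-tree, decides what the next commit holds.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
mkdir .idea
echo 'v1' > .idea/scala_compiler.xml
echo 'readme' > README
git add .
git commit -q -m 'Initial commit'

# Changing the work-tree copy leaves the index copy alone.
echo 'v2' > .idea/scala_compiler.xml
staged_before=$(git show :.idea/scala_compiler.xml)   # "v1"

# git add makes the index copy match the work-tree copy.
git add .idea/scala_compiler.xml
staged_after=$(git show :.idea/scala_compiler.xml)    # "v2"

# git rm --cached drops it from the index only; the work-tree file stays.
git rm -q --cached .idea/scala_compiler.xml
git ls-files                                          # README
```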

The git rm command

The git rm command removes files from Git's index. Without --cached, it also removes these files from your work-tree. You want to keep your work-tree copy, so you need to tell Git: keep my work-tree copy by adding --cached to your git rm: remove only from the index ("cache").

Now that the file, or files, aren't in Git's index, they won't be in the next commit. So once you remove the files, you can make a new commit that doesn't have the files:

git rm -r --cached .idea && git commit

for instance.
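Here are the same steps in a throwaway repository. A .gitignore entry is added so that a later git add . doesn't put the files back. The file names follow the question; the identity is made up. The final checks show that the new commit has no .idea entries while both files stay on disk.

```sh
#!/bin/sh
# Sketch: commit the removal of .idea from Git, keeping the files on disk.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
mkdir .idea
echo '<project/>' > .idea/scala_compiler.xml
echo '<project/>' > .idea/scala_settings.xml
echo 'readme' > README
git add .
git commit -q -m 'Import project, including .idea'

# Remove .idea from the index only, and stop future git adds from re-adding it.
git rm -q -r --cached .idea
echo '.idea/' >> .gitignore
git add .gitignore
git commit -q -m 'Stop tracking .idea'

git ls-tree -r --name-only HEAD   # .gitignore and README only
ls .idea                          # both XML files are still here
git status --porcelain            # empty: the ignored files aren't reported
```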

Switching commits

When you use git checkout or git switch to switch from one commit to another—as by changing which branch you're on, for instance—you are telling Git: Remove everything related to the current commit and switch to the other commit. This has Git empty out its index, removing your work-tree copy of each corresponding file—the files that Git knows about. Then Git can re-fill its index and re-populate your work-tree with copies of the files from the commit you'd like to work on/with: your new current commit.

If Git knows about .idea/*, this is what makes the .idea/* files get removed. If they're not in the new commit, they don't come back from the new commit.

.gitignore has a trap for the unwary

The .gitignore file is somewhat misnamed. Files listed in .gitignore are not necessarily untracked, and if they are tracked—if Git knows about them because they are in Git's index—they're not ignored at all.

Let's note here that an untracked file is one that is in your work-tree right now but not in Git's index right now. That means that if .idea/* were tracked—came out of the current commit, for instance—but you just ran git rm --cached .idea/* or git rm -r --cached .idea, those work-tree copies are now untracked. It doesn't matter if they are in the current commit: what matters is whether they are in Git's index right now.
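This sketch shows that whether a file is tracked follows the index, not the current commit (throwaway repository, made-up identity). Right after git rm -r --cached .idea, the file is still in HEAD, yet Git lists it as untracked.

```sh
#!/bin/sh
# Sketch: tracked-ness is decided by the index, not by the current commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
mkdir .idea
echo '<project/>' > .idea/scala_compiler.xml
git add .idea
git commit -q -m 'Import project, including .idea'

git rm -q -r --cached .idea

git ls-tree -r --name-only HEAD .idea   # still in the current commit
git ls-files .idea                      # (no output): not in the index
git ls-files --others .idea             # so it is untracked
git status --porcelain                  # "D  .idea/scala_compiler.xml" and "?? .idea/"
```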

What .gitignore does is tell Git three things. The first two are usually the important two. The last one is the trap.

  1. If an untracked file's name, or pattern, appears in .gitignore, the git status command won't complain about the file being untracked.

  2. If an untracked file's name or pattern appears in .gitignore, git add won't add the file to Git's index (you can force git add to override this if you want). This means the file will remain untracked across normal everyday git adds.

  3. If an untracked file's name or pattern is listed in .gitignore, Git will sometimes feel free to clobber the file.
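This sketch covers points 1 and 2 above. It also shows the earlier remark that listing a tracked file in .gitignore doesn't ignore it (throwaway repository, made-up identity). git check-ignore skips tracked paths by default, so it reports the force-added file as not ignored.

```sh
#!/bin/sh
# Sketch: what a .gitignore entry does for untracked files, and not for tracked ones.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
echo 'readme' > README
git add README
git commit -q -m 'Add README'

echo '.idea/' > .gitignore
mkdir .idea
echo 'v1' > .idea/scala_compiler.xml

# 1. git status stays quiet about the ignored, untracked file.
status_before=$(git status --porcelain)        # "?? .gitignore"

# 2. A plain git add refuses the ignored path...
if git add .idea/scala_compiler.xml 2>/dev/null; then
  add_refused=no
else
  add_refused=yes
fi
# ...but -f overrides that, and the file becomes tracked.
git add -f .idea/scala_compiler.xml
git add .gitignore
git commit -q -m 'Track .idea despite .gitignore'

# Once tracked, the .gitignore entry no longer applies to it.
echo 'v2' > .idea/scala_compiler.xml
status_after=$(git status --porcelain)         # " M .idea/scala_compiler.xml"
git check-ignore -q .idea/scala_compiler.xml || echo 'check-ignore: not ignored (tracked)'
```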

When you switch commits, Git tries not to clobber unsaved work

You may be familiar with this problem: you start working on some file—the copy in your work-tree, that is—and then realize: Whoops, I wanted to do this work on a different branch. You run git checkout branch or git switch branch, and Git says, in its somewhat cryptic way: I can't do that. Git tells you that you have unsaved changes that would be clobbered.

(Sometimes Git will let you switch branches anyway. This all has to do with Git's index, again. For the gory details, see "Checkout another branch when there are uncommitted changes on the current branch".)

If this unsaved work is in a tracked file, or is in an untracked file that's not listed in a .gitignore, this safety check will keep you from losing data. But listing a file in .gitignore will sometimes allow Git to overwrite or remove the work-tree copy. It's not obvious precisely when this happens—sometimes even with this in place, Git tells you to save your files first—but it is a problem.
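Here is a sketch of the safety check doing its job when no ignore rule covers .idea. The script points HOME and XDG_CONFIG_HOME at the scratch directory and sets GIT_CONFIG_NOSYSTEM=1, so that personal or system ignore settings can't change the outcome. That setup, the tag name before-untrack, and the identity are all assumptions of the sketch. If .idea were listed in an ignore file, the same checkout would go ahead and overwrite the local copy instead.

```sh
#!/bin/sh
# Sketch: an untracked, NOT ignored file blocks a checkout that would overwrite it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
# Keep global/system config (and any global ignore file) out of the sketch.
export HOME="$work" XDG_CONFIG_HOME="$work/.config" GIT_CONFIG_NOSYSTEM=1
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main

mkdir .idea
echo 'committed settings' > .idea/scala_compiler.xml
git add .idea
git commit -q -m 'Import project, including .idea'
git tag before-untrack

# Untrack .idea, with no .gitignore entry this time.
git rm -q -r --cached .idea
git commit -q -m 'Stop tracking .idea'
echo 'my local settings' > .idea/scala_compiler.xml

# Git refuses: "untracked working tree files would be overwritten by checkout".
if git checkout -q before-untrack 2>/dev/null; then
  result=switched
else
  result=refused
fi
echo "$result"
cat .idea/scala_compiler.xml      # still "my local settings"
```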

The only complete solution is painful

Unfortunately, the only real solution to this problem is as painful as, or more painful than, the problem itself: you can take the repository that has commits that have the files, and use that to build a new, incompatible edited-history repository that contains only commits that never had the files at all.

To do this, use git filter-branch, or git filter-repo (relatively new, and not yet distributed with Git itself), or The BFG, or any such Git-commit-history-editing system. The way these all work, of necessity, is that they copy old commits (those that have the files) to new commits, with different hash IDs, in which those files never appear. This change then ripples "down through time" into all subsequent commits. That's what makes the new repository incompatible with the old one.
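Here is a sketch of such a rewrite using git filter-branch, chosen because it ships with Git (throwaway repository, made-up identity). FILTER_BRANCH_SQUELCH_WARNING=1 only suppresses the warning and pause that recent versions of filter-branch show before starting. The git filter-repo equivalent appears as a comment but is not run.

```sh
#!/bin/sh
# Sketch: copy every commit to a new one whose snapshot lacks .idea.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
echo 'readme' > README
git add README
git commit -q -m 'Add README'
mkdir .idea
echo '<project/>' > .idea/scala_compiler.xml
git add .idea
git commit -q -m 'Import project, including .idea'
git rm -q -r --cached .idea
echo '.idea/' > .gitignore
git add .gitignore
git commit -q -m 'Stop tracking .idea'
old_tip=$(git rev-parse main)

# Rewrite all branches, removing .idea from each commit's index.
git filter-branch --index-filter 'git rm -q -r --cached --ignore-unmatch .idea' -- --all
# filter-repo equivalent (run in a fresh clone): git filter-repo --path .idea --invert-paths

new_tip=$(git rev-parse main)
echo "main: $old_tip -> $new_tip"   # new IDs: an incompatible history
git log --name-only --format= main  # no .idea paths in any rewritten commit
```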

If you ever let the old repository and new one meet, and there's any related history that didn't change,[1] the two Gits will join up the old and new histories and you'll essentially double the size of your repository while adding back all the commits you thought you had gotten rid of.

[1] This would be historical commits that predate the existence of the unwanted files. For instance, if you use GitHub's trick of starting with a README.md and LICENSE file, that commit would not require rewriting, and would remain unchanged and establish a common commit history between the old and new repositories.

Besides this, if you use an old Git that dates back to before the --allow-unrelated-histories flag, or supply --allow-unrelated-histories to git merge, that can also fuse the old history back into the new one.
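This sketch illustrates the footnote with the same kind of rewrite (throwaway repository, made-up identity). The README-only root commit needs no rewriting, so it keeps its hash. The old tip, which filter-branch saves under refs/original/, therefore still shares history with the new one.

```sh
#!/bin/sh
# Sketch: a commit that predates .idea survives the rewrite with the same ID.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/main
echo 'readme' > README
git add README
git commit -q -m 'Add README'
root=$(git rev-parse HEAD)              # predates the unwanted files
mkdir .idea
echo '<project/>' > .idea/scala_compiler.xml
git add .idea
git commit -q -m 'Import project, including .idea'
git rm -q -r --cached .idea
echo '.idea/' > .gitignore
git add .gitignore
git commit -q -m 'Stop tracking .idea'

git filter-branch --index-filter 'git rm -q -r --cached --ignore-unmatch .idea' -- --all

# filter-branch keeps the pre-rewrite tip as refs/original/refs/heads/main.
shared=$(git merge-base main refs/original/refs/heads/main)
echo "root:       $root"
echo "merge-base: $shared"             # same commit: the histories are related
```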

torek
  • Thank you for the very in-depth explanation, even if it was very in-depth as to why you can't do what I hoped you could. It did also highlight the problem that even if it is done the hard (manual) way it will still cause problems when they switch to other commits. I had hoped that maybe some "non-pure" method had been added to achieve this even if it didn't quite fit with the logical file/commit model. I do think it could still be special-cased somehow, even if it isn't yet. But thank you. – Y_Less Aug 10 '20 at 14:45
  • 1
    Unfortunately, even if Git acquired some ad-hoc method for dealing with this, it wouldn't fix any *old* Git versions. So I don't think the Git project folks see this as high priority. I'm not sure what would happen if you wrote your own method and submitted it (see the "submitting patches" documentation in the Git source mirror at github.com/git/git). – torek Aug 10 '20 at 14:55
  • thought I'd chime in here - files can be removed from git history (forgotten), AND ignored (but not deleted!) for local AND remote users! Of course any NEW pulls won't have the file, and git history needs to be rewritten, which is potentially a HUGE deal - but it CAN be done: https://stackoverflow.com/questions/57418769/definitive-retroactive-gitignore-how-to-make-git-completely-retroactively-forg/ – goofology Aug 11 '20 at 19:40
  • @goofology: this is all correct, but note that other users who still have the original clones will still have all the original commits. So those remote users have to *avoid* `git pull` lest they merge the original commits back into the rewritten history: they should just throw away their existing clones and make new ones. (In which case, any method that cleans out the old commits is fine.) – torek Aug 11 '20 at 19:43
  • I need to investigate this. I believe a 'fetch --all' and 'git reset FETCH_HEAD' will avoid merging original commits back in? I haven't tried git pull. Or maybe I did and it failed thus the aforementioned method. – goofology Aug 11 '20 at 20:02