Git: ignore files pushed to the remote by another person, without modifying the remote

Question

I did git pull and saw that my colleague has pushed several files which I want to ignore in my clone. I want to untrack those files and ignore them in .git/info/exclude, but without modifying the remote repository (I have contributing rights to the repository, but I don't want to have an argument with my colleague about it). How can it be done?

When I do git rm --cached, the removal gets committed.

score 1 · Answer 1 · answered May 16 '20 at 20:42


If you're using a sufficiently recent version of Git (2.25 or later, which added the git sparse-checkout command), you can try using the sparse checkout feature to put only certain directories or files in your working copy.

Alternatively, it might be possible to mark the files as "ignored" (not in the .gitignore sense) by running

git update-index --skip-worktree <filename>...

and then deleting them with rm (Linux/OS X/etc) or del (Windows, I think). However, Git will complain any time you try to check out a revision in which any of those files are modified, so this is probably too annoying to be practical.
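A minimal sketch of the `--skip-worktree` route, run in a throwaway repository created with `mktemp -d`. It assumes a POSIX shell and Git 2.x; `main.txt` and `notes.txt` are hypothetical names standing in for your files and your colleague's, and the user name and email are placeholders.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name
git config user.name "Example User"
git config user.email "user@example.com"

# Stand-ins for your file and the file your colleague pushed (hypothetical).
echo "project code" > main.txt
echo "colleague's notes" > notes.txt
git add main.txt notes.txt
git commit -q -m "initial commit"

# Tell Git to stop looking at the work-tree copy, then delete that copy.
git update-index --skip-worktree notes.txt
rm notes.txt

# Nothing is reported: notes.txt is still in HEAD and in the index,
# but Git no longer compares the index entry against the work-tree.
git status --porcelain
```

This only hides the file locally: `notes.txt` stays in the index, so any commit you make still contains it.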

The normal .gitignore mechanism does not do anything for files that have been committed or added locally.


David Z



So there is no practical solution to this? You said `Git will complain any time you try to check out a revision in which any of those files are modified`. Can this annoyance be mitigated, maybe by writing a script that reads a file containing the list of files or folders on which we ran --skip-worktree, and temporarily undoes this effect before doing a checkout? – Porcupine May 16 '20 at 21:15
You can find the list of files on which the skip-worktree flag is set by using `git ls-files` with appropriate options. I'm sure you could work that into a script if you want. – David Z May 16 '20 at 21:36
The [Git documentation](https://git-scm.com/docs/git-update-index#_notes) says not to use `--assume-unchanged` or `--skip-worktree` for this purpose. There is no way in Git to ignore changes to tracked files. – bk2204 May 17 '20 at 11:59
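Along the lines of David Z's comment (and keeping bk2204's caveat in mind), here is a rough sketch of how a script might find the skip-worktree entries and clear the bit on them. It assumes a POSIX shell, a throwaway repository with hypothetical file names, and paths that `git ls-files` does not need to quote (no unusual characters).

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "two files"
git update-index --skip-worktree b.txt

# With -v, each path is prefixed by a one-letter tag; "S" marks
# index entries whose skip-worktree bit is set.
skipped=$(git ls-files -v | sed -n 's/^S //p')
echo "$skipped"

# A wrapper script could clear the bit on exactly those paths before a
# checkout, and set it again afterwards with --skip-worktree.
printf '%s\n' "$skipped" | while IFS= read -r path; do
    if [ -n "$path" ]; then
        git update-index --no-skip-worktree -- "$path"
    fi
done
```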

score 1 · Answer 2 · answered May 17 '20 at 00:33

The TL;DR is that you cannot get what you want. Of course, this assumes I know what it is that you want. A comment you made suggests that maybe I'm wrong, and maybe you can get what you want—but you will need to write at least a little bit of code. You'll need to learn a lot of Git-inner-workings detail to understand exactly what this code will do for you.

I think what you might really want to do is to use the sparse checkout code to just not check out these files at all. Git's sparse checkout code is ... not ready for normal use, though. Git 2.25 has new features that are aimed at a different use case. Making yours work here will be harder.

Note: if you already know lots about Git, scroll down to the last section, and refer to the earlier sections only if and when you need them.

Long

I want to untrack [particular] files

You can do this. To do this, you must remove them from Git's index (git rm --cached), whether or not you also remove them from your work-tree.
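To see why the removal in the question gets committed, here is a sketch in a throwaway repository (POSIX shell, Git 2.x, hypothetical file `F.txt`). `git rm --cached` removes the index entry, and the next commit is built from the index, so it records the deletion.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "colleague's file" > F.txt
git add F.txt
git commit -q -m "add F.txt"

# Remove F.txt from the index only; the work-tree copy stays.
git rm -q --cached F.txt
git status --porcelain
# D  F.txt    <- removal is staged, so your next commit deletes it
# ?? F.txt    <- the work-tree copy is now untracked
```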

... and ignore them in .git/info/exclude,

You can do this any time, but of course, if they're tracked—if these files are in Git's index—this has no effect.
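A quick check of that claim in a throwaway repository (POSIX shell, Git 2.x, hypothetical tracked file `F.txt`): listing a tracked file in `.git/info/exclude` changes nothing that `git status` reports.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "colleague's file" > F.txt
git add F.txt
git commit -q -m "add F.txt"

mkdir -p .git/info
echo "F.txt" >> .git/info/exclude   # exclude patterns only apply to untracked files
echo "local edit" >> F.txt
git status --porcelain              # still " M F.txt": F.txt is tracked
```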

... but without modifying the remote repository

It's important, here, to distinguish between each repository, and any given checkout of any given commit found in any given repository.

You, yourself, literally cannot modify someone else's work-tree, and in general, if you do modify someone else's repository, you do it by adding new commits to their repository, which has no effect on any existing commit. But that's not quite the final effect you want.

(I have contributing rights to the repository, but I don't want to have an argument with my colleague about it).

At some point, you might still have to, but perhaps you can defer it for a long time.

There are several important things to know here:

- Git is really all about commits. It's not about files and not about branches. Git has files, because commits have files. Git uses branch names like master, because it needs to have the names in order to find the right commits. But it's really about the commits.
- Each commit has two main parts: data, and metadata.
  - The data part of a commit is a complete snapshot of some set of files. These files are stored in the commit in a special, frozen / read-only, Git-only format, in which the individual files are de-duplicated.¹ This means you literally cannot use these files to do any work. Git must extract them into a separate work-area—which isn't part of the repository. We'll have a lot more about this soon.
  - The metadata part is mostly the other stuff that you see in git log output: things like who made the commit, when, and why—the log message.
- The real name of any given commit is its hash ID. These hash IDs are big, ugly, random-looking, and too difficult for humans to deal with—but they are the keys that Git uses to find the commit objects, which Git stores in a big key-value database of all of its objects. This object database makes up most of what a repository is. The keys are hash IDs, and the values are commit objects and other supporting Git objects. When you clone a Git repository, this object database is what you're copying. You get most² of the objects inside that database.
- Branch names and other names—there are many sub-classifications of names, including tag names, remote-tracking names, and temporary names used during operations like git bisect—form another database: the keys are the names, fully spelled out like refs/heads/master, and the values are hash IDs, which Git will use as keys in the big database. Cloning a repository can copy this database completely, but normally doesn't: your Git takes their branch names and transforms them. Your Git takes some or all of their tag names and keeps them. Your Git throws away all the other name-value pairs.
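The two databases can be inspected directly with plumbing commands. This sketch, run in a throwaway repository with hypothetical file names, shows that two files with identical content map to a single blob hash ID, and that a branch name is just a full ref name mapped to a commit hash ID.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "same content" > one.txt
echo "same content" > two.txt
git add one.txt two.txt
git commit -q -m "two identical files"

# Object database: hash ID -> object. Identical content is stored once,
# so both paths map to the same blob hash ID.
blob1=$(git rev-parse HEAD:one.txt)
blob2=$(git rev-parse HEAD:two.txt)
echo "$blob1 $blob2"
git cat-file -t "$blob1"     # blob
git cat-file -t HEAD         # commit

# Names database: a full ref name -> a commit hash ID.
git for-each-ref --format='%(refname) %(objectname)' refs/heads/
```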

What this means is that Gits share their distributed object databases (across the space of all clones) but have semi-private name-to-hash-ID databases. The object database in a repository is fundamentally append-only: no object, once inserted, can ever be modified.

A given repository can throw away some of its own key-value pairs—which in the end, at the level we care about, turns into throwing away some commits—but you can't make someone else's repository do that directly, and in most normal operation, you will not do that at all. The one exception here occurs with git push --force, which you should only do with names that you and everyone else agree can be handled this way.

¹Git does this by storing each file's data as a blob object in its big database. The content gets a checksum applied—currently SHA-1—and Git uses the checksum itself as the key to look up the object that contains the data. So every distinct file content needs to have a unique hash. Lucky for Git, the hash isn't quite the same as just doing an SHA-1 on the file. See also "How does the newly found SHA-1 collision affect Git?"
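To illustrate that last point: Git hashes a small header (the word `blob`, the size in bytes, and a NUL byte) followed by the content. This sketch assumes the default SHA-1 object format and a `sha1sum` command (GNU coreutils; on macOS, `shasum -a 1` prints the same format); the temporary file is hypothetical input.

```sh
set -e
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
size=$(wc -c < "$tmp" | tr -d ' ')     # 6 bytes

# What Git computes: SHA-1 over the header "blob <size>\0" plus the data.
git_id=$(git hash-object "$tmp")
manual_id=$({ printf 'blob %s\0' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1)
# A plain SHA-1 of the data alone gives a different value.
plain_id=$(sha1sum < "$tmp" | cut -d' ' -f1)

echo "git hash-object:     $git_id"
echo "sha1(header + data): $manual_id"
echo "sha1(data only):     $plain_id"
```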

The files' names, modes, and blob hash IDs are stored in tree objects, and each commit object refers to exactly one tree object. Two commits that store the exact same snapshot simply share the tree object, while two commits that store all files but one as exactly-the-same will have different trees, but will share all the file objects except for that one differing file.
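A sketch of that sharing in a throwaway repository (POSIX shell, Git 2.x, hypothetical files `a.txt` and `b.txt`): an empty commit reuses its parent's tree, and a commit that changes only `a.txt` gets a new tree that still points to the same blob for `b.txt`.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "a, version 1" > a.txt
echo "same" > b.txt
git add a.txt b.txt
git commit -q -m "first"
git commit -q --allow-empty -m "same snapshot again"
echo "a, version 2, edited" > a.txt
git commit -q -a -m "change a.txt only"

# Commits 1 and 2 store the same snapshot, so they share one tree object.
git rev-parse 'HEAD~2^{tree}' 'HEAD~1^{tree}'
# Commit 3 has a new tree, but b.txt still uses the same blob object.
git rev-parse 'HEAD~1:b.txt' 'HEAD:b.txt'
```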

Below this level, Git adds a packed object format in which individual objects can be delta-compressed against sufficiently-similar objects. The result is that a .git directory can be smaller than the files you extract from it! Usually this isn't true once a project has been around for a long time, but Git's storage model tends to be quite efficient.

²The "most" part here is tricky and not really relevant anyway, so we won't cover it properly here. It has to do with reachability: a new clone should obtain only the objects in the big database that are reachable.

Commits are in backward-looking chains

The next thing to remember is that in a Git repository, the commits themselves have a critical property. They are linked together by parent hash IDs, which causes the entire set of commits to represent a Directed Acyclic Graph or DAG.

More specifically, one element in each commit's metadata is a list of parent commit hash IDs. This list usually has just one item in it—one single parent—in which case the commit is an ordinary commit. For a merge commit, the list normally has two parent hash IDs. The first one is the usual parent, and the second one is the commit you specified when you ran git merge to make that merge commit.

We say that a commit points to its parents. At least one commit in any non-empty repository has to have no parent: the first commit you make, in an empty repository, has no earlier commit to point to, so it just doesn't. Other commits point to their parent(s) as usual. The parent hash IDs have to be IDs of valid, existing commits, so these links or pointers always point backwards. A commit cannot point to itself, nor to a commit that does not exist yet that might point back to itself. That means that following these backwards links always takes us back in time, and we never return to the commit we started from. The act of following commits like this—of computing a transitive closure of all parents—produces the DAG.
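To see parent lists for yourself, this sketch builds a tiny history in a throwaway repository (a root commit, one commit on each of two branches, and a merge; all names are hypothetical) and prints each commit with its parents.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo base > base.txt
git add base.txt
git commit -q -m "root commit"
git checkout -q -b side
echo side > side.txt
git add side.txt
git commit -q -m "work on side"
git checkout -q master
echo main > main.txt
git add main.txt
git commit -q -m "work on master"
git merge -q --no-edit side

# Each output line: a commit hash ID followed by its parent hash IDs.
# The merge lists two parents, ordinary commits one, the root none.
git rev-list --parents HEAD
```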

Given a normal (non-merge) commit, Git will show us that commit by comparing its snapshot to that of its parent. Most of the files will probably match entirely. A few won't: Git will tell us about those files, and not tell us anything about the ones that match. So we can view commits as changes, even though they're snapshots, because they exist in these backwards-pointing chains.
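The snapshot-versus-changes view, sketched in a throwaway repository with hypothetical files: `git ls-tree` lists the whole snapshot, while `git diff` against the parent lists only what differs.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
printf 'one\n' > a.txt
printf 'two\n' > b.txt
git add a.txt b.txt
git commit -q -m "first"
printf 'one, edited\n' > a.txt
git commit -q -a -m "second"

git ls-tree --name-only HEAD          # full snapshot: a.txt and b.txt
git diff --name-status HEAD~1 HEAD    # only what differs: a.txt, modified
```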

Git makes new commits from its index, not from your work-tree

Before you can use a commit, you have to extract it. The act of checking out some commit, using git checkout or (since Git 2.23) git switch, consists of selecting some particular commit—we'll get to branch names in a moment; for now let's concentrate on the commit part—and copying its files from their special frozen de-duplicated Git form, to normal everyday form.

This is what your working tree or work-tree is all about. Git copies the committed files into an area for your use. This area belongs to you and is not actually part of the repository at all! This action is not controversial, shocking, or confusing: it's obvious why Git has a special format for committed files, and that this format is useless for getting new work done. But once you realize that the files you work on / with aren't actually in Git, this opens up a lot of possibilities.

The other special thing you need to know, though, is that Git doesn't just extract straight from a commit to your work-tree. Instead, it first copies the committed files to Git's index.³ The file here is in the special frozen format, but unlike the committed copy, it's not actually frozen: it can be replaced at any time. Then, after there's a good copy of any given file in the index, git checkout will extract that file to your work-tree.

What this means is that from this moment onward, until you or Git change it somehow, the index itself has a copy of every file from the current commit. To make a new commit, you'll modify the work-tree file and then run git add path/to/file. This git add step copies the work-tree file back into the index, turning the file back into the frozen format, ready to go into a new commit. It's not in a commit yet—it's just in Git's index, ready to be committed.⁴
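Footnote 3 below mentions `git ls-files --stage`; here is a sketch using it in a throwaway repository (hypothetical file `f.txt`) to watch an index entry change only when `git add` runs.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "first version" > f.txt
git add f.txt
git commit -q -m "first"

echo "second, longer version" > f.txt
git ls-files --stage f.txt   # mode, blob hash ID, stage, path: still the committed blob
git add f.txt                # copy the work-tree file into the index
git ls-files --stage f.txt   # the entry now names a new blob
```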

When you run git commit, it's then that Git packages up all the files in the index into a new commit. Git collects the appropriate metadata, saves the files as the data, and writes out the new commit, which gets a new and unique hash ID.⁵ The parent of the new commit is the hash ID of the commit you checked out earlier. The files in the new snapshot are those that were in the index, which came from the earlier commit, except for any that you replaced with git add, or removed entirely with git rm.
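A sketch showing that the commit comes from the index, not the work-tree (throwaway repository, hypothetical file `f.txt`): the file is changed again after `git add`, and the commit still records the staged version.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "committed" > f.txt
git add f.txt
git commit -q -m "one"

echo "staged" > f.txt
git add f.txt                     # copy this version into the index
echo "work-tree only" > f.txt     # change the work-tree copy afterwards
git commit -q -m "two"            # built from the index

git show HEAD:f.txt               # staged
cat f.txt                         # work-tree only
git status --porcelain            # " M f.txt": work-tree still differs from the index
```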

³Technically, the index doesn't actually hold a literal copy of the file. Instead, it holds a long list of <name, mode, blob-hash> entries, which amounts to a flattened version of the tree objects that Git stores internally. But since the underlying blob objects can't be changed—even though the index copies can be changed—and Git handles this smoothly and invisibly on its own, you can just think of the index as if it held actual copies of the files. It's only when you start using git ls-files --stage and git update-index to directly address index entries that this part starts to matter.

⁴As a result, git adding a file whose content has never been seen before creates a new internal blob object. Git will be sure to keep that blob object around until you commit it—after which it's safe forever—or eject it from the index in some way, releasing it to the garbage collector.

There was a bug in git worktree add, starting with Git 2.5 and finally fixed in Git 2.15, where added worktrees' index files weren't scanned. The result was that 14 days after you git added some file to a secondary work-tree, if you hadn't committed yet, a git gc could discard the object from the repository database. The same thing happened with detached HEADs in added work-trees: they were not scanned so their commits were unprotected and could be GC'd. This is a particularly nasty bug as it loses committed files. I ran into this bug myself, but lucky for me I didn't actually want those files—they were just an experiment that I hadn't discarded properly yet.

⁵To help make sure that every commit has a new and different hash ID, Git includes the parent commit's hash ID, the source snapshot tree hash ID, and the date and time (to the second) at which you make the commit, in the metadata. So even if you make two separate commits that have the same snapshot and same parent, they have different timestamps, and hence are different commits.

The only way to defeat that is to make both commits at the same time. This is actually possible, provided you write a program to do it—make the computer make a commit; humans are far too slow—but if you do this, you presumably know what you're doing, and aren't shocked by the result. I did it myself, and was surprised until I thought about it, then realized: yeah, that's what should have happened.
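That experiment can be reproduced with `git commit-tree`, which makes a commit from an explicit tree and parent. In this sketch (throwaway repository; `make_commit` is a hypothetical helper), the author and committer dates are pinned through `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE`, so two commits made "at the same time" get the same hash ID, and one made a second later does not.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo data > f.txt
git add f.txt
git commit -q -m "base"
tree=$(git rev-parse 'HEAD^{tree}')
parent=$(git rev-parse HEAD)

# Hypothetical helper: same tree, parent, identity and message;
# $1 is the timestamp in Git's "<seconds> <zone>" format.
make_commit() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit-tree "$tree" -p "$parent" -m "same message"
}
c1=$(make_commit "1700000000 +0000")
c2=$(make_commit "1700000000 +0000")   # same second: identical hash ID
c3=$(make_commit "1700000001 +0000")   # one second later: different hash ID
echo "$c1"
echo "$c2"
echo "$c3"
```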

Git generally finds commits by branch names

This part isn't entirely relevant to your issue, but since we have come this far, let's cover it.

When you use a branch name—or indeed, any name such as a tag or remote-tracking name—and give that name to git checkout or git switch, you're instructing Git to select that commit and extract it, both to Git's index and your work-tree. That commit becomes your current commit. But there is a special case here: when the name you give to git switch or git checkout is a branch name, Git doesn't just select that commit, it also selects that name.

All names—branch names or not—just store one Git object hash ID. When the name is a branch name, the hash ID it stores must be that of a commit object.⁶ So if you give git checkout a branch name, that means that one specific commit—but Git also saves the name.

The way this works internally is that Git has a very special name, HEAD, that doesn't live in the refs/heads/ or refs/tags/ or any other refs/* name-space. (Tags are in refs/tags/, for instance.) This name is implemented by a file, usually .git/HEAD,⁷ that contains a string. The string is either a raw commit hash ID—which Git calls a detached HEAD—or it has the form ref: refs/heads/branch, where branch is your current branch name.
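Peeking at `.git/HEAD` in a throwaway repository (ordinary layout, not a work-tree from git worktree add; hypothetical file name) shows both forms of the string.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo x > x.txt
git add x.txt
git commit -q -m "one commit"

attached=$(cat .git/HEAD)
echo "$attached"              # ref: refs/heads/master   (on a branch)
git checkout -q --detach
detached=$(cat .git/HEAD)
echo "$detached"              # a raw commit hash ID     (detached HEAD)
```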

You can ask Git two different questions:

What branch name is stored in the special name HEAD?
```
git symbolic-ref HEAD
git symbolic-ref --short HEAD
git rev-parse --symbolic-full-name HEAD
git rev-parse --abbrev-ref HEAD
```
All four of these commands produce similar answers: they tell you the branch name. The rev-parse variants don't fail if you're in detached HEAD mode, but don't print anything particularly interesting either (try it out to see).
What is the hash ID of the current commit?
```
git rev-parse HEAD
```
This almost never fails,⁸ but only tells you what the hash ID is. If you want the name, you need to ask the other question.
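Both questions, asked on a branch and then in detached HEAD mode, in a throwaway repository (POSIX shell, Git 2.x, hypothetical file name). The `-q` option keeps `git symbolic-ref` quiet when it fails.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo x > x.txt
git add x.txt
git commit -q -m "one commit"

# On a branch: both questions have answers.
branch=$(git symbolic-ref --short HEAD)   # master
echo "$branch"
git rev-parse HEAD                        # the commit's hash ID

# Detached HEAD: symbolic-ref fails, --abbrev-ref prints just "HEAD",
# and rev-parse HEAD still gives the hash ID.
git checkout -q --detach
if git symbolic-ref -q HEAD > /dev/null; then
    echo "on a branch"
else
    echo "detached HEAD"
fi
abbrev=$(git rev-parse --abbrev-ref HEAD)
echo "$abbrev"
git rev-parse HEAD
```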

In any case, having selected a commit by branch name, git checkout or git switch will record the name in the special HEAD file. If you select a commit some other way—by raw hash ID, or tag name, or remote-tracking name, for instance—Git will put you in detached HEAD mode.

Whenever you make a new commit with git commit, Git:

1. Sets the new commit's parent based on resolving HEAD to a hash ID. If you are on an unborn branch (see footnote 8), you get a new root commit—one with no parent. If you are completing a merge, Git adds the other commit as the second parent.⁹
2. Uses its index to build the snapshot.
3. Supplies the rest of the metadata as usual.
4. Actually creates the commit, obtaining a new hash ID.
5. Writes the new commit's hash ID somewhere.

That last step—step 5—writes the new hash ID to the current branch name, if you're not in detached-HEAD mode. If you are in detached HEAD mode, it writes the hash ID directly to HEAD itself.

In the normal case—when Git writes to a branch name in step 5—this extends the branch:

...--G--H   <-- dev (HEAD), master

becomes:

...--G--H   <-- master
         \
          I   <-- dev (HEAD)

after which new commits on dev continue extending the branch:

...--G--H   <-- master
         \
          I--J   <-- dev (HEAD)

and so on.
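The diagrams above can be reproduced in a throwaway repository; the commit messages `G` through `K` are hypothetical stand-ins for the letters. The last few lines also make one commit in detached HEAD mode, where step 5 writes to HEAD and no branch name moves.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
for c in G H; do
    echo "$c" > "$c.txt"
    git add "$c.txt"
    git commit -q -m "$c"
done
git checkout -q -b dev            # dev (HEAD) and master both name commit H
for c in I J; do
    echo "$c" > "$c.txt"
    git add "$c.txt"
    git commit -q -m "$c"
done
git log --oneline --graph --all   # master stays at H, dev has moved to J

# Detached HEAD: the new commit's hash ID goes into HEAD only.
git checkout -q --detach
echo K > K.txt
git add K.txt
git commit -q -m K
```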

⁶Tag names get more flexibility because tag names often point to annotated tag objects, which can carry stuff like a PGP signature. The tag object then normally points to a commit. If it points to another tag object, that tag object normally points to a commit, and so on. Remote-tracking names like origin/master are copied from branch names, so they must point to a commit.

⁷In work-trees resulting from git worktree add, the HEAD for the added work-tree is in a different place. There's also a separate index file for each added work-tree. So it's best not to assume too much about .git/HEAD—but knowing that it exists, and peeking at it, is a good way to understand how Git actually works.

⁸It fails when you are on a branch that does not yet exist. This is the case in a new, totally-empty repository: you're on branch master, but there is no master. A branch name—like refs/heads/master—must point to a valid, existing commit. There are no commits. So master is not allowed to exist. Yet you're on master: .git/HEAD contains ref: refs/heads/master.

Whenever you are in this state, git rev-parse HEAD fails. The symbolic lookups succeed. That's how you know you are on an unborn branch.
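An unborn branch is easy to produce with a fresh `git init` in a temporary directory (POSIX shell, Git 2.x assumed).

```sh
set -e
cd "$(mktemp -d)"                          # brand-new, empty repository
git init -q
git symbolic-ref HEAD refs/heads/master    # pin the branch name

cat .git/HEAD                  # ref: refs/heads/master
git symbolic-ref HEAD          # refs/heads/master: the symbolic lookup works
if git rev-parse -q --verify HEAD > /dev/null; then
    echo "HEAD resolves to a commit"
else
    echo "unborn branch: master does not exist yet"
fi
```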

⁹If you're making an octopus merge—which is one with 3 or more parents—you should not be running git commit to do it, as octopus merges don't stop with conflicts. You can build your own manual octopus merge with git commit-tree too, but again that's not git commit.

Sparse checkout and `--skip-worktree`

Now that you know Git makes new commits from whatever is in Git's index, rather than what is in your work-tree, you're ready to understand the --skip-worktree flag.

Each index entry—each "file" that Git has, in the area in which it stores all the files that are ready to go into the next commit you'll make—actually has a path name—complete with slashes, e.g., path/to/file.ext—and a mode and an internal blob hash ID. Git will use all of this stuff to build the snapshot for the next commit. You don't need to know the format of this data, but you do need to know two more things:

- git status uses this data to compare HEAD-vs-index, to tell you what will be committed, and to compare index-vs-work-tree, to tell you what you could commit.
- There are flags in each entry. The two that concern us here are --assume-unchanged and --skip-worktree.

When git status says that some file is staged for commit, what it really means is: The copy of the file in the index doesn't match the copy of the file in the HEAD commit. When it says that some file is not staged for commit, what it really means is: The copy of the file in your work-tree doesn't match the copy in my index.
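The two comparisons behind `git status` can be run separately. In this sketch (throwaway repository, hypothetical files), `git diff --cached` compares HEAD with the index and plain `git diff` compares the index with the work-tree.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "two files"

echo "a, changed" > a.txt
git add a.txt                   # index now differs from HEAD
echo "b, changed" > b.txt       # work-tree now differs from the index

git diff --cached --name-only   # HEAD vs index: a.txt ("staged for commit")
git diff --name-only            # index vs work-tree: b.txt ("not staged for commit")
git status --porcelain          # "M  a.txt" and " M b.txt"
```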

If your colleague has some commit X and you extract commit X, you'll get a complete copy of X in Git's index. So all the files will match, including the files you'd like to pretend aren't in X after all. Let's pick one of these files F.

If you now make a new commit, file F will be in your new commit. It is in Git's index right now, and Git will build the commit from Git's index.

When git status says file F is deleted as a not staged for commit change, what it means is: file F exists in the index, but I don't see it here in your work-tree. Several operations at this point will remove file F from Git's index, including an explicit git add F. As soon as that happens, git status will tell you that file F's removal is staged for commit.
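A sketch of that sequence in a throwaway repository (hypothetical file `F.txt`; assumes Git 2.x, where `git add` on a removed path stages the removal).

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "colleague's file" > F.txt
git add F.txt
git commit -q -m "add F.txt"

rm F.txt
before=$(git status --porcelain)
echo "$before"     # " D F.txt": in the index, missing from the work-tree
git add F.txt
after=$(git status --porcelain)
echo "$after"      # "D  F.txt": the removal is now staged for commit
```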

This is where these two flags come in. Both of them do the same thing, mostly: both of them tell Git: Hey, when you come across file F in the index, don't bother to look at file F in my work-tree. The purpose of these two bits is different: --assume-unchanged is meant for situations in which git status takes too long and you can speed it up by making it ignore some file(s), but --skip-worktree is meant to be used with Git's sparse checkout code. In a sense, the second flag is stronger: a few Git operations won't assume a work-tree copy is unchanged after all, while the skip-worktree flag makes them skip the work-tree copy anyway.
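Both bits can be set by hand with `git update-index` and inspected with `git ls-files -v`, which tags assume-unchanged entries with a lowercase letter and skip-worktree entries with `S`. A sketch in a throwaway repository with hypothetical files:

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "two files"

git update-index --assume-unchanged a.txt
git update-index --skip-worktree b.txt
echo "a, edited" >> a.txt
echo "b, edited" >> b.txt

git status --porcelain   # empty: Git does not look at either work-tree copy
git ls-files -v          # "h a.txt" (assume-unchanged), "S b.txt" (skip-worktree)
```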

The way the sparse checkout code is meant to work is that you give Git a list of files you do or don't want to go into your work-tree. When git checkout (or git switch) is switching to a commit, it will extract, into its index, all of the files from that commit, but it won't extract, anywhere in your work-tree, some of the files. When it doesn't extract one file, it will, on its own, set the --skip-worktree bit.
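A sketch of the older, pattern-file form of sparse checkout (setting `core.sparseCheckout`, writing `.git/info/sparse-checkout`, then re-reading the tree with `git read-tree -mu HEAD`), which predates the `git sparse-checkout` command. It assumes a throwaway repository, hypothetical file names, and that `core.sparseCheckoutCone` is not enabled, since these are non-cone patterns; newer Git documentation recommends the `git sparse-checkout` command instead.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "yours" > keep.txt
echo "colleague's" > colleague.txt
git add keep.txt colleague.txt
git commit -q -m "two files"

# Non-cone patterns: include everything except colleague.txt.
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '/*' '!/colleague.txt' > .git/info/sparse-checkout
git read-tree -mu HEAD       # re-apply the patterns to the work-tree

ls                           # keep.txt only
git ls-files -v              # "S colleague.txt" and "H keep.txt"
git status --porcelain       # empty
```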

If the bit is set, and file F is not in your work-tree, git status won't complain about this. The file is in the HEAD commit and is in Git's index and is not in your work-tree, but Git won't say that there is a deletion that is not staged.

Even if you don't have the sparse checkout code working (and it's kind of clunky, especially in older versions of Git), you can let Git extract file F to your work-tree, then remove file F and set the --skip-worktree bit yourself. The drawback here, of course, is that if you have a file named F, that file will get clobbered in the process. The git checkout and git switch commands will notice that this would happen and will stop with an error, unless file F is listed as ignored (in .gitignore or .git/info/exclude): Git treats ignored files as expendable and will overwrite them, so you probably do not want to list F there.

If git checkout tells you that file F is in the way, simply move it out of the way, re-run the (non-sparse) checkout, then put your file F back and set the --skip-worktree bit (in either order). The crucial thing is to have the bit set while your file is in place. You can un-set the bit any time your colleague's file F is in place, and when you aren't using sparse checkout and/or don't have the bit set, you can see what they have done with this file.
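Here is the end state of that procedure as a sketch in a throwaway repository (POSIX shell, Git 2.x). `F.txt` and the copy saved in a temporary file are hypothetical stand-ins for your colleague's committed file and your own version of it.

```sh
set -e
cd "$(mktemp -d)"                          # throwaway repository
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "colleague's settings" > F.txt
git add F.txt
git commit -q -m "colleague adds F.txt"

# Hypothetical: your own F.txt, moved out of the way before the checkout.
mine=$(mktemp)
echo "my settings" > "$mine"

# Put your file back over the checked-out copy and set the bit (either order).
cp "$mine" F.txt
git update-index --skip-worktree F.txt

git status --porcelain    # empty: your F.txt is not reported as a change
git show HEAD:F.txt       # colleague's settings, still in the commit
cat F.txt                 # my settings, in your work-tree
```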