Git - Push a directory and then ignore it

Question

I have a folder containing all the HTTP request files that I run in PhpStorm. I first added this folder to my .gitignore file:

/app/http/

Then I force-added it, committed, and pushed it to my remote repository:

git add -f app/http
git commit -m 'http folder added'
git push

But after that, my folder is not ignored. How can I ignore it now that I have pushed it?

If you have already pushed it to the remote repo, then you have to remove that file first, as well as correct your .gitignore configuration. - Ardahan Kisbet, Apr 08 '20 at 12:37
But I want to keep my directory in my remote repo; I just want it to be ignored after my first add. If I remove it, it will also be removed on the remote side once I push. Sorry, I'm a beginner with Git. - as-dev-symfony, Apr 08 '20 at 12:50
Hmm, I don't know the answer, but I'm curious about it. Let's wait for someone who is familiar with Git operations such as this. - Ardahan Kisbet, Apr 08 '20 at 12:52

score 3 · Accepted Answer · answered Apr 08 '20 at 13:50

TL;DR

You can't get what you want. Don't do it that way. Consider instead committing a single dot-file in the folder, such as an app/http/.gitignore that lists *, if that's OK. If not, consider having a program that is not Git create the empty folder automatically when needed.
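A minimal sketch of the dot-file approach, run in a throwaway repository: the only file committed under app/http is a .gitignore containing *. The app/http path comes from the question; example.http and its contents are made up. Because * also matches the .gitignore itself, it has to be added with -f once. After that, new request files in the folder are untracked and ignored, while every checkout of a commit containing the .gitignore still gets the folder.

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# The only committed file in app/http is a .gitignore that ignores everything.
mkdir -p app/http
printf '*\n' > app/http/.gitignore
# The * pattern also matches .gitignore itself, so force-add it this one time.
git add -f app/http/.gitignore
git commit -q -m 'Keep app/http with a .gitignore that ignores its contents'

# A new request file in the folder is now untracked and ignored.
echo 'GET https://example.com/' > app/http/example.http
git status --porcelain     # no output
git ls-files app/http      # app/http/.gitignore
```

A common variant adds a second line, !.gitignore, so the -f is not needed. Either way, once the .gitignore is tracked, ignore rules no longer apply to it.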

Long

There are a number of incorrect assumptions that you have started with:

You think you added a folder, but a Git commit only stores files, not folders. We'll see what that means, and why Git is like this, in a moment.
You think you pushed the folder and/or the files. Git does not push files. Git pushes commits.
You think that listing something in .gitignore makes Git ignore it. That is not the case.

These all led to the problem you've encountered.

You need to start with this: Git's basic unit of storage—the one you'll interact with, anyway—is the commit. Git is all about commits, so you need to know, in some detail, exactly what a commit is and does.

Commits

A Git commit is made up of two parts: data, and metadata:

The data in each commit consist of a complete snapshot of files. There are no folders here, there are just files whose names may contain embedded slashes, such as path/to/file.ext.
The metadata in each commit contain things like your name and email address, the date-and-time stamp of when you made the commit, and your log message explaining why you made the commit. Included with this metadata—which is meant for you to use, rather than for Git—is some metadata that really is for Git: a list of parent commit hash IDs. Usually there is just one commit hash ID in this list.
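To look at both parts of a real commit, git cat-file -p prints the raw commit object in readable form. A sketch in a throwaway repository (file.txt and the commit messages are placeholders):

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo one > file.txt
git add file.txt
git commit -q -m 'first commit'
echo two > file.txt
git add file.txt
git commit -q -m 'second commit'

# Print the second commit object: a tree line (its snapshot), one parent
# line (the list of parent hash IDs), author and committer lines with
# names, email addresses and time stamps, then the log message.
git cat-file -p HEAD
```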

Each commit has a unique hash ID. This hash ID is, in effect, the true name of the commit. It's how Git retrieves the data—your files—and metadata. So you need to use commit hashes, so that Git can find the commits. But there is a problem here: commit hash IDs are big and ugly and impossible for humans to work with.

Branch names

The solution to this problem is obvious: we have a computer; we can have the computer store the hash IDs for us, using a simple name, such as master or develop, that humans can handle, to remember the right hash ID. This is what branch names are: a branch name holds the hash ID of one—and only one—commit. The commit we have Git remember in a branch name is the newest or latest commit in the branch, by definition. Git calls this the tip commit.

If we say git checkout master, Git will check out the tip commit of master by using the hash ID stored in the name master. If we add a new commit to the branch, Git will write out the new commit, with its (single) parent hash ID set to the old branch tip. The new commit gets a new, unique hash ID, which Git writes into the branch name, and now the new commit is the branch tip.
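A quick way to watch a branch name hold one hash ID, and then move to a new one, again in a throwaway repository. git symbolic-ref --short HEAD is used only to find the current branch name, which may be master or main depending on your Git configuration:

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo one > file.txt
git add file.txt
git commit -q -m 'first commit'

# The branch name holds exactly one hash ID: that of the tip commit.
branch=$(git symbolic-ref --short HEAD)
old_tip=$(git rev-parse "$branch")
echo "$branch -> $old_tip"

# Adding a commit writes a new commit whose parent is the old tip,
# then stores the new commit's hash ID in the branch name.
echo two >> file.txt
git commit -q -a -m 'second commit'
echo "$branch -> $(git rev-parse "$branch")"
echo "parent -> $(git rev-parse "$branch^")"    # same hash ID as old_tip
```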

The previous commit still exists, and Git can find it on its own: Git uses the branch name to find the hash ID of the last commit, then reads that commit and uses its metadata to find the hash ID of the parent. Git then uses that hash ID to read the parent commit. If appropriate, Git uses that commit's stored parent to go back one more step.

Draw your commits and branches

Visually, if we draw this, we get:

... <-F <-G <-H   <--branch

where branch is the branch name, which stores some hash ID that we'll just call H here. That allows Git to find H. H itself stores, as its parent hash ID, the ID of a commit we'll just call G. So H points to G, allowing Git to retrieve G from its database. G in turn points to earlier commit F, and so on. The whole process works because the branch name points to the last commit, and from there, Git can work backwards.
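The backwards walk can be spelled out as a small shell loop: start with the hash ID the branch name holds, then repeatedly ask for the commit's parent (written rev^) until reaching the root commit, which has none. This only illustrates the idea; it is not how Git itself is implemented, and F, G and H are just commit messages here.

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# Make three commits in a row: F, then G, then H.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# Start from the tip commit, whose hash ID is stored in the branch name...
c=$(git rev-parse HEAD)
walked=''
while [ -n "$c" ]; do
    subject=$(git log -1 --format=%s "$c")
    echo "$(git rev-parse --short "$c") $subject"
    walked="$walked$subject;"
    # ...then step to the parent; the root commit has none, ending the loop.
    c=$(git rev-parse -q --verify "$c^") || c=''
done
```

In everyday use, git log or git rev-list HEAD does this walk for you.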

Your work-tree

Nothing in any existing commit can ever be changed! Once made, every commit is frozen for all time. That includes all of the files stored inside that commit. Because commits are frozen (read-only), they work really well for archival. But the files inside the commit are stored in a special, frozen, compressed, read-only, Git-only format that only Git can read. These files literally cannot be changed, in the same way that no Git commit can ever be changed. That makes these files entirely useless for getting any new work done.

The solution to this problem is to have Git extract the frozen committed files from a commit into a work area, where they are turned back into ordinary files that your computer can use. That's what git checkout (or, in Git 2.23 and later, git switch) does: you tell it which branch you want to use, Git uses the branch name to find the correct hash ID, and Git extracts all the frozen files from that commit into your work-tree or working tree, which is where you can see them and work with them.¹

¹This glosses over a lot of details about git checkout, so it's just an overview, not a precise definition.

The index

So, as you can see from this picture so far, there are really two sets of files that are active when you're working with a commit: the committed copies, which are frozen for all time (and not really visible at all), and the working copies, which you can see and work on. Some version control systems stop here, with just two copies of each file. Git, however, does not.

Git has something it calls, variously, the index, the staging area, or sometimes (rarely these days) the cache. The reason it has two (or three) names is probably a combination of things: the index itself is complicated, but the way you use it most of the time is relatively simple. The term staging area refers to the way you use it. I like to call it the index because it takes on an expanded role during merges, and you'll eventually need to know about this.

In any case, a good way to think about the index is to pretend that it holds a third copy of each file.² This extra index copy is in the frozen format, but isn't actually frozen: you can overwrite it with a new copy. The copy of each file that is in the index is the copy that git commit will use when it makes a new commit. This means you can view the index as the files that will go into the next commit, or, even shorter, the index is your proposed next commit.

So, what git checkout or git switch does, in effect, is copy the files from the commit you chose to both the index and your work-tree. The index copy is now ready for a new commit, and the work-tree copy is an ordinary file, stored in a folder because your OS requires that. Git's copies of the file, in commits and in the index, are not stored in folders at all. They're in a special, read-only, Git-only format.³

What git add does—and this is the reason you must keep using it after modifying a file in your work-tree—is to copy the work-tree copy of the file into the index. That is, it takes your updated work-tree file, re-compresses it and converts it to Git's internal frozen format, and replaces the old index copy with the new copy. If you git add a file that was not in the index at all, that adds a new file to the index, but if you git add a file that was in the index, that just replaces the frozen-format copy.
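The following sketch (throwaway repository, placeholder file notes.txt) shows that a commit is built from the index copy, not the work-tree copy. The :notes.txt syntax names the index copy of a file, and git ls-files --stage shows the mode, blob hash ID, stage number and name held in the index:

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo 'version 1' > notes.txt
git add notes.txt
git commit -q -m 'add notes'

# Change only the work-tree copy; the index still holds version 1.
echo 'version 2' > notes.txt
git show :notes.txt                    # prints: version 1

# Committing now records the index copy, not the work-tree file.
# (--allow-empty is needed because the index still matches HEAD.)
git commit -q --allow-empty -m 'commit without git add'
git show HEAD:notes.txt                # prints: version 1

# git add replaces the index copy with the work-tree copy.
git add notes.txt
git ls-files --stage notes.txt         # mode, blob hash ID, stage, name
git commit -q -m 'update notes'
git show HEAD:notes.txt                # prints: version 2
```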

The index cannot hold a folder name. The files that are in the index just have long names like path/to/file.ext. So when Git builds a commit from the index, the only things in the new commit are files.⁴ That's what prevents you from storing a folder.

When Git goes to extract a commit, if a file in that commit is named path/to/file.ext and your work-tree does not have a path folder, or the path folder is missing a to folder, Git will create path and path/to as needed. So the fact that Git didn't store path and path/to in the first place is not important, except in one way: it's not possible to store an empty folder (empty directory) in a Git commit.⁵
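To see that only files with slash-containing names are recorded, and that an empty folder never reaches the index at all, try this in a throwaway repository (path/to/file.ext mirrors the text; the folder name empty is made up):

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# One file inside nested folders, plus a folder with nothing in it.
mkdir -p path/to empty
echo hello > path/to/file.ext
git add .
git commit -q -m 'add path/to/file.ext'

# The index, and therefore the commit, lists a single file name;
# there is no entry for path, path/to, or empty.
git ls-files
git ls-tree -r --name-only HEAD
# The empty folder is not even reported as untracked.
git status --porcelain
```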

²Technically, what the index holds is not a copy of the file, but rather a copy of the file's mode, its name, and a blob hash ID. Unless you get into the inner workings of the index with git ls-files --stage and git update-index, though, it suffices to think of the index as holding a copy.

³Technically, again, these are actually blob objects, which have hash IDs, just like commits. The hash ID of a blob object depends on the file content, and files can be shared across multiple different commits if the content matches, because each commit really just stores the blob hash ID. This is all hidden underneath another layer of indirection: commits store tree hash IDs, and tree objects store name-and-mode-and-hash ID. But you don't need to know any of this to use Git. You do need to know about commit hash IDs and the index.

⁴Technically, the index can store:

an ordinary file (mode 100644 or 100755), or
a symbolic link (mode 120000), or
a gitlink (mode 160000).

All three of these are associated with a hash ID: a blob hash for a file or symlink, or a submodule commit hash ID for a gitlink.

⁵There are some tricks, none of which are entirely satisfactory: see How can I add an empty directory to a Git repository? The only one of these that really works is the empty submodule trick.

Tracked and untracked files

When you extract a commit, you get three copies of each of its files. One is the frozen copy, in the current or HEAD commit. The second is the frozen-format but replaceable copy, in the index. The third is the only copy you can actually use yourself, in your work-tree. You can see and touch the work-tree copies, as they're actual files, in actual folders, on your computer, instead of internal Git entities stored in some special Git fashion.

Because your work-tree is yours, though, this means you can create and remove files and folders all without Git ever getting involved. Git won't use any of these files until you copy them into Git's index. With git checkout (or git switch and/or git restore), you can command Git to overwrite your work-tree files with copies that Git has, but it's the index copies that will go into the next commit.

So what happens if you make a work-tree file and don't add it to the index? The answer is: Git calls that an untracked file.

An untracked file is, very simply, a file that is in your work-tree right now, but is not in Git's index right now. The right now part of this is crucial, because you can not only copy new files into Git's index—with git add, of course—you can also tell Git to remove files from its index, with git rm.

Running:

git rm path/to/file

tells Git: remove the copy of that file from both your index and my work-tree. Now that it's gone from both, it won't be in the next commit. Running:

git rm --cached path/to/file

tells Git: remove the copy of that file from your index, but don't touch the work-tree copy. Now that it's gone from the index, it won't be in the next commit.
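The two forms side by side, as a sketch in a throwaway repository with placeholder file names:

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo a > kept.txt
echo b > gone.txt
git add kept.txt gone.txt
git commit -q -m 'add two files'

git rm -q gone.txt            # removed from the index and the work-tree
git rm -q --cached kept.txt   # removed from the index only

ls                            # kept.txt (gone.txt no longer exists)
git ls-files                  # no output: neither file is in the index
git status --porcelain        # D  gone.txt, D  kept.txt, ?? kept.txt
```

After the second command, kept.txt is both staged for deletion and untracked.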

So, if you want some file to not be in a commit, you must remove it from the index. That's fine as far as it goes. If you remove it from the index, or never had it in the index in the first place, and the file exists in your work-tree right now, that file is an untracked file.

Untracked files can be annoying, as we'll see, but may also be pretty important.

`git status`

When you run git status, Git runs two sets of comparisons. The first comparison compares the HEAD commit to the index. For every file that is the same, Git says nothing at all. For files that are different—or are new or have been deleted—Git says that this file is staged for commit. Note that you cannot change the content of the HEAD commit,⁶ so any change here is, by definition, something you changed in the index.

Having run this first comparison, HEAD-vs-index, Git now runs a second comparison. It compares all the files in the index to the files in your work-tree. For every file that is the same, Git says nothing at all. For files that are different, or have been deleted, Git says that this file is not staged for commit.

Notice that we haven't yet mentioned new files. Files that are in your work-tree but not in your index ... well, those are exactly your untracked files. What git status does about these is complain, by listing them as "untracked".
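The two comparisons can also be run one at a time: git diff --cached compares HEAD to the index, git diff compares the index to the work-tree, and git ls-files --others --exclude-standard lists untracked, not-ignored files. A sketch with three placeholder files, one for each case:

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo 1 > a.txt
echo 1 > b.txt
git add a.txt b.txt
git commit -q -m 'base'

echo 2 > a.txt && git add a.txt   # index now differs from HEAD
echo 2 > b.txt                    # work-tree now differs from the index
echo new > c.txt                  # only in the work-tree: untracked

git diff --cached --name-only     # HEAD vs index:      a.txt (staged for commit)
git diff --name-only              # index vs work-tree: b.txt (not staged for commit)
git ls-files --others --exclude-standard    # untracked: c.txt
git status --short                # M  a.txt,  M b.txt, ?? c.txt
```

In the git status --short output, the first column reflects the HEAD-vs-index comparison and the second column the index-vs-work-tree comparison.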

⁶While you cannot change the content of any commit, you can—with git checkout—select a different commit to be the HEAD commit. Or, of course, you can run git commit and make a new commit, and now that new commit is the HEAD commit.

Making `git status` shut up

About half of the purpose of a .gitignore file is to make git status stop complaining about an untracked file. Listing a file name, or pattern, in a gitignore file makes git status not complain that the file is untracked.

This doesn't take any files out of the index. If a file is already in Git's index, listing that file in .gitignore has no effect. The file is in the index, so that copy of that file will be in the next commit ... unless, of course, you remove it yourself, with git rm.
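This is exactly what happened in the question. A sketch reproducing it in a throwaway repository (the request file name and its contents are made up):

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo '/app/http/' > .gitignore
mkdir -p app/http
echo 'GET https://example.com/' > app/http/example.http
git add .gitignore
git add -f app/http          # force-add despite the ignore rule, as in the question
git commit -q -m 'http folder added'

# The file is in the index, so the ignore rule has no effect on it.
echo 'GET https://example.com/other' > app/http/example.http
git status --porcelain       #  M app/http/example.http
```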

Avoiding automatically-adding files that you don't want committed

The rest of the purpose of a .gitignore file is to enable you to use en-masse git add operations, without also copying some files that are currently untracked into Git's index. Listing, for instance, *.o or *.pyc means that you can now git add . or git add *. Git will skip over the untracked-and-ignored files, while copying other updated or new work-tree files into Git's index.
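A small sketch of the *.o case (placeholder files in a throwaway repository; no commit is needed):

```shell
# Throwaway repository.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .

printf '*.o\n' > .gitignore
echo 'int main(void) { return 0; }' > main.c
echo 'placeholder for a compiler output file' > main.o

# An en-masse add copies new and updated files into the index,
# but skips untracked files that match .gitignore.
git add .
git ls-files              # .gitignore and main.c; main.o is absent
git status --porcelain    # A  .gitignore, A  main.c; main.o is not mentioned
```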

Untracked-and-ignored is therefore useful in two ways

Since those files don't get copied into the index, they won't be in the next commit. Running git status with these files both untracked and ignored, you won't see them mentioned at all: they're not in the index, so they are untracked, but git status won't complain. Running git add . with these files both untracked and ignored, git add won't copy them into the index.

But you already have the files in some existing commits

In your case, there are some commits that have some files somewhere within app/http/. When you run:

git checkout <commit-specifier-or-branch-name>

and the commit you choose by this action is a commit that contains a file such as app/http/foo.html, Git will:

create app/http/ if needed
write foo.html into that folder
copy the frozen copy of app/http/foo.html into its index

and you'll be ready to work on or with that file. But now the file is in Git's index, so it is tracked: git status will tell you about it, and any en-masse git add will copy the work-tree file into the index, so that the updated foo.html will be in the next commit.

Even if you don't copy foo.html into Git's index, so that the index copy remains the old copy, that old copy will be in the new commits you make. Git makes commits from the index, and the index has the old copy.

The obvious cure is to remove the index copy entirely, then make a new commit. Now that the index copy is absent, there's no app/http/foo.html in the new commit. If app/http/ is in .gitignore, Git won't, from this point on in new work, add foo.html either.

But if you do this for all app/http/ files, then, in the future, cloning this repository and extracting this commit to a new, otherwise-empty work-tree, you won't get a folder named app/http, because there will be no files whose name starts with app/http/ to make Git notice that it needs to create the app folder and then the http folder within the app folder.
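Both points can be tried in a throwaway repository: first the cure (git rm -r --cached, then a new commit), then a clone of the result. The request file and the temporary directories are placeholders. Pushing the new commit would likewise give the remote a newest commit without app/http, while the older commits, which can never change, still contain it.

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

# The question's starting point: app/http/ is ignored but was force-added.
echo '/app/http/' > .gitignore
mkdir -p app/http
echo 'GET https://example.com/' > app/http/example.http
git add .gitignore
git add -f app/http
git commit -q -m 'http folder added'

# The cure: remove only the index copies (work-tree files stay), then commit.
git rm -q -r --cached app/http
git commit -q -m 'Stop tracking app/http'
git ls-files app/http             # no output
git status --porcelain            # no output: the files are untracked and ignored
ls app/http                       # example.http is still here

# A fresh clone of the new commit has no app/http folder at all.
clone=$(mktemp -d)/clone
git clone -q "$repo" "$clone"
[ -d "$clone/app/http" ] || echo 'the clone has no app/http folder'
```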

Moreover, suppose you now have a state in which commit a123456... (some hash ID, anyway) does have app/http/foo.html in it, and commit b789abc... doesn't. If you:

git checkout a123456

Git will extract all the files from that commit, including app/http/foo.html, into your work-tree and into Git's index ... and then, if you:

git checkout b789abc

Git will extract the files from that commit. It will notice that app/http/foo.html was in the previous commit, and is therefore in the index now, but isn't in the commit you're selecting now, so it must be removed. Git will therefore remove app/http/foo.html from the index and from your work-tree.

Listing app/http/ in the top-level .gitignore, or * in app/http/.gitignore, not only tells Git that it should not automatically add these files to the index, but also gives Git permission to destroy these files in the work-tree in some cases, including this particular one. So the fact that you have these files in some existing commits makes things dangerous if you choose to .gitignore them. After you've made sure that all new commits lack, and ignore, these files, be careful when checking out historic commits!
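A sketch of the danger in a throwaway repository: commit old contains app/http/foo.html, commit new has stopped tracking it, and the folder is listed in .gitignore. Local edits to the now-ignored file are silently replaced by the committed copy when the old commit is checked out, and switching back removes the file altogether. The file contents are placeholders.

```shell
# Throwaway repository; the identity variables let git commit run anywhere.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q .
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo '/app/http/' > .gitignore
mkdir -p app/http
echo 'committed copy' > app/http/foo.html
git add .gitignore
git add -f app/http/foo.html
git commit -q -m 'commit with app/http/foo.html'
old=$(git rev-parse HEAD)

git rm -q --cached app/http/foo.html
git commit -q -m 'stop tracking app/http/foo.html'
new=$(git rev-parse HEAD)

# Edit the file, which is now untracked and ignored.
echo 'local edits' > app/http/foo.html

# Checking out the historic commit silently overwrites the ignored file...
git checkout -q "$old"
seen=$(cat app/http/foo.html)
echo "$seen"                              # committed copy

# ...and going back removes it from both the index and the work-tree.
git checkout -q "$new"
[ -e app/http/foo.html ] || echo 'app/http/foo.html is gone'
```

git checkout documents a --no-overwrite-ignore option that makes it stop, instead of overwriting ignored files, when switching branches; check the documentation for your Git version before relying on it.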