Git add and commit a directory that's already been tracked

Question

For some reason I have a directory that has already been tracked, added and committed but there are a number of files in it that haven't been pushed through. I've checked my .gitignore file to see if that had caused the issue, but there's nothing amiss.

What I'd basically like to do is push the entire directory and its contents through again and overwrite everything in that directory in the repo.

What's my best option? Should I untrack and then retrack it? Could I simply rename it, commit and then rename back? Or should I delete the entire directory and then add it back again?

Does this answer your question? (https://stackoverflow.com/questions/9034999/why-isnt-git-tracking-changes-in-a-subdirectory#:~:text=The%20solution%20is%3A%20renaming%20the,remote%20of%20that%20subfolder%20first.) — Mafick, Jul 25 '20 at 08:32
Thanks. I renamed the main folder, added and committed, and renamed it back. Somehow it only committed the folder and subfolders but not the files contained therein. In the end I just trashed the folder, committed, and then recreated it. And that worked. — Seb, Jul 25 '20 at 09:54
Not an insightful question: you suggest the solution, test it after posting, and notice that it works, but nothing is said about how to reproduce the problem, and no analysis is performed to understand why it happened. — Daemon Painter, Jul 25 '20 at 10:33
@DaemonPainter, half true but a tad hypercritical. I had a few ideas in mind which I had outlined prior to attacking the issue – one of which worked. But before tackling it, I wanted to scope out what the recommended route to take from others who'd know more about it than me. Nothing wrong in that, no? Regarding analysis and thorough testing: maybe it's because it's Saturday morning, maybe I've got a deadline to just get it working and get on with everything else, or maybe it's because I switched repos recently. Whatever is the case, I'm grateful to Mafick for his time and suggestion. — Seb, Jul 25 '20 at 10:54

score 1 · Answer 1 · answered Jul 26 '20 at 02:54

TL;DR

It's not really clear what the problem was, although I suspect you were hit by a file-name-case issue somewhere. On typical Windows and macOS systems, the working tree or work-tree cannot hold one file that is named README and a second file that is named readme. Git has some level of understanding of this problem, but Git itself can store two files with these two names, and the way it deals with OSes that cannot is sometimes less than satisfactory.

If you remove everything, any files whose names exist but are in the wrong case (e.g., a file named readme when you wanted README) vanish. These OSes will obey the case you give them when you first ask them to create a new file, so if you now ask them to create README, that's the name you (and Git) will see. If you have the right set of names directly in Git, though, you can just remove the work-tree (carefully, so as not to remove the .git repository directory within it) and then re-check-out the commit that has the right set of names. Note, however, that this strategy won't work if the commit you want needs two separate files with names that your OS insists name one single file.
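To check whether a set of names would run into this, here is a minimal Python sketch. It takes a list of paths (for instance, copied from the output of git ls-files) and reports any that differ only by letter case. The function name find_case_collisions and the sample paths are made up for illustration; this is not something Git itself provides.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths that differ only by letter case.

    On a case-insensitive file system, each such group can exist only
    as a single file in the work-tree.
    """
    groups = defaultdict(list)
    for path in paths:
        # Fold the whole path, so differently-cased directories collide too.
        groups[path.casefold()].append(path)
    # Keep only the groups where two or more distinct spellings collide.
    return [sorted(set(names)) for names in groups.values() if len(set(names)) > 1]


# Hypothetical file list, e.g. as printed by `git ls-files`.
tracked = ["README", "readme", "src/Main.py", "src/main.py", "docs/guide.md"]
collisions = find_case_collisions(tracked)
for group in collisions:
    print("collides on a case-insensitive file system:", group)
```

An empty result means the names themselves are not the problem, at least as far as simple case folding can tell.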

Long

You've started with a bad assumption. In particular, Git does not track directories, ever.

The definition of tracked, in Git, is this: a file is tracked if and only if that file is currently in Git's index. The index is not capable of holding a directory name: it only holds file names. That's why a directory cannot be tracked. Unless you know about Git's index, though, these are just mysterious and unhelpful words. So we'll cover that soon.

What I'd basically like to do is push the entire directory and its contents ...

What you want to achieve is possible, but the way you are thinking about it is not how Git works. In particular, git push pushes commits. While commits do contain files, you either get the entire commit—which means a full snapshot of every file—or nothing at all.¹ The real trick, then, is determining what's in a commit, what commits you have, and what new commits you want to make.

In other words, Git is all about commits. Branch names don't really matter to Git (with one exception that we'll get to in a moment). Of course, they do matter to humans, so they're pretty important. But Git goes by commit hash IDs, those big ugly strings of letters and digits that seem random (but aren't).

¹Remember that during commit-transfer operations, there are two Gits involved: a sender and a receiver. The sender tells the receiver about the commits it would like to send, by their hash IDs. The receiver tells the sender yes, send that as I don't have it or no, don't send that one, I already have it. Because most new commits simply re-use the existing files from a previous commit, the sender can now reduce the commits that are to be sent down to just a few files: the receiver will restore the missing parts from what it already has. But in principle, the sender is sending "the entire commit": it's just optimized.

Commits and Git's index

This thing that Git calls, variously, the index, or the staging area, or sometimes—rarely these days—the cache, is crucial in Git, and is poorly covered by some Git introductions and tutorials. It actually has multiple roles and we won't cover all of them. For our purpose here we'll cover the main use. At all times, the index holds a snapshot of all the files that you will put into the next commit you will make.

To understand why this is the case, consider these facts about commits:

Each commit is numbered, with a unique random-looking hash ID. The hash ID is actually a checksum of the contents of the commit, which is exquisitely sensitive to every bit. So if you take a commit out of Git, manipulate it to change some bits, and put it back, you don't change the commit. Instead, you wind up making a new, different commit with a different unique ID.
Thus, every commit is totally read-only. You can add new ones any time you like, but you cannot change an existing commit. You can't even delete one: you can just stop using it entirely, and eventually Git will realize that nobody and nothing wants it, and will "garbage collect" it, but there's no explicit way to do that.
Each commit holds a full snapshot of every file, in a compressed, de-duplicated, read-only, Git-only form. So making a new commit that re-uses most of the existing files doesn't really take much space at all: the new commit just re-uses the earlier files, and stores compressed copies only of those that are actually different. (This trick breaks down when using large, non-compressible binary files.)

(Each commit also holds metadata, in addition to the snapshot, but we aren't going to cover that here.)
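To make the "checksum of the contents" point concrete, here is a small Python sketch of how an object ID is computed in a repository that uses SHA-1 (the long-standing default; repositories created with SHA-256 differ). The helper name git_object_id is mine, and the commit text below is a made-up example, not taken from any real repository.

```python
import hashlib


def git_object_id(kind, body):
    """Compute the SHA-1 ID for a Git object of the given kind.

    Git hashes a short header ("<kind> <size>" plus a NUL byte)
    followed by the object's raw bytes.
    """
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The same file contents always produce the same blob ID.
blob_id = git_object_id("blob", b"hello world\n")
print(blob_id)

# Hypothetical commit body: the author, timestamp, and message are made up.
commit = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1595700000 +0000\n"
    b"committer A U Thor <author@example.com> 1595700000 +0000\n"
    b"\n"
    b"Initial commit\n"
)
original_id = git_object_id("commit", commit)
# Change one letter of the message: the result is a different commit ID.
altered_id = git_object_id("commit", commit.replace(b"Initial", b"initial"))
print(original_id)
print(altered_id)
```

Because the ID depends on every byte, "editing" a commit can only ever produce a new object with a new ID, which is why existing commits are effectively read-only.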

Therefore, to actually use any existing commit, Git has to extract the commit to a work area.

This design means that Git needs, at a minimum, two copies of each "active" file right after a git checkout or git switch. There is a frozen one in the commit, and a usable copy in your work-tree.

Git could make new commits from the work-tree files ... but it doesn't. This is where the index gets in the way. Inside the index, there's a copy² of the frozen-format file, but it's not frozen there like it is in the commit. Instead, when you first extract the commit, you wind up with three copies (see footnote 2 again): one in the commit (frozen), one in the index (frozen format but replaceable), and one in your work-tree (usable).

When you edit a file and then run git add on it, this tells Git to copy the work-tree copy back into the index, overwriting the index's copy. Git puts the file into the frozen, de-duplicated format, in the index. It's now ready to be committed. It was ready before, though—it's just that what was ready was the previous version of the file. Likewise, if you have an all-new file, that is not in the index yet, git add copies the file into the index, putting it into the frozen and de-duplicated format.
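The three copies can be pictured with a toy model. In this Python sketch, plain dictionaries stand in for the commit, the index, and the work-tree; real Git stores compressed, de-duplicated objects instead, and the add and commit functions here are simplified stand-ins that I've named after the real commands, not their actual implementation.

```python
# Toy model: each "copy" is a dict mapping file names to contents.
head_commit = {"README": "v1\n", "main.py": "print('hi')\n"}

# Checking out the commit fills both the index and the work-tree from it.
index = dict(head_commit)
work_tree = dict(head_commit)


def add(path):
    """Model `git add path`: copy the work-tree version into the index."""
    if path in work_tree:
        index[path] = work_tree[path]
    else:
        # The file is gone from the work-tree, so staging removes it too.
        index.pop(path, None)


def commit():
    """Model `git commit`: the new snapshot is whatever the index holds."""
    return dict(index)


work_tree["README"] = "v2\n"        # edit an existing file
work_tree["notes.txt"] = "todo\n"   # create an all-new file
add("README")                       # only README is staged

new_commit = commit()
print(sorted(new_commit))           # notes.txt is absent: it was never added
print(new_commit["README"] == "v2\n")
```

The new file sits in the work-tree the whole time, yet it never reaches the commit, because nothing copied it into the index.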

This means that git commit merely needs to package up the frozen-format files into the new commit. This makes Git's job easier, and makes git commit go faster. It also gives you the ability to commit something other than what you see. The cost, of course, is that what you see is not necessarily what you're committing. This is where git status comes in.

`git diff`, `git status`, and untracked files

Git builds each new commit from whatever is in Git's index / staging-area. But there's no easy way to see what's in Git's index / staging-area.² So how do you know what you are about to commit? The answer is to use git status. Before we look at how git status works, let's take a very fast look at git diff:

You can give git diff the names³ of two commits, and Git will, in effect, extract the two commits into a temporary area (in memory, really) and compare each file in the two commits. Let's call the first commit in your git diff left right command the left-side commit, and the second commit the right-side commit.
Now, for each file that exists on both sides, Git compares the two files. If they are the same, Git says nothing at all about the files. If they are different, Git figures out what it takes to change the left-side file into the right-side file, and shows that to you.
Or, if you like, with git diff --name-status, Git just prints the name of the file, prefixed by the letter M for Modified.
For a file that exists on the left, but is gone on the right, Git says that the left-side file is deleted. With --name-status this is status D.
For a file that exists on the right, but not on the left, Git says that the right-side file is added, or status A.

There are some special cases that we don't really need to go over here. The point should be clear enough: Git compares the left and right sides, and for different files, tells you about the difference.
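The comparison rules above can be sketched in a few lines of Python. Here two dictionaries of file name to contents stand in for the left-side and right-side commits, and the function name_status is my own simplified imitation of the --name-status output; it ignores the special cases (renames, mode changes, and so on).

```python
def name_status(left, right):
    """Compare two snapshots (dicts of file name -> contents).

    Returns (status, name) pairs in the spirit of `git diff --name-status`:
    M for modified, D for deleted, A for added. Identical files are omitted.
    """
    result = []
    for name in sorted(set(left) | set(right)):
        if name not in right:
            result.append(("D", name))
        elif name not in left:
            result.append(("A", name))
        elif left[name] != right[name]:
            result.append(("M", name))
    return result


# Hypothetical snapshots standing in for two commits.
left_commit = {"README": "v1\n", "old.txt": "bye\n", "same.txt": "x\n"}
right_commit = {"README": "v2\n", "new.txt": "hi\n", "same.txt": "x\n"}
changes = name_status(left_commit, right_commit)
for status, name in changes:
    print(status, name)
```

Note that same.txt never appears in the result: for files that match on both sides, there is nothing to say.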

Now, what git status does is, in effect, to run two git diff commands:

The first one compares the current commit—the one you checked out earlier—to the files that are in Git's index. For every file that is the same, Git says nothing at all. For files that are deleted, modified, or added, Git prints the file's name and says that the add, modify, or delete is staged for commit.
The second git diff --name-status compares the files in Git's index to the files in your work-tree. For every file that is the same, Git says nothing at all. For files that are deleted or modified (but not added files), Git prints the file's name and says that the delete or modify is not staged for commit.

The oddball case here is files that are present in your work-tree, but are not in Git's index. Why doesn't Git say that these files are added? Perhaps only Linus Torvalds could say for sure, but it doesn't: it says that these files are untracked.

That's all that untracked means: the file is there in your work area, where you can see it, but it's not in Git's index right now. Since it is not there, it will not be in the next commit you make.
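Putting the two comparisons together, a rough Python model of git status might look like the following. As before, dictionaries of file name to contents stand in for the current commit, the index, and the work-tree; the function status and its output format are my own simplification, and ignored files are not modeled at all.

```python
def status(head, index, work_tree):
    """Model `git status` as two comparisons over dicts of name -> contents."""
    # First comparison: current commit vs. index (staged for commit).
    staged = sorted(
        name for name in set(head) | set(index)
        if head.get(name) != index.get(name)
    )
    # Second comparison: index vs. work-tree, only for files in the index.
    not_staged = sorted(
        name for name in index
        if work_tree.get(name) != index[name]
    )
    # Work-tree files that are not in the index at all.
    untracked = sorted(name for name in work_tree if name not in index)
    return {"staged": staged, "not staged": not_staged, "untracked": untracked}


# Hypothetical state: a.txt was edited and added, b.txt was edited only,
# and c.txt was created but never added.
head = {"a.txt": "1", "b.txt": "1"}
index = {"a.txt": "2", "b.txt": "1"}
work_tree = {"a.txt": "2", "b.txt": "9", "c.txt": "new"}
report = status(head, index, work_tree)
for section, names in report.items():
    print(section + ":", names)
```

Only the staged entry would make it into the next commit; the other two sections describe things the index does not yet hold.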

Files that are untracked can be ignored as well. Files that are tracked—that are in Git's index—cannot be ignored. Listing a file in .gitignore tells Git to shut up about it when it is untracked, but has no effect on it when it is tracked.

Note that although the index can only hold files, not directories,⁴ you can list a directory name in a .gitignore file. This tells Git to ignore all the files within that directory (unless, of course, they're already in Git's index, which means they're automatically not ignored).

In any case, the point here is this: using git status, Git will tell you where the index is different from the current commit, and where the index is different from your work-tree. So you can look at a work-tree file, and if Git says that the index and work-tree file match, what you see is what will be committed.

If the two are different and you want to see what the difference is, use one of the many varieties of git diff. Remember that there are three copies of each file: one in HEAD, one in Git's index, and one in your work-tree. You can compare any two of these three.

²To see the names of all the files in the index, plus more details, use git ls-files --stage. But this isn't really useful for humans. Run without --stage, you'll get just the file names ... which still isn't really useful. Use git status instead.

³The "true name" of a commit is its hash ID, but there are a lot of ways to have Git find the hash ID from some other name. These are described in the gitrevisions documentation, which is worthy of repeated viewing. There is a great deal of information packed in here.

⁴It might be nice if the index could hold directories, as this would enable Git to store an empty directory. But for all of Git's existence so far, it has not been able to do this.

Yes, I remembered that git doesn't track directories until there's a file held in them. I had indeed run `git status` on both the local and remote copy, and oddly it showed everything was up to date. I then checked dates and byte sizes on a number of files held within subdirectories via FTP and it showed a very different story. A total mess. Since there were too many files held in many different subdirectories, I concluded it was best to replace the "whole" directory (and obviously its contents). Setting a copy aside, trashing the tracked one and then copying the backup back in solved it. — Seb, Jul 27 '20 at 08:29
And thank you also for your time and the detailed explanation. — Seb, Jul 27 '20 at 08:30
