
I basically know the difference between git add, which means "I want to add this file to my next snapshot", and git commit, which means "take the snapshot".

However, when I run git add file1, then remove file1 from my working directory, and then run git commit, it still works. Somehow the snapshot was taken when adding, not when committing. Am I right?

LivBanana

3 Answers


git commit takes the snapshot by

  • looking at the index (where you have added the file),
  • not by looking at the working tree (where you go on modifying stuff, including adding or deleting files)
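To see this from the command line, here is a minimal sketch that reproduces the question's scenario in a throwaway repository. The file name `file1.txt`, its content, and the user name/email are placeholders for the demo, not anything from the question.

```shell
# Work in a throwaway repository so nothing real is touched.
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "hello" > file1.txt
git add file1.txt        # copies the current content into the index
rm file1.txt             # delete it from the working tree only

git commit -q -m "Add file1"   # the commit is built from the index

# The file is in the commit even though it is gone from disk.
git show HEAD:file1.txt        # prints: hello
ls file1.txt 2>/dev/null || echo "file1.txt is not in the working tree"
```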

See "What's the difference between HEAD, working tree and index, in Git?"

In your case, after deleting the file (having first added it to the index), git status would give you:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   go.mod

Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        deleted:    go.mod

The file is both:

  • ready to be part of the next commit
  • locally deleted

A git restore -- <myFile> is enough to restore it locally.
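A minimal sketch of that situation and the recovery step, assuming Git 2.23 or later (the version that introduced `git restore`). `go.mod` is reused from the status output above purely as an example name, and its content is made up.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "module example" > go.mod
git add go.mod && git commit -q -m "Initial go.mod"

echo "go 1.21" >> go.mod   # modify the tracked file
git add go.mod             # stage the modification
rm go.mod                  # delete it locally

git status --short         # "MD go.mod": staged modification, unstaged deletion

# Without --source, git restore copies the file back from the index,
# so the staged (modified) version comes back.
git restore -- go.mod
cat go.mod
```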


The working tree (or working directory) is the tree of actual checked out files.
The working tree normally contains the contents of the HEAD commit’s tree, plus any local changes that you have made but not yet committed.


The idea is to prepare your next commit, instead of blindly putting all your current modifications into one giant commit.
It is better to make small, coherent commits than a giant one: you get a logical history, and future git bisect runs become easier.

You can even stage (add to the index) only part of a file (interactive staging).
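As a sketch of preparing commits little by little: two unrelated edits made in the same session are staged and committed separately. The file names and commit messages are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Two unrelated edits made in the same working session.
echo "refactored code" > parser.txt
echo "new feature" > feature.txt

# Stage and commit only the first change...
git add parser.txt
git commit -q -m "Refactor parser"

# ...then the second one, as its own commit.
git add feature.txt
git commit -q -m "Add feature"

git log --oneline   # two small commits instead of one big one
```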

The OP adds:

Imagine that commit does the work of both the actual commit and add.
Let's call it the imaginary commit.
You can still do this little by little work using the imaginary commit

First: that command (which stages all changes to already-tracked files and commits) does exist:

git commit -am "Let's add everything"

Second, to do "little by little", you must use git add, then commit.
A commit takes everything in the index.
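One caveat about `git commit -am`, sketched below: `-a` stages modifications and deletions of files Git already tracks, but it does not pick up brand-new untracked files, which still need an explicit `git add`. The file names are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > tracked.txt
git add tracked.txt && git commit -q -m "Track a file"

echo "v2" > tracked.txt      # modify a tracked file
echo "new" > untracked.txt   # create a file Git has never seen

git commit -q -am "Let's add everything"

git show HEAD:tracked.txt    # prints: v2
git status --short           # prints: ?? untracked.txt (left out of the commit)
```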

VonC
  • Is the working tree the same as the working directory? – LivBanana Nov 28 '20 at 21:29
  • @Giant8 Yes, those terms reference the same concept – VonC Nov 28 '20 at 21:32
  • Thanks! I am still wondering, however, what the real utility of having two different commands is. Why not have only one that does both at the same time? Are there useful cases where someone wants to use `add` without a `commit` later? – LivBanana Nov 28 '20 at 21:36
  • @Giant8 I use it all the time. When you make a *lot of* modifications, the last thing you want is one *giant* commit. You add only a small coherent subset, make a first commit, then add the rest little by little, making small commits along the way. You can even add *parts of a file* instead of the whole file, if you have made multiple modifications inside the same file. See "interactive staging": https://git-scm.com/book/en/v2/Git-Tools-Interactive-Staging – VonC Nov 28 '20 at 21:41
  • Imagine that commit does the work of both the actual commit and add. Let's call it the imaginary commit. You can still do this little by little work using the imaginary commit – LivBanana Nov 28 '20 at 21:46
  • @Giant8 I have edited my answer to address your comment. "little by little" means first adding that "little" in the index, then committing. – VonC Nov 28 '20 at 21:50
  • If commit takes everything, as you said, then what I really need is to commit several times to work little by little (not just to add several times). In other words, if I add several times without committing each time, and then run one big commit after all the adds, I still have the same problem of big chunks. What really matters is committing several times; add has nothing to do with it here. – LivBanana Nov 28 '20 at 22:18

Actually, there is something missing from what you know.

You actually have two copies of that file when you have added it. You have the working tree copy, which is the normal file system copy that you see and edit with normal text editors and whatnot.

But, additionally you have a copy in the index. git add copies the file and its contents from the working tree into the index. This is where the actual snapshot of that particular file is made.
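A small sketch that makes the index copy visible: `git ls-files --stage` shows the blob ID recorded in the index, and `git cat-file -p` prints that blob's content even after the working-tree file has been deleted. The file name and content are placeholders.

```shell
cd "$(mktemp -d)"
git init -q

echo "snapshot me" > file1.txt
git add file1.txt           # writes a blob object and records it in the index

# Index entry: mode, blob ID, stage number, then the path.
git ls-files --stage

rm file1.txt                # remove the working-tree copy

# The index copy is still there.
blob=$(git ls-files --stage file1.txt | cut -d' ' -f2)
git cat-file -p "$blob"     # prints: snapshot me
```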

When you afterwards issue a git commit, the index is stored into a commit. What is, or is not, in the working tree (aka on disk) at this point is rather irrelevant. The index is all that matters.

This is why the file still ends up in the commit. It was copied to the index with git add, and even though you subsequently removed it from disk, git commit used the index as the source of the commit.

The upshot of having a separate index, which makes up what the next commit is going to be, is that you get to decide what your next commit will contain, as opposed to just "all the things on my disk at the moment". Good git tools even let you copy files into the index with only some of their changes, so that if you made two or more changes to a file, you decide whether all of them go into the next commit or just one or a few.
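For the partial-file case, `git add -p` walks through a file's changes hunk by hunk. It is normally used interactively; the sketch below pipes the answers (`y` for the first hunk, `n` for the second) only so it can run unattended. The 20-line file and its two edits are made up so that Git produces two separate hunks.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

seq 1 20 > numbers.txt
git add numbers.txt && git commit -q -m "Add numbers"

# Two independent edits, far enough apart to form two hunks.
sed -i.bak -e 's/^1$/one/' -e 's/^20$/twenty/' numbers.txt && rm numbers.txt.bak

# Stage the first hunk, skip the second.
printf 'y\nn\n' | git add -p numbers.txt

git diff --cached   # staged: only the "one" change
git diff            # unstaged: only the "twenty" change
```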

Lasse V. Karlsen
  • The working tree is another name for the working directory, am I right? – LivBanana Nov 28 '20 at 21:32
  • Yes, though I believe the established git terminology is "work tree". But basically it is your files and folders on disk, part of the normal operating system workflow, as in what you would have if you simply ignored the entire git part completely. "I believe" means I'm not 100% sure, however. – Lasse V. Karlsen Nov 28 '20 at 21:34
  • Thanks! I am still wondering what the real utility of having two different commands is. Why not have only one that does both at the same time? Are there useful cases where someone wants to use `add` without a `commit` later? – LivBanana Nov 28 '20 at 21:36
  • @Giant8 No, but the opposite. You might commit without adding all the changed files, because you have actually made two or more *separate* changes. For instance, you might have refactored a class to better separate the responsibilities in it, as well as added support somewhere else for reading data from a database. Those two things might be better committed as two commits, with appropriate comments, instead of lumping them together. So you might want to add only some of the files before the commit, then add the rest afterwards, and so on. – Lasse V. Karlsen Nov 28 '20 at 21:38
  • I also added a paragraph at the bottom of my answer, which is another situation where you might want to make multiple commits, you have actually fixed two bugs in the same file. So you use some tool to add only some of the changes to the file into the index, make the commit, then add the rest and make another commit. I do this all the time to make it clearer for future readers and investigators of the repository. – Lasse V. Karlsen Nov 28 '20 at 21:40
  • Imagine that commit does the work of both the actual commit and add. Let's call it the imaginary commit. You can still do this "little by little" work using the imaginary commit – LivBanana Nov 28 '20 at 21:49
  • If your imaginary commit just commits the current state of your working folder, then sure, you just need to make your changes in smaller sections and commit in between. You can do a `git commit -a ...`, which will automatically *add* all changed tracked files before the actual commit, in essence combining `git add` and `git commit` into one command. – Lasse V. Karlsen Nov 28 '20 at 21:58

You are absolutely right. Not only that, but the file is kept as it was at the moment you added it. If you later change it and then commit, it is saved in the revision as it was when you added it, not in its current state. The same thing happens for files that are modified (in case you think it only works for new files): if you commit a file and then modify it, git will only persist the new content in the following revisions if you add it again.
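A sketch of that behavior with a file changed after `git add`: the commit records the staged version, not what is on disk at commit time. The file name and contents are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "v1" > notes.txt
git add notes.txt          # stage v1
echo "v2" > notes.txt      # change it again without re-adding

git commit -q -m "Add notes"

git show HEAD:notes.txt    # prints: v1 (what was staged)
cat notes.txt              # prints: v2 (only in the working tree)
git status --short         # prints: " M notes.txt"
```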

eftshift0
  • So what's the real utility of having two different commands? Why not have only one that does both at the same time? Are there useful cases where someone wants to use `add` without a `commit` later? – LivBanana Nov 28 '20 at 21:35
  • Actually? Yes.... You can modify dozens of files, and then you get to choose what you _really_ want to commit. What's there to complain about? The index (what will be used for the actual revision when you commit) is one of the wonders of git. Just in case, you can always use `git commit -a`, but I (for one) am not fond of it. I like to select the pieces that I want to commit. – eftshift0 Nov 28 '20 at 21:40
  • There's one more workflow that this allows. You can _add_ stuff that you think is ready... and then you get the luxury of modifying files to see if you can improve them. If you do improve them, you add again.... if you don't, well, no need to do anything else, just commit (which will use what you had added before). – eftshift0 Nov 28 '20 at 21:41
  • My advice: give yourself some time to use git extensively so that you get to see all the different crazy situations that coding will get you through and see how git is able to cope with them. Given enough time, it will all make sense. – eftshift0 Nov 28 '20 at 21:44
  • Is the SHA-1 created during the `add` process? – LivBanana Nov 28 '20 at 22:45
  • All objects that are put in git's DB get an ID (currently using SHA-1, but git will slowly move to SHA-256 over time; the process has already started). When you add the file, its content is put in the object DB as a blob with its own ID, and the index records that ID for the file's path. When you _commit_, tree objects are written from the index and a new commit object is created, getting its own ID at that moment. – eftshift0 Nov 28 '20 at 23:06
  • Any resources for digging into the internals? I would be grateful if you could share some links. – LivBanana Nov 28 '20 at 23:32
  • This is great: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects – eftshift0 Nov 29 '20 at 01:24
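Related to the SHA-1 question in these comments, here is a sketch showing when objects appear: after `git add`, the blob ID recorded in the index matches `git hash-object` on the file, while the commit object (and the tree it points to) only exist once `git commit` runs. The file name, content, and identity are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "hello" > file1.txt
git add file1.txt

# The blob ID in the index equals the content hash of the file.
git ls-files --stage file1.txt
git hash-object file1.txt
git cat-file -t "$(git hash-object file1.txt)"   # prints: blob

# No commit exists yet, so HEAD does not resolve.
git rev-parse --verify -q HEAD || echo "no commit yet"

git commit -q -m "First commit"
git cat-file -t HEAD            # prints: commit
git cat-file -t "HEAD^{tree}"   # prints: tree
```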