
I'm looking for the 'why' behind some unexpected behavior: using the Unix command 'rm' to delete a file on a project branch in my Git repository also seemed to delete the file from the master branch. Below I give a summary of the commands and then the complete console output.

Summary of the commands:

  1. git init
  2. touch file1.txt file2.txt
  3. git add *.txt
  4. git commit -m "Add file1.txt and file2.txt"
  5. git checkout -b myBranch
  6. rm file1.txt
  7. git status => shows deleted file1.txt not staged for commit
  8. git checkout master
  9. git status => shows deleted file1.txt not staged for commit
  10. ls => shows file1.txt has been deleted from the directory (master)
  11. git checkout myBranch
  12. git rm file1.txt
  13. git commit -m "Remove file1.txt"
  14. git checkout master
  15. ls => shows the file1.txt is in the directory (master)

Points of concern from the above summary: at steps 9 and 10 the file appears to have been removed from master, but at step 15 it is back.

Console detail (Note, may have additional display entries)

ec2-user:~/environment/TestGit $ git init
Initialized empty Git repository in /home/ec2-user/environment/TestGit/.git/

ec2-user:~/environment/TestGit (master) $ touch file1.txt file2.txt
ec2-user:~/environment/TestGit (master) $ git add *.txt
ec2-user:~/environment/TestGit (master) $ git commit -m "Add file1.txt and file2.txt"
[master (root-commit) 531ed48] Add file1.txt and file2.txt
 Committer: EC2 Default User <ec2-user@ip-172-31-37-27.ec2.internal>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:

    git config --global user.name "Your Name"
    git config --global user.email you@example.com

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 file1.txt
 create mode 100644 file2.txt

ec2-user:~/environment/TestGit (master) $ git checkout -b myBranch
Switched to a new branch 'myBranch'

ec2-user:~/environment/TestGit (myBranch) $ ls
file1.txt  file2.txt

ec2-user:~/environment/TestGit (myBranch) $ rm file1.txt
ec2-user:~/environment/TestGit (myBranch) $ ls
file2.txt

ec2-user:~/environment/TestGit (myBranch) $ git status
On branch myBranch
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    file1.txt

no changes added to commit (use "git add" and/or "git commit -a")

ec2-user:~/environment/TestGit (myBranch) $ git checkout master
D       file1.txt
Switched to branch 'master'

ec2-user:~/environment/TestGit (master) $ ls
file2.txt

ec2-user:~/environment/TestGit (master) $ git status
On branch master
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    file1.txt

no changes added to commit (use "git add" and/or "git commit -a")

ec2-user:~/environment/TestGit (master) $ git checkout myBranch
D       file1.txt
Switched to branch 'myBranch'

ec2-user:~/environment/TestGit (myBranch) $ git status
On branch myBranch
Changes not staged for commit:
  (use "git add/rm <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        deleted:    file1.txt

no changes added to commit (use "git add" and/or "git commit -a")

ec2-user:~/environment/TestGit (myBranch) $ git rm file1.txt
rm 'file1.txt'

ec2-user:~/environment/TestGit (myBranch) $ git status
On branch myBranch
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

        deleted:    file1.txt

ec2-user:~/environment/TestGit (myBranch) $ git commit -m "Remove file1.txt"
[myBranch 6585980] Remove file1.txt
 Committer: EC2 Default User <ec2-user@ip-172-31-37-27.ec2.internal>
Your name and email address were configured automatically based
on your username and hostname. Please check that they are accurate.
You can suppress this message by setting them explicitly:

    git config --global user.name "Your Name"
    git config --global user.email you@example.com

After doing this, you may fix the identity used for this commit with:

    git commit --amend --reset-author

 1 file changed, 0 insertions(+), 0 deletions(-)
 delete mode 100644 file1.txt

ec2-user:~/environment/TestGit (myBranch) $ git status
On branch myBranch
nothing to commit, working tree clean

ec2-user:~/environment/TestGit (myBranch) $ ls
file2.txt

ec2-user:~/environment/TestGit (myBranch) $ git checkout master
Switched to branch 'master'

ec2-user:~/environment/TestGit (master) $ ls
file1.txt  file2.txt
  • What's wrong? Both the recipe you described and what you are showing on the console is just the way it is expected to be. – eftshift0 Mar 17 '20 at 17:10
  • step 14 restores the working dir to match `master`. The files are in `master`, so the checkout creates them. This is expected. – William Pursell Mar 17 '20 at 17:11
  • lines 9 and 10 do not "remove the files from master". They remove the files from the working directory. In line 13, you remove file1 from MyBranch. You never remove them from master. – William Pursell Mar 17 '20 at 17:13
  • but the files were originally created and committed to the master (2, 3, 4) and then MyBranch created. The files were deleted from MyBranch (6) and then I switched back to master (8). It is (9) and (10) which show the file is no longer a part of Master. This can be seen in the console provided – SParker Mar 17 '20 at 19:09
  • @SParker the output you provide shows exactly the opposite. When it says `deleted: file1.txt` it is telling you that `file1.txt` has been deleted from the working directory, but that it is still in the master branch. – William Pursell Mar 17 '20 at 20:10
  • It's not like you did something on master. At that point the file is _uncommitted_, the removal is in the staging area. Had you committed on master then you would be right to think that the file should be gone from master. Let me put it this way: you can't commit and get _two_ branches to move in a single commit operation. – eftshift0 Mar 17 '20 at 20:12

2 Answers


Ok... I kind of think I understand what you don't understand. On points 9 and 10 you were playing around on master.... however, you then switch back to myBranch and that is when you commit. So... master stayed right where it was (the first commit, with the two files, it doesn't move) and you finally committed on myBranch with the file removal. That's why when you move back to master both files are there.

eftshift0
  • Yes, but if you follow the console, you'll see that I created the file on the master, made a new branch, deleted the file on the new branch, then before doing anything else I switched back to the master and did a 'git status' and an 'ls' and the file is missing on the master. – SParker Mar 17 '20 at 19:14
  • Yes, but _you didn't commit_ on master. The revision was created on myBranch and that's where the file is deleted. master stayed where it was and that's why the file is there if you switch to it. – eftshift0 Mar 17 '20 at 20:09
  • But that's my point. after #9 and #10 where I have checked out the master again, I should still see file1.txt, but it's not there. – SParker Mar 18 '20 at 17:55
  • That happens because you haven't committed. It's very frequent to want to move to different branches with the changes lying around. Git does the best it can: checks the files that are modified in index and working tree to see if they are in HEAD as they are in the revision where you are heading. If they are the same, it allows you to do it with the changes uncommitted. That's why the file is seemingly not there (it's removed.... but **on index**). It would be more problematic if git were supposed to go to master without any changes. It will have to ask you to stash or whatever. – eftshift0 Mar 18 '20 at 19:06

You have an incorrect mental model of how Git works. (Don't worry that you do—I did when I started with Git, more than a decade ago.) To correct your mental model, you need to know these things:

  • Git stores commits. It does not store files—not at the level you will use it, anyway—but rather whole commits.

  • Commits themselves do store files, so that's how you get files, but it's at the level of a commit: you either have a commit (and all of its files), or you don't (you have none of its files). Every commit stores a full and complete snapshot of all files (well, all of its files; see below).

  • Commits also store some metadata: information about the commit, such as who made it, when, and why (a log message). A crucial piece of metadata in each commit is the commit-"number" of the commit that comes before this commit.

  • Commit "numbers" are big and ugly and random-looking hash IDs. Every commit gets a unique hash ID. This is how you (or your Git) knows whether you have the commit. Every Git everywhere agrees that that particular commit gets that particular hash ID, and no other commit, past or future, can ever have that ID. To make this work, the hash ID is a cryptographic checksum of the contents of the commit—which means that no part of any existing commit can ever change.

  • No human can actually remember these hash IDs. Fortunately, we don't have to: we have a computer to remember them for us.

  • A branch name, which most people (including me) will often abbreviate to "a branch", holds just one hash ID. The hash ID in a name like this is the ID of the last commit in the branch. That's why each commit links back to its parent, or previous, commit: so that Git can start at the end and work backwards.

  • A collection of commits that you get by starting at the end and working backwards is also called "a branch". So when someone says branch master, for instance, it's important to think about whether this means the last commit in master as stored in the name master or a series of commits ending with the last commit in master.
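
The relationship between a branch name, the last commit it points to, and that commit's parent can be checked directly. Here is a minimal sketch, assuming a throwaway repository in a temporary directory; the demo identity and the second commit are made up for illustration and are not part of the question:

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com
touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "Add file1.txt and file2.txt"
echo "more" > file2.txt
git commit -q -am "Update file2.txt"       # illustrative second commit

tip=$(git rev-parse master)                # the single hash ID stored in the name master
parent=$(git rev-parse master~1)           # the commit that comes before it
git cat-file -p "$tip"                     # prints tree, parent, author, committer, message
stored_parent=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
echo "master -> $tip, whose parent is $stored_parent"
```

The hash IDs printed will generally differ from run to run, since they depend on the identity and the timestamp recorded in each commit.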

Now, the fact that every commit ever made is read-only means that what we do with a repository is generally just add new commits. But to make a new commit, we have to be able to change files: open them up in our editors, make changes to them, and save them back. The files inside commits can't be changed. So we do not, and cannot, work on committed files. The commits themselves, that hold snapshots of all of your files, are just archives.

To keep the archives from growing very fat very fast, Git stores committed files in a special, read-only, Git-only, compressed format. Only Git itself can actually use these. (You could of course write your own programs to read them, but there's more than one format, and Git already has a plumbing command, i.e., one that users aren't normally supposed to need, for reading a raw object: git cat-file -p. This can read more than just files, but it can read the files inside a commit.) New commits can share the files from existing commits (that's obviously safe because they're all read-only) and in fact, this all happens automatically.

In any case, to get any new work done in some existing repository, you must first pick some existing commit and have Git extract it somewhere. That "somewhere" is your work-tree (or working tree or some variant on this name). The extracted work-tree area contains ordinary files, in ordinary everyday formats.

You, and your computer, can work with these work-tree files. That's what you are doing in your steps 2 and 6, for instance.

Git does not use these work-tree files very much at all. It creates them for you (by extracting them from commits), and it will look at them when you tell it to, but it's not using them to make commits. They exist for you to use, to get your work done. You have to copy them to the files that Git is using, which is what step 3 was about. This is where everything gets a little complicated.

The index

In step 1, you created a new, empty Git repository. This repository has no commits yet. It has an empty work-tree, in which you can work with your files. And, it has an empty index. This thing—this index—is kind of complicated, but you can think of it as where you build the next commit you will make. You can think of it as holding copies of each of your files.

Your step 2 was:

touch file1.txt file2.txt

which created two (empty) files in your work-tree. These files are not in your index yet. Your step 3, though, was:

git add *.txt

This has the effect of copying the files' contents into the index.[1] Git now says that these files are staged for commit. This leads to another, alternative name for the index: it's also called the staging area. These are just synonyms: the index, or the staging area, is just one thing.[2]

Finally, in step 4, you ran git commit. This made a new commit from the files that were in the index, not the ones in the work-tree. Those two index files were copies of the ones from the work-tree.
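
That git commit uses the index copy rather than the work-tree copy can be seen by changing a file after git add but before git commit. This is only a sketch under assumptions: the scratch directory, the demo identity, and the file contents are invented for the demonstration.

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com

echo "staged version" > file1.txt
git add file1.txt                          # copy this content into the index
echo "edited after git add" > file1.txt    # change only the work-tree copy
git commit -q -m "Add file1.txt"           # the commit is built from the index

committed=$(git cat-file -p HEAD:file1.txt)   # content stored in the new commit
worktree=$(cat file1.txt)                     # content sitting in the work-tree
echo "commit:    $committed"
echo "work-tree: $worktree"
git ls-files -s                            # index entries: mode, blob hash, stage number, name
```

The commit holds the content that was current when git add ran; the later edit stays in the work-tree only, and git status would report it as not staged for commit.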

At this point, you now have a commit. This one commit is the very first commit in the repository, so it's a bit special: it does not record any previous commit. (It can't, of course; there are no previous commits.) I have no idea what hash ID your commit got: it depends not only on the files that are in the commit (which I do know) and your log message (which I saw in your command), but also on your name and email address and on the very second at which your Git created the commit (and I don't know these). I do know, though, that it has a unique hash ID, different from all the other hash IDs in your repository, or any other Git repository you'll have your repository talk to in the future.[3]


[1] Technically, the index holds the files' modes, their names, and, for each file, a reference to the internal Git object that holds the content. This blob object has a hash ID, like a commit (though unlike a commit, a blob object can be re-used). The hash ID of the empty file is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which you can find by running git hash-object -t blob --stdin </dev/null. If and when Git moves to SHA-2 instead of SHA-1, the IDs of every object will change, which is going to be a very interesting time for Git. We can hope that Git hides all the painful parts here for us.

[2] Technically, the index is mostly just a file in .git named .git/index. The "mostly" is here only because Git has a mode called a split index. All of these, however, are internal details that could change. The one external promise is that you can set an environment variable named GIT_INDEX_FILE to make Git use a different index. Some Git programs do this for special purposes: e.g., git stash, when it was a shell script, did it when making some of the stash commits, to avoid overwriting the normal index.

[3] This depends on the uniqueness of hash IDs. In the presence of malicious actors, that in turn depends in part on the strength of the cryptography. See How does the newly found SHA-1 collision affect Git?


More about branch names

We already mentioned that branch names, like master, hold the hash ID of a commit. Until you have some hash IDs, you can't have any branch names. So creating this initial commit is what created the name master. This name holds the actual hash ID, whatever that is. When something holds a hash ID, we say that this something points to the commit. So at this time—after step 4 creates the first commit—you have a commit with some big ugly hash ID, but let's just call it "commit A", and draw it like this:

A   <-- master

The name master points to (contains the hash ID of) commit A.

Now we go on to step 5:

git checkout -b myBranch

This creates a new name, myBranch, that also holds the hash ID of existing commit A. Let's update our drawing:

A   <-- master, myBranch

Git also needs to know which branch name we're using, so let's attach the name HEAD, written in all uppercase, to one of these two branch names. The branch name we want to use—created by this git checkout -b—is the new one, so that's:

A   <-- master, myBranch (HEAD)

Both names point to the same commit. This is perfectly normal in Git: commit A is now on both branches. The current name is myBranch and the current commit is commit A.
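
You can confirm that both names hold the same hash ID, and see which name HEAD is attached to, with a couple of read-only commands. A minimal sketch, assuming a scratch repository and a made-up identity:

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com
touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "Add file1.txt and file2.txt"   # commit A

git checkout -q -b myBranch                # new name, same commit, HEAD moves to it
master_id=$(git rev-parse master)
branch_id=$(git rev-parse myBranch)
current=$(git symbolic-ref --short HEAD)   # the branch name HEAD is attached to
echo "master   -> $master_id"
echo "myBranch -> $branch_id"
echo "HEAD is attached to $current"
```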

Now let's watch what happens in steps 6, 7, and 8:

  1. rm file1.txt

    This removes the file from your work-tree. Git's index, which still matches commit A—Git made commit A from the index—still has two files in it.

  2. git status

    This runs two separate comparisons. One compares the current commit, commit A, to the index. These have the same files with the same contents, so this part of git status says nothing. The second comparison is index-vs-work-tree. Here, the index has file1.txt and the work-tree doesn't, so this comparison says that file1.txt is removed from the work-tree but not from the index, by saying that this deletion is not staged for commit.

  3. git checkout master

    This tells Git that you'd like to change the current commit and/or branch. The current branch is myBranch and the current commit is A. The selected branch name is master and its commit is A. So Git can skip changing commits, while sticking the special name HEAD to the name master now:[4]

    A   <-- master (HEAD), myBranch
    

Nothing has happened anywhere else: the index still has two files, the current commit is still commit A, and the work-tree still has one file missing. Step 9—another git status—will tell you that your current branch is now master, but will do the same comparisons: commit A vs index, and index vs work-tree. The result here will be the same. Step 10 just looks at the work-tree, which we know is missing file1.txt.
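
The two comparisons that git status makes can also be run separately: git diff --cached compares the current commit to the index, and plain git diff compares the index to the work-tree. The following sketch replays steps 1 through 8 in a scratch repository (the directory and identity are assumptions for the demo) and then runs both comparisons on master:

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com
touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "Add file1.txt and file2.txt"   # commit A

git checkout -q -b myBranch
rm file1.txt                               # removes the work-tree copy only
git checkout -q master                     # same commit A: index and work-tree are left alone

head_vs_index=$(git diff --cached --name-status)   # commit A vs index
index_vs_tree=$(git diff --name-status)            # index vs work-tree
still_indexed=$(git ls-files file1.txt)            # file1.txt is still in the index
echo "commit vs index:    [$head_vs_index]"
echo "index vs work-tree: [$index_vs_tree]"
```

The first comparison comes back empty and the second reports the deletion, which matches the "not staged for commit" output in the question.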

Step 11 asks Git to attach HEAD to myBranch again. Nothing else changes: the index is untouched, and the work-tree is untouched.

In step 12, though, you run:

git rm file1.txt

This changes the index. The git rm command removes the file from both the index and the work-tree. It's already gone from the work-tree, so that doesn't really change anything, but now the index no longer has a file1.txt in it.

In step 13, you run git commit again. This makes a new commit, from what's in the index: that is, a commit that has just the empty file2.txt in it. You get all the usual metadata as well: your name and email address, and the log message for why you made this commit. The parent of this new commit, which we'll call B rather than trying to guess a hash ID, is existing commit A: new commit B points to existing commit A.

The last step of git commit is for Git to write the new commit's hash ID into the name to which HEAD is attached. Since step 11 attached HEAD to myBranch, the result is this:

A   <-- master
 \
  B   <-- myBranch (HEAD)

The existing name master has not changed at all. HEAD is still attached to myBranch, but the name myBranch now points to new commit B. The index still has whatever it had from before you ran git commit: i.e., it has just the empty file2.txt in it. Commit B has a backwards-pointing arrow to—or really, contains the hash ID of—commit A, so if you run git log right now, your Git will start at HEAD, find myBranch, find B, show commit B, follow the arrow to commit A, and show commit A.
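
Here is a sketch of steps 12 through 15 in a scratch repository (directory and identity are assumptions), showing that the commit moves only myBranch and that checking out master afterwards brings file1.txt back:

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com
touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "Add file1.txt and file2.txt"   # commit A

git checkout -q -b myBranch
git rm -q file1.txt                        # remove from both index and work-tree
git commit -q -m "Remove file1.txt"        # commit B, recorded on myBranch only
on_branch=$(ls)                            # work-tree while on myBranch

git checkout -q master                     # different commit now, so Git rewrites the work-tree
on_master=$(ls)                            # work-tree while on master
git --no-pager log --oneline --graph --all # B on myBranch, A on master
```

This time the two names point to different commits, so git checkout has real work to do and re-creates file1.txt from commit A.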


[4] Technically Git accomplishes this by writing the branch name master into a file in .git named .git/HEAD. You can look at this file, but when you want to update it, you should use the various Git tools, because under various conditions, Git might be using some other file. In particular, since Git 2.5, Git now has git worktree add, which adds a new index-and-work-tree pair. Each added work-tree has to get its own separate HEAD as well, so once you add some work-trees, the index isn't always .git/index any more and HEAD isn't always .git/HEAD any more.


Summary

Keep the following items in mind at all times:

  • Git is all about commits. Branch names—and other names, once you get to that point—just serve to find the commits.

  • Every commit has a unique hash ID, and except for some new unfinished features ("partial clones"), you always either have a full commit, or none of a commit.

  • Every commit links back to one or more predecessor or parent commits, except for special cases like the very first commit ever in some repository. These linkages—or chains of commits—form what people call branches (one of the several meanings of the word "branch").

  • To make a new commit, you need to update Git's index. When you first git checkout some commit you don't already have out, Git will fill in the index—and of course your work-tree—from that commit. You work with files in your work-tree, and Git works with its index.

  • The index and your work-tree aren't copied around: when you git clone, or git fetch, or git push, you will transfer commits. The index and work-tree don't matter here (well, there are some conditions for git push, in the other Git, that's receiving your git push).

  • Commits are frozen for all time (and mostly permanent—they're a bit hard to get rid of, even if you want to, sometimes). The copies of files in your index and work-tree are temporary.

  • Adding new commits updates your branch name(s). The branch name that gets updated is the one you've attached HEAD to.

  • In Git 2.23 or later, you can use git switch to pick where HEAD goes and/or create new branch names, and git restore to extract specific files from specific commits; in earlier versions of Git, both jobs are stuck into one git checkout command.

  • When you get to the point of using a second Git repository, remember that until you git push those commits to that other repository, your Git is the only one that has your new commits. That makes it easy (and OK) to "rewrite history" by replacing some commits with some new-and-improved versions (e.g., git rebase -i or git commit --amend). Once you have sent the commits elsewhere, you can still replace commits with new-and-improved versions, it's just the other Git now has the commits you sent earlier, so these things get harder—sometimes a lot harder.
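
As an illustration of the git restore item above: if what you wanted after the plain rm was to get file1.txt back, you can ask Git to copy it out of the index again. This is a sketch in a scratch repository with an assumed identity; it tries git restore first and falls back to the older git checkout -- form when git restore is not available:

```shell
set -e
repo=$(mktemp -d)                          # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # name the initial branch master regardless of defaults
git config user.name "Demo User"           # assumed identity, local to this repo
git config user.email demo@example.com
touch file1.txt file2.txt
git add file1.txt file2.txt
git commit -q -m "Add file1.txt and file2.txt"

rm file1.txt                               # work-tree copy gone, index copy still present
# Git 2.23 or later: git restore; earlier versions: git checkout -- <file>
git restore file1.txt 2>/dev/null || git checkout -- file1.txt
status=$(git status --porcelain)           # empty when commit, index and work-tree all agree
echo "status after restoring: [$status]"
```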

torek
  • Thanks for the long response. It's appreciated. But in Step 10 where I am back on the master, I don't see file1.txt. I thought that when I went back to the master branch, I would see everything on the master branch. A remove of file1.txt has never been done or merged back to master, so any check out of Master should still see file1.txt. – SParker Mar 18 '20 at 17:54
  • Your step 10 is to run `ls`. That's not a Git command so it doesn't do anything Git-ish, it just shows you what is in your work-tree. As steps 8 and 9 did *nothing* to your work-tree, there's no change since then. – torek Mar 18 '20 at 18:53
  • The key feature (or, for you, bug) here is that `git checkout` attaches HEAD and updates the index **but** if there is no need to *modify* a copy of a file that is in the index, it **leaves the index and work-tree copies alone**. You can explore this oddity in depth [here](https://stackoverflow.com/q/22053757/1256452). Since `master` and `myBranch`, at this particular juncture, both point to the *same* commit, no files in the index need updating. You're free to change branch names all you like: Git will do *nothing* with its index and hence do nothing with your work-tree. – torek Mar 18 '20 at 18:57