The problem here is that as far as Git is concerned, deleting the line you deleted is the right answer.
Remember that Git's basic unit of storage is the commit. Each commit has:
- some data: a snapshot of all of your files; and
- some metadata: information about the commit. This includes who made it, when (date and time stamps), and why (your, or the committer's, log message). The last and most important piece of metadata to Git, though, is the parent commit hash ID.
Every commit has a unique hash ID. This hash ID gets assigned to the commit the moment you make it. From then on, that hash ID is reserved to that commit. Only that commit can have that ID.[1]
Meanwhile, as we just noted, each commit gets to store, in its metadata, a hash ID. Technically, each commit can store as many hash IDs as Git wants, but they have to be hash IDs of commits that already exist.[2] Most commits store exactly one other-commit hash ID: the parent (singular) of the commit. (Merge commits store at least two, which is what makes them merge commits. The very first commit someone makes in a new, totally-empty repository cannot have a parent, since there is no earlier commit to refer back to, so it just doesn't have one.)
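To make the "hash ID depends on everything in the commit" idea concrete, here is a small Python sketch. It hashes a hand-built commit body using the layout of Git's SHA-1 commit objects as I understand it; the function name `commit_id`, the author string, the timestamps, and the messages are all made up for illustration, and real Git has details (time zones, encodings, signatures) that this ignores.

```python
import hashlib

def commit_id(tree, parents, author, timestamp, message):
    """Hash a simplified commit body the way Git hashes a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]          # zero, one, or more parents
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of Git's empty tree
who = "A U Thor <author@example.com>"                    # hypothetical author

root = commit_id(EMPTY_TREE, [], who, 1700000000, "initial")
child = commit_id(EMPTY_TREE, [root], who, 1700000060, "second")
# Same content, same parent, made one second later: a different hash ID.
child_later = commit_id(EMPTY_TREE, [root], who, 1700000061, "second")

print(root)
print(child != child_later)
```

Because the parent's hash ID is part of the bytes being hashed, the parent has to exist (and have its ID) before the child can be hashed at all.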
In your case, then, you may have had some earlier commits, or not. We'll just draw a graph that assumes that you did:
... <-F <-G <-H
The commit whose hash ID is `H` (`H` stands in for the real hash ID, which looks random) remembers the hash ID of its parent, previously-existing commit `G`, which remembers the hash ID of its parent `F`, and so on. These backwards-pointing arrows, embedded in the metadata of each commit, are how Git finds commits, except for commit `H` itself, which is the last commit.
The way Git finds the last commit of any branch is that the branch name, such as `master`, holds the commit's hash ID. So to make the drawing more complete, let's draw that in. Since nothing about any commit can ever change after we make it, we can be lazy and stop drawing those arrows as arrows, as long as we remember that they point backwards:
...--F--G--H <-- master
Now, let's make your new commit that adds this new file, `file1.txt`. Commit `H` doesn't have `file1.txt` at all: it has some other files, but not `file1.txt`. We `git add file1.txt` and run `git commit` and supply a log message. Git creates a new commit, which gets a new unique big ugly hash ID, but we'll just call it `I`. Git sets the parent to `H` so that `I` points back to `H`:
...--F--G--H   <-- master
            \
             I
and then, as the last step of `git commit`, Git writes `I`'s actual hash ID into the name `master`:
...--F--G--H
            \
             I   <-- master
(There's no reason to keep drawing `I` on a separate line, so we won't.)
Now you edit the file and, with the usual process, make new commit `J`. Commit `J` has `I` as its parent, and Git writes `J`'s hash ID into the name `master`:
...--F--G--H--I--J <-- master
There is nothing to merge here, i.e., you can't use `git merge` to do what you want. You have a linear chain of commits, ending at `J`. From `J` we go back to `I`, from `I` to `H`, and so on.
[1] In a sense, the hash ID was reserved to that commit before you made it, except that the hash ID itself depends on the exact time at which you make it, down to the second. So if you'd made the commit one second earlier, or one second later, it would have had a different hash ID. In any case, the hash ID is unique: only that commit can have that hash ID.
If Git can't come up with a unique hash ID, it won't let you make the commit! This never actually happens, although it's a theoretical possibility. See also "How does the newly found SHA-1 collision affect Git?"
[2] The hash ID of the new commit we're about to create depends on the hash ID of its parent commit(s). So even if we figure out what hash ID the new commit being created will have if its parent is existing commit X, for any X, if we then insert this hash ID into the commit's metadata before creating it, it gets a different hash ID after all. So it's not possible for the commit to refer to itself, and it's not allowed to just put some random junk in there. Therefore every parent hash ID stored in a commit always refers to some earlier commit.
To put it more briefly, given a commit, you can go backwards in time to its parent ... but you can only go backwards in time. You cannot go forwards to its future children.
As a consequence of this, you can't change any commit, nor remove any earlier commit without also removing all the later ones. (Git makes it especially hard to remove commits. Compare with Mercurial, where you run `hg strip -r <rev>` and it removes that commit and all its children. You still don't get a choice about the children, but it's easy to take a commit away.)
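The one-way nature of these links is easy to see in a toy model. The sketch below is plain Python, not Git: the dictionary `parents` and the helper names `history` and `children` are invented for illustration, and `F` is treated as the first commit just to keep the example finite.

```python
# Toy commit graph: each commit name maps to the list of its parents.
parents = {"F": [], "G": ["F"], "H": ["G"], "I": ["H"], "J": ["I"]}

def history(start):
    """Follow first-parent links backwards from start, newest commit first."""
    chain = []
    current = start
    while current is not None:
        chain.append(current)
        links = parents[current]
        current = links[0] if links else None
    return chain

def children(target):
    """No commit records its children, so we have to scan every commit."""
    return [name for name, links in parents.items() if target in links]

print(history("J"))   # ['J', 'I', 'H', 'G', 'F']
print(children("H"))  # ['I']
```

Going backwards is a simple lookup per step. Going forwards requires knowing about every commit that might point back at the one we care about, which is why removing `H` would leave `I` and `J` pointing at nothing.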
Merging
Merging, in Git, generally comes into play when we have more than one branch name. Let's rewind to the case where we have just commit `H` as the last commit on `master`. (We can use `git reset --hard HEAD~2` to achieve this: that makes `master` point directly to `H` again, and also sets up the work areas, Git's index and our work-tree where we can see files, to reflect commit `H` again. `I` and `J` will continue to exist, and by default, can be retrieved for at least 30 more days. But we'll just pretend we never made `I` and `J` at all.) So we have this:
...--G--H <-- master
Now we'll create a new branch or two. When we do this, we need to add one more thing to our drawing. If there's only one branch name, `master`, that's probably the branch we're using. But what if we have added `dev` as a second name? Which name are we using?
Git's answer to this is to use the special name `HEAD`. This special name is normally attached to one of your branch names. (It can only attach to one or none: never more than one.) We'll add a second branch name, `dev`, but leave `HEAD` attached to `master`:
...--G--H <-- master (HEAD), dev
Now we'll create new commits `I` and `J` in the usual way. Let's draw them in:
          I--J   <-- master (HEAD)
         /
...--G--H   <-- dev
Note that `dev` has not moved: it still points to existing commit `H`. The name `master` now points to new commit `J`.
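Here is a toy Python model of what just happened with the names. The class `ToyRepo` and its methods are hypothetical, invented for this sketch; it only tracks which commit each branch name points to and which branch `HEAD` is attached to, and ignores files entirely.

```python
class ToyRepo:
    """Branch names map to commit names; HEAD holds the current branch name."""

    def __init__(self):
        self.parents = {"G": [], "H": ["G"]}   # G treated as the first commit
        self.branches = {"master": "H"}
        self.head = "master"

    def branch(self, name):
        # A new branch name points at the current commit; nothing else changes.
        self.branches[name] = self.branches[self.head]

    def checkout(self, name):
        # Only the HEAD attachment is modeled here, not the work-tree update.
        self.head = name

    def commit(self, new_name):
        # The new commit's parent is the current commit ...
        self.parents[new_name] = [self.branches[self.head]]
        # ... and then the branch that HEAD is attached to moves forward.
        self.branches[self.head] = new_name

repo = ToyRepo()
repo.branch("dev")
repo.commit("I")
repo.commit("J")
print(repo.branches)  # {'master': 'J', 'dev': 'H'}
```

Only the name that `HEAD` is attached to moves when we commit. Every other name stays exactly where it was.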
Now, let's create two commits on `dev`. We start by doing `git checkout dev`. This attaches our `HEAD` to `dev`, and also extracts the contents of commit `H` to work with/on:
          I--J   <-- master
         /
...--G--H   <-- dev (HEAD)
The commits in the repository have not changed! But the files we see and work with have, and the current branch is `dev` and the current commit is `H`.[3] Now we make two more new commits. Any number is allowed, but two makes the illustration easier:
          I--J   <-- master
         /
...--G--H
         \
          K--L   <-- dev (HEAD)
Now we can run `git merge`. We pick one branch to use (we `git checkout master` or `git checkout dev`) and then we run `git merge` and give it the name of the other branch.[4] Let's `git checkout master` and `git merge dev` so that `HEAD`, and the current commit, identify `J` rather than `L`:[5]
          I--J   <-- master (HEAD)
         /
...--G--H
         \
          K--L   <-- dev
Git now has to find the best commit that's on both branches. In this case, that's obvious: it is commit `H`. We get there from `J` by going back two steps, and we get there from `L` by going back two steps. If the chain along the bottom were longer, we'd have to go back 3 or 4 or however many steps, but as long as we are able to get to commit `H`, commit `H` will be the best shared commit.
Git calls this shared, best commit, from which both we and they started, the merge base. The merge base commit is the key to merging. You—or Git—find it by looking at the graph, which shows how the commits connect.
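"Looking at the graph" can be sketched in a few lines of Python. This is a toy, not Git's actual algorithm: the `parents` table and the helper names `ancestors` and `merge_bases` are made up, and `G` is treated as the first commit. The idea is that a merge base is a commit reachable from both tips that is not itself an ancestor of some other shared commit.

```python
# Toy version of the graph drawn above.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}

def ancestors(commit):
    """The commit itself plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a, b):
    """Shared commits that are not ancestors of another shared commit."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}

print(merge_bases("J", "L"))  # {'H'}
```

Both `G` and `H` are on both branches, but `G` is an ancestor of `H`, so `H` is the better (later) choice. In a real repository, `git merge-base master dev` reports the commit Git picks.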
Git will now run two `git diff` operations:
- `git diff --find-renames hash-of-H hash-of-J`, to find out what we changed, on `master`, since shared commit `H`; and
- `git diff --find-renames hash-of-H hash-of-L`, to find out what they changed, on `dev`, since shared commit `H`.
What `git merge` does is to combine these changes, then apply the combined changes to the snapshot in commit `H`, the merge base. That way we get to keep our changes, and add their changes.
This is also why merges are mostly symmetric. If we had checked out `dev`, i.e., commit `L`, and run `git merge master`, Git would still find common commit `H` as the merge base. It would run the same two `git diff` commands (in the other order but who cares?). It would then combine these differences into one big combined-set and apply those to the snapshot from commit `H`. The result would be the same.
If our changes and their changes overlap in some way, Git will declare a merge conflict. In this case, Git won't finish the merge on its own. It will leave you with a mess that you must clean up by hand. That's OK: you just clean it up, `git add` the resolved files, and commit (or run `git merge --continue`) to finish the job.
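The combine-and-apply step, the symmetry, and the conflict case can all be seen in a deliberately crude Python sketch. It works on whole files only (a snapshot is a dict from path to content), whereas Git also combines changes line by line inside a file; the function name `three_way_merge` and the sample snapshots are invented for illustration.

```python
def three_way_merge(base, ours, theirs):
    """Merge whole-file snapshots (path -> content). A missing path reads as None."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o              # both sides agree (or neither touched it)
        elif o == b:
            result = t              # only they changed it: take their version
        elif t == b:
            result = o              # only we changed it: keep our version
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:      # None means the file ends up deleted
            merged[path] = result
    return merged, conflicts

H = {"README": "hello\n", "main.c": "v1\n"}                    # merge base
J = {"README": "hello\n", "main.c": "v2\n"}                    # we edited main.c
L = {"README": "hello\n", "main.c": "v1\n", "new.txt": "x\n"}  # they added new.txt

print(three_way_merge(H, J, L))
print(three_way_merge(H, L, J) == three_way_merge(H, J, L))    # symmetric

L_clash = {"README": "hello\n", "main.c": "v3\n"}              # they edited main.c too
print(three_way_merge(H, J, L_clash)[1])                       # ['main.c']
```

Note what this implies for a deletion: if one side removed something and the other side left it alone, "only one side changed it" applies, and the removal is the merged result.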
To finish the job, Git will make a new commit (we get to call it `M`, for merge, since we cleverly labeled each of the previous commits `H` through `L`) and update the current branch name as usual, so that whichever branch we have checked out now ends at the new merge commit `M`. To mark it as a merge commit, Git sets its two parents to `J` and then `L`, in that order because we were on `J` when we started. So we can draw the result:
          I--J
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- dev
and we have our merge. The snapshot that goes with the merge is the result of applying to `H` the combined changes from `H`-vs-`J` and those from `H`-vs-`L`. The parents of the merge are the previous commit as usual, and the other commit we picked out when we ran `git merge dev`.
Now that this merge exists, attempting to merge `L`, or even `K`, into `master` does nothing: Git reports that `master` is already up to date. The reason is that the best shared commit, between `L` and `M`, is commit `L` ... which is already part of the history of `M`. If we step back from `M` along the bottom row, we reach `L`. History, which in Git consists of commits, including their connections, says that `L` is already merged here.
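The "already merged" check is just a reachability question, which a toy Python graph can answer. As before, the `parents` table and the helper name `is_ancestor` are invented for this sketch; in a real repository, `git merge-base --is-ancestor` asks the same kind of question.

```python
# The graph after the merge: M has two parents, J first and L second.
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
    "M": ["J", "L"],
}

def is_ancestor(older, newer):
    """True if older can be reached from newer by following parent links."""
    stack, seen = [newer], set()
    while stack:
        c = stack.pop()
        if c == older:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

for tip in ("L", "K"):
    print(tip, "already merged into M:", is_ancestor(tip, "M"))
```

Once the commit we ask to merge is itself reachable from the current commit, the merge base is that very commit, its diff against itself is empty, and there is nothing left to combine.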
[3] When you ask Git: What is in `HEAD`? you have two ways to phrase this. You can ask Git: What branch name is in `HEAD`? Or, you can ask: What commit does `HEAD` select? The two different questions get two different answers. In "detached HEAD" mode, in which `HEAD` is not attached to any branch name, the first one gets you an error instead of an answer. The second question almost always works.
Git also has the notion of an unborn branch, which it needs when you start out with a new, totally-empty repository with no commits at all. In this case `HEAD` exists, and holds a branch name, but the branch name itself doesn't exist and is invalid. So in this particular situation, you can ask the "what name" question about `HEAD`, but not the "what ID" question: the reverse of the detached HEAD setup.
[4] In fact, `git merge` works by commit hash IDs, so we can give it the hash ID of whichever commit we want. But usually we humans work by names.
[5] The merge result is generally the same each way, except for which parent is listed first. If we use particular flag arguments to `git merge`, the merge result might be different, though.
Cherry-picking
There is something we can do though. Given any chain of commits at all—whether there's a fork like:
          o--P--C--o--o   <-- branch1
         /
...--o--o
         \
          o--o--H   <-- branch2 (HEAD)
or just a linear chain like:
...--o--o--P--C--o--o--H <-- branch (HEAD)
we can pick out some commit `C`, a child whose parent is `P`, and run `git cherry-pick` on it. (Typically you would use `C`'s hash ID here.) What this does is force Git to:
- find commit `P`, `C`'s parent: this is easy because `C` holds `P`'s hash ID inside it;
- treat `P` as a merge base, `C` as "their" commit, and the current commit `H`, as selected by `HEAD`, as "our" commit, and do a full-blown three-way merge as usual.
So Git will now diff `P` vs `C` to see what "they" did, diff `P` vs `H` to see what we did, and combine these two sets of changes. Git will then apply the combined changes to the snapshot in `P`. If all goes well, Git will commit the resulting files as a new snapshot `C'`, a copy of commit `C`, using `C`'s original commit message and so on. It won't make this a merge commit, but rather just an ordinary commit:
          o--P--C--o--o   <-- branch1
         /
...--o--o
         \
          o--o--H--C'   <-- branch2 (HEAD)
or:
...--o--o--P--C--o--o--H--C' <-- branch (HEAD)
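Seen as code, a cherry-pick is a three-way merge with an unusual choice of merge base. The Python sketch below is a whole-file toy (Git works line by line within files too), and the names `merge_files` and `cherry_pick`, along with the sample snapshots, are made up for illustration.

```python
def merge_files(base, ours, theirs):
    """File-level three-way merge of path -> content dicts; raises on conflict."""
    merged = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            pick = o                 # they did nothing new here: keep ours
        elif o == b:
            pick = t                 # only they changed it: take theirs
        else:
            raise ValueError(f"conflict in {path}")
        if pick is not None:         # None means the file is absent
            merged[path] = pick
    return merged

def cherry_pick(head, commit, parent):
    # The picked commit's parent plays the role of the merge base.
    return merge_files(base=parent, ours=head, theirs=commit)

P = {"a.txt": "one\n"}
C = {"a.txt": "one\n", "fix.txt": "patch\n"}   # C added fix.txt on top of P
H = {"a.txt": "one\n", "b.txt": "two\n"}       # current commit, no fix.txt yet

C_prime = cherry_pick(head=H, commit=C, parent=P)
print(sorted(C_prime))  # ['a.txt', 'b.txt', 'fix.txt']
```

The new snapshot keeps everything `H` had and gains only what `C` changed relative to `P`. Whatever else lies between `P` and `H` in the graph never enters the computation.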
It tends to make more sense to cherry-pick a commit from another branch, as in the top diagram; but you can cherry-pick a commit from your own history, to re-apply the same changes. This is particularly useful if some commit between `C` and `C'` was a commit that un-did whatever happened in `C`.[6]
[6] Git has a command, `git revert`, to make such commits. You point it at some child, and Git does the same three-way merge as for a cherry-pick, except that the merge base is `C` this time, and the "theirs" commit is `P`. (The ours / HEAD commit is the HEAD commit, as always.) Exercise: try getting the diff of `C` vs `P`, in that order. What would happen if you combined this set of changes with `C` vs `HEAD`, in that order?
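One way to explore the exercise is with a toy Python version of a revert. It treats each file as a unit (real Git works on lines within files), and the function name `revert` and the sample snapshots are hypothetical. The logic is the three-way merge with `C` as the base and `P` as "theirs", written out case by case.

```python
def revert(head, commit, parent):
    """Undo what commit changed relative to parent, on top of head."""
    result = dict(head)
    for path in set(commit) | set(parent):
        before, after = parent.get(path), commit.get(path)
        if before == after:
            continue                  # the commit never touched this path
        current = head.get(path)
        if current == before:
            continue                  # the change is already gone in head
        if current != after:
            raise ValueError(f"conflict in {path}")
        # head still has the commit's version: put the parent's version back
        if before is None:
            result.pop(path, None)    # the commit added the file: remove it
        else:
            result[path] = before     # the commit edited or deleted it: restore
    return result

P = {"file1.txt": "line one\nline two\n"}
C = {"file1.txt": "line one\n"}                     # C deleted "line two"
HEAD = {"file1.txt": "line one\n", "other": "x\n"}  # later work, deletion intact

print(revert(HEAD, C, P))
```

The `C`-vs-`P` diff says "add the deleted line back", the `C`-vs-`HEAD` diff says "add the file `other`", and combining the two on top of `C` gives a snapshot with both.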
Note that all of these operations are on entire commits
You started out wanting to fuss with one file. But everything Git has done here—or that we've shown Git doing—is based on entire commits. That's because the commit is really the fundamental unit in Git. It's true that commits store files, but Git isn't really about files. Git is about the commits. Files are merely what make commits useful.
You can extract individual files from individual commits, and work on and with them: for instance `git diff`, given the names of two files, can diff just those two files. But that's an atypical way of working with Git. Git is geared towards commit-at-a-time operations.