I’ve re-created a problem that I’ve come across a few times. This is the basic situation:

Create a file, file1.txt, with the following contents:

Hello,
Welcome to my file.
Goodbye.

$ git add file1.txt
$ git commit -m "Initial commit"

Add a second body line to file1.txt. Note: 'accidentally' delete “Welcome to my file.” when adding this line.

Hello,
This is the second line.
Goodbye.

$ git add file1.txt
$ git commit -m "Added second line"


$ git log
commit ccd8.. (HEAD -> master)
Author: ____
Date:   Tue Jan 28 11:50:11 2020 -0800

Added second line

commit 6d83..
Author: ___
Date:   Tue Jan 28 11:49:36 2020 -0800

Initial commit

What is the best way to merge these two commits? The goal is to end up with file1.txt containing:

Hello,
Welcome to my file.
This is the second line.
Goodbye.

What I tried so far was:

$ git checkout 6d83..
$ git branch tmp
$ git checkout master
$ git merge tmp

But I get the message “Already up to date.” Is git rebase the best thing to do here? Why does creating a temporary branch and then merging not work?

meinna
  • Not exactly the same as your situation but [this](https://devblogs.microsoft.com/oldnewthing/20190514-00/?p=102493) might help. It's the same problem but the issue is spread across two files. – Ashhar Hasan Jan 28 '20 at 22:28
  • Also, the up to date message that you get is since git tracks commits, not actual content. So since both the master and temp branches contain exactly the same commits it can figure that nothing new has actually happened. – Ashhar Hasan Jan 28 '20 at 22:31
  • And the reason why you can't do what you want with git is that git commits are designed to be immutable, i.e., if the content of a commit changes, the commit ID must change. Because of that rule, there's no way to change existing data without creating a new commit. The simplest way to do what you want is to `rebase`. – Ashhar Hasan Jan 28 '20 at 22:32

4 Answers

The problem here is that as far as Git is concerned, deleting the line you deleted is the right answer.

Remember that Git's basic unit of storage is the commit. Each commit has:

  • some data: a snapshot of all of your files; and
  • some metadata: information about the commit. This includes who made it, when (date and time stamps), and why (your, or the committer's, log message). The last and most important piece of metadata to Git, though, is the parent commit hash ID.

Every commit has a unique hash ID. This hash ID gets assigned to the commit the moment you make it. From then on, that hash ID is reserved to that commit. Only that commit can have that ID.¹

Meanwhile, as we just noted, each commit gets to store, in its metadata, a hash ID. Technically, each commit can store as many hash IDs as Git wants, but they have to be hash IDs of commits that already exist.² Most commits store exactly one other-commit hash ID: the parent (singular) of the commit. (Merge commits store at least two, which is what makes them merge commits, and the very first commit someone makes in a new, totally-empty repository cannot have a parent, since there is no earlier commit to refer back to, so it just doesn't.)

In your case, then, you may have had some earlier commits, or not. We'll just draw a graph that assumes that you did:

... <-F <-G <-H

The commit whose hash ID is H (H stands in for the real hash ID, which looks random) remembers the hash ID of its parent, previously-existing commit G, which remembers the hash ID of its parent F, and so on. These backwards-pointing arrows, embedded in the metadata of each commit, are how Git finds commits—except for commit H itself, which is the last commit.

The way Git finds the last commit of any branch is that the branch name, such as master, holds the commit's hash ID. So to make the drawing more complete, let's draw that in. Since nothing about any commit can ever change after we make it, we can be lazy and stop drawing those arrows as arrows, as long as we remember that they point backwards:

...--F--G--H   <-- master
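
As a rough illustration of the two ideas above (a branch name holds the last commit's hash ID, and each commit's metadata holds its parent's hash ID), here is a small sketch. The throwaway repository, the file name a.txt, and the identity are all made up for the example; only the two commit names G and H mirror the drawing.

```shell
# Throwaway repository; the path, file name and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is "master"
git config user.name "Example"
git config user.email "example@example.com"

echo one > a.txt && git add a.txt && git commit -q -m "G"
echo two > a.txt && git add a.txt && git commit -q -m "H"

# The branch name holds the hash ID of the last commit ...
tip=$(git rev-parse master)

# ... and that commit's metadata holds the hash ID of its parent.
git cat-file -p "$tip"                        # tree, parent, author, committer, message
parent=$(git rev-parse "$tip^")
git log -1 --format=%s "$parent"              # subject line of the parent commit
```

The `parent` line printed by `git cat-file -p` is the backwards-pointing arrow in the drawing.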

Now, let's make your new commit that adds this new file, file1.txt. Commit H doesn't have file1.txt at all—it has some other files, but not file1.txt. We git add file1.txt and run git commit and supply a log message. Git creates a new commit, which gets a new unique big ugly hash ID, but we'll just call it I. Git sets the parent to H so that I points back to H:

...--F--G--H   <-- master
            \
             I

and then, as the last step of git commit, Git writes I's actual hash ID into the name master:

...--F--G--H
            \
             I   <-- master

(There's no reason to keep drawing I on a separate line, so we won't.)

Now you edit the file and, with the usual process, make new commit J. Commit J has I as its parent, and Git writes J's hash ID into the name master:

...--F--G--H--I--J   <-- master

There is nothing to merge here, i.e., you can't use git merge to do what you want. You have a linear chain of commits, ending at J. From J we go back to I, from I to H, and so on.
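
To see this for yourself, the sketch below re-creates the situation from the question in a throwaway repository (the path and identity are placeholders) and checks that the older commit is already an ancestor of master before attempting the merge.

```shell
# Throwaway repository; the path and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'Hello,\nWelcome to my file.\nGoodbye.\n' > file1.txt
git add file1.txt && git commit -q -m "Initial commit"
printf 'Hello,\nThis is the second line.\nGoodbye.\n' > file1.txt
git add file1.txt && git commit -q -m "Added second line"

# tmp names the first commit, which master's history already contains.
git branch tmp HEAD~1
if git merge-base --is-ancestor tmp master; then
    echo "tmp is already part of master's history"
fi

# So there is nothing for git merge to do; LC_ALL=C keeps the message in English.
merge_output=$(LC_ALL=C git merge tmp)
echo "$merge_output"
```

The file is left exactly as it was in the second commit, because no new commit is made.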


¹ In a sense, the hash ID was reserved to that commit before you made it, except that the hash ID itself depends on the exact time at which you make it, down to the second. So if you'd made the commit one second earlier, or one second later, it would have had a different hash ID. In any case, the hash ID is unique: only that commit can have that hash ID.

If Git can't come up with a unique hash ID, it won't let you make the commit! This never actually happens, although it's a theoretical possibility. See also "How does the newly found SHA-1 collision affect Git?"

² The hash ID of the new commit we're about to create depends on the hash ID of its parent commit(s). So even if we figure out what hash ID the new commit being created will have if its parent is existing commit X, for any X, if we then insert this hash ID into the commit's metadata before creating it, it gets a different hash ID after all. So it's not possible for the commit to refer to itself, and it's not allowed to just put some random junk in there. Therefore any parent a commit records is always some earlier commit.

To put it more briefly, given a commit, you can go backwards in time to its parent ... but you can only go backwards in time. You cannot go forwards to its future children.

As a consequence of this, you can't change any commit, nor remove any earlier commit without also removing all the later ones. (Git makes it especially hard to remove commits. Compare with Mercurial, where you run hg strip -r <rev> and it removes that commit and all its children. You still don't get a choice about the children, but it's easy to take a commit away.)


Merging

Merging, in Git, generally comes into play when we have more than one branch name. Let's rewind to the case where we have just commit H as the last commit on master. (We can use git reset --hard HEAD~2 to achieve this: that makes master point directly to H again, and also sets up the work areas, meaning Git's index and our work-tree where we can see files, to reflect commit H again. I and J will continue to exist, and by default, can be retrieved for at least 30 more days. But we'll just pretend we never made I and J at all.) So we have this:

...--G--H   <-- master
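
A small sketch of that rewind, under the assumption of a throwaway repository with four commits named after the drawing (the path, file name and identity are invented). It shows that J still exists after the reset and that the reflog remembers where master used to point.

```shell
# Throwaway repository; the path, file name and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

for name in G H I J; do
    echo "$name" > file.txt
    git add file.txt && git commit -q -m "$name"
done
old_tip=$(git rev-parse master)      # hash ID of J

git reset -q --hard HEAD~2           # master names H again; index and work-tree follow
git log -1 --format=%s               # subject of the commit master now names

# J is no longer reachable from master, but the commit object still exists,
# and the reflog entry master@{1} records the previous value of master.
git cat-file -t "$old_tip"
git rev-parse 'master@{1}'
```

Running `git reset --hard "$old_tip"` afterwards would put master back on J.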

Now we'll create a new branch or two. When we do this, we need to add one more thing to our drawing. If there's only one branch name, master, that's probably the branch we're using. But what if we have added dev as a second name? Which name are we using?

Git's answer to this is to use the special name HEAD. This special name is normally attached to one of your branch names. (It can only attach to one or none: never more than one.) We'll add a second branch name, dev, but leave HEAD attached to master:

...--G--H   <-- master (HEAD), dev

Now we'll create new commits I and J in the usual way. Let's draw them in:

          I--J   <-- master (HEAD)
         /
...--G--H   <-- dev

Note that dev has not moved: it still points to existing commit H. The name master now points to new commit J.

Now, let's create two commits on dev. We start by doing git checkout dev. This attaches our HEAD to dev, and also extracts the contents of commit H to work with/on:

          I--J   <-- master
         /
...--G--H   <-- dev (HEAD)

The commits in the repository have not changed! But the files we see and work with have, and the current branch is dev and the current commit is H.³ Now we make two more new commits. Any number is allowed, but two makes the illustration easier:

          I--J   <-- master
         /
...--G--H
         \
          K--L   <-- dev (HEAD)

Now we can run git merge. We pick one branch to use (we git checkout master or git checkout dev) and then we run git merge and give it the name of the other branch.⁴ Let's git checkout master and git merge dev so that HEAD, and the current commit, identify J rather than L:⁵

          I--J   <-- master (HEAD)
         /
...--G--H
         \
          K--L   <-- dev

Git now has to find the best commit that's on both branches. In this case, that's obvious: it is commit H. We get there from J by going back two steps, and we get there from L by going back two steps. If the chain along the bottom were longer, we'd have to go back 3 or 4 or however many steps, but as long as we are able to get to commit H, commit H will be the best shared commit.

Git calls this shared, best commit, from which both we and they started, the merge base. The merge base commit is the key to merging. You—or Git—find it by looking at the graph, which shows how the commits connect.

Git will now run two git diff operations:

  • git diff --find-renames hash-of-H hash-of-J, to find out what we changed, on master, since shared commit H; and
  • git diff --find-renames hash-of-H hash-of-L, to find out what they changed, on dev, since shared commit H.

What git merge does is to combine these changes, then apply the combined changes to the snapshot in commit H—the merge base. That way we get to keep our changes, and add their changes.

This is also why merges are mostly symmetric. If we had checked out dev, i.e., commit L, and run git merge master, Git would still find common commit H as the merge base. It would run the same two git diff commands (in the other order but who cares?). It would then combine these differences into one big combined-set and apply those to the snapshot from commit H. The result would be the same.
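
The merge base and the two diffs can be inspected by hand. The sketch below builds the I--J and K--L fork in a throwaway repository (path, identity and the file names shared.txt, ours.txt and theirs.txt are invented for the example) and then asks Git for the merge base from both directions.

```shell
# Throwaway repository; the path, file names and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > shared.txt && git add shared.txt && git commit -q -m "H"
git branch dev                                   # dev and master both name H

echo ours > ours.txt  && git add ours.txt && git commit -q -m "I"
echo more >> ours.txt && git add ours.txt && git commit -q -m "J"

git checkout -q dev
echo theirs > theirs.txt && git add theirs.txt && git commit -q -m "K"
echo more >> theirs.txt  && git add theirs.txt && git commit -q -m "L"
git checkout -q master

# The best shared commit, found from the graph alone.
base=$(git merge-base master dev)
git log -1 --format=%s "$base"

# The two comparisons described above: base vs ours, base vs theirs.
git diff --find-renames --stat "$base" master
git diff --find-renames --stat "$base" dev

# Asking from the other side finds the same merge base.
git merge-base dev master
```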

If our changes and their changes overlap in some way, Git will declare a merge conflict. In this case, Git won't finish the merge on its own. It will leave you with a mess that you must clean up by hand. That's OK: you just clean it up, git add, and commit (or run git merge --continue) to finish the job.

To finish the job, Git will make a new commit—we get to call it M, for merge, since we cleverly labeled each of the previous commits H through L—and update the current branch name as usual, so that whichever branch we have checked out now ends at the new merge commit M. To mark it as a merge commit, Git sets its two parents to J and then L, in that order because we were on J when we started. So we can draw the result:

          I--J
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- dev

and we have our merge. The snapshot that goes with the merge is the result of applying to H the combined changes from H-vs-J and those from H-vs-L. The parents of the merge are the previous commit as usual, and the other commit we picked out when we ran git merge dev.

Now that this merge exists, attempting to merge L, or even K, into master can't be done. The reason is that the best shared commit, between L and M, is commit L ... which is already part of the history of M. If we step back from M along the bottom row, we reach L. History—which in Git consists of commits, including their connections—says that L is already merged here.
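
Here is a hedged sketch of that whole sequence in a throwaway repository (path, identity and file names are made up): it makes the merge commit M, checks the order of its two parents, and then tries to merge dev a second time.

```shell
# Throwaway repository; the path, file names and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo base > shared.txt && git add shared.txt && git commit -q -m "H"
git branch dev
for name in I J; do
    echo "$name" >> ours.txt && git add ours.txt && git commit -q -m "$name"
done
git checkout -q dev
for name in K L; do
    echo "$name" >> theirs.txt && git add theirs.txt && git commit -q -m "$name"
done
git checkout -q master

git merge -q -m "M" dev                 # a true merge: new commit M on master

# First parent is where we were (J), second is the commit we merged (L).
git log -1 --format=%s 'HEAD^1'
git log -1 --format=%s 'HEAD^2'

# The merge base of master and dev is now L itself, so a second merge is a no-op.
git merge-base master dev
second_merge=$(LC_ALL=C git merge dev)
echo "$second_merge"
```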


³ When you ask Git: What is in HEAD? you have two ways to phrase this. You can ask Git: What branch name is in HEAD? Or, you can ask: What commit does HEAD select? The two different questions get two different answers. In "detached HEAD" mode, in which HEAD is not attached to any branch name, the first one gets you an error instead of an answer. The second question almost always works.

Git also has the notion of an unborn branch, which it needs when you start out with a new, totally-empty repository with no commits at all. In this case HEAD exists, and holds a branch name, but the branch name itself doesn't exist and is invalid. So in this particular situation, you can ask the "what name" question about HEAD, but not the "what ID" question: the reverse of the detached HEAD setup.
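
One way to ask the two questions from the command line is git symbolic-ref for the name and git rev-parse for the commit. The sketch below, in a throwaway repository with placeholder path and identity, walks through the unborn-branch, normal, and detached-HEAD cases.

```shell
# Throwaway repository; the path, file name and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Unborn branch: HEAD holds a name, but that name has no commit yet.
unborn_name=$(git symbolic-ref --short HEAD)          # "what name" works
echo "$unborn_name"
git rev-parse --verify -q HEAD || echo "no commit yet" # "what ID" fails

echo hi > f.txt && git add f.txt && git commit -q -m "first"

# Normal case: both questions have answers.
git symbolic-ref --short HEAD
git rev-parse HEAD

# Detached HEAD: the name question fails, the ID question still works.
git checkout -q --detach
git symbolic-ref -q HEAD || echo "HEAD is detached"
git rev-parse HEAD
```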

⁴ In fact, git merge works by commit hash IDs, so we can give it the hash ID of whichever commit we want. But usually we (humans) work by names.

⁵ The merge result is generally the same each way, except for which parent is listed first. If we use particular flag arguments to git merge, the merge result might be different, though.


Cherry-picking

There is something we can do though. Given any chain of commits at all—whether there's a fork like:

          o--P--C--o--o   <-- branch1
         /
...--o--o
         \
          o--o--H   <-- branch2 (HEAD)

or just a linear chain like:

...--o--o--P--C--o--o--H   <-- branch (HEAD)

we can pick out some commit C, a child whose parent is P, and run git cherry-pick on it. (Typically you would use C's hash ID here.) What this does is force Git to:

  • find commit P, C's parent: this is easy because C holds P's hash ID inside it;
  • treat P as a merge base, C as "their" commit, and the current commit H—as selected by HEAD—as "our" commit, and do a full-blown three-way merge as usual.

So Git will now diff P vs C to see what "they" did, diff P vs H to see what we did, and combine these two sets of changes. Git will then apply the combined changes to the snapshot in P. If all goes well, Git will commit the resulting files as a new snapshot C'—a copy of commit C—using C's original commit message and so on. It won't make this a merge commit, but rather just an ordinary commit:

          o--P--C--o--o   <-- branch1
         /
...--o--o
         \
          o--o--H--C'  <-- branch2 (HEAD)

or:

...--o--o--P--C--o--o--H--C'  <-- branch (HEAD)

It tends to make more sense to cherry-pick a commit from another branch, as in the top diagram; but you can cherry-pick a commit from your own history, to re-apply the same changes. This is particularly useful if some commit between C and C' was a commit that un-did whatever happened in C.⁶
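
A minimal sketch of the linear-chain case, assuming a throwaway repository with invented file contents: commit C adds a line, a later commit (made here with git revert) takes it out again, and cherry-picking C from the branch's own history brings the line back as a new commit C'.

```shell
# Throwaway repository; the path, file contents and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'a\nb\nc\n' > file.txt      && git add file.txt && git commit -q -m "P"
printf 'a\nb\nNEW\nc\n' > file.txt && git add file.txt && git commit -q -m "C"
c=$(git rev-parse HEAD)

git revert --no-edit "$c" >/dev/null   # a later commit that un-does C
cat file.txt                           # the NEW line is gone again

# Merge base P, "theirs" C, "ours" the current commit: re-applies C's change.
git cherry-pick "$c" >/dev/null
cat file.txt
git log --format=%s                    # newest first; the copy reuses C's message
```

The copy is an ordinary single-parent commit with its own hash ID, not a merge.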


⁶ Git has a command, git revert, to make such commits. You point it at some child, and Git does the same three-way merge as for a cherry-pick, except that the merge base is C this time, and the "theirs" commit is P. (The ours / HEAD commit is the HEAD commit, as always.) Exercise: try getting the diff of C vs P, in that order. What would happen if you combined this set of changes with C vs HEAD, in that order?


Note that all of these operations are on entire commits

You started out wanting to fuss with one file. But everything Git has done here—or that we've shown Git doing—is based on entire commits. That's because the commit is really the fundamental unit in Git. It's true that commits store files, but Git isn't really about files. Git is about the commits. Files are merely what make commits useful.

You can extract individual files from individual commits, and work on and with them: for instance git diff, given the names of two files, can diff just those two files. But that's an atypical way of working with Git. Git is geared towards commit-at-a-time operations.
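
For instance, with the two commits from the question re-created in a throwaway repository (path and identity are placeholders), the older version of file1.txt can be printed straight out of the earlier commit, and the two commits can be compared for just that one file:

```shell
# Throwaway repository; the path and identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

printf 'Hello,\nWelcome to my file.\nGoodbye.\n' > file1.txt
git add file1.txt && git commit -q -m "Initial commit"
printf 'Hello,\nThis is the second line.\nGoodbye.\n' > file1.txt
git add file1.txt && git commit -q -m "Added second line"

# Print file1.txt as stored in the previous commit, without checking it out.
git show HEAD~1:file1.txt

# Compare the two commits, limited to this one file.
git diff HEAD~1 HEAD -- file1.txt
```

With the old contents visible, the deleted line can be put back by hand and committed as a new commit.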

torek

You cannot do it automagically with a merge as tried above. But assuming you have correctly configured your favorite diff tool, this will let you fix the file manually, with access to the previous content, before committing it. On your master branch:

git difftool 6d83:file1.txt file1.txt

Once correctly fixed and saved, after exiting your editor

git add file1.txt

If you did not yet push, you can amend the previous commit

git commit --amend

Or create a brand new one with the recovered line

git commit -m "Recovered line"
Zeitounator

There is no way to do what you're trying to do in git. It would cause a merge conflict if anything.

JoelFan
  • I'd be really interested in the why behind it (if you do know it). Is it technically impossible due to how git tracks content? Or is it just that there's no existing tooling around it? – Ashhar Hasan Jan 28 '20 at 22:27
  • I 'want' a merge conflict so that then I could accept both lines. However currently, it is not even allowing a merge. – meinna Jan 29 '20 at 00:56
  • What's wrong with merge conflicts? They're just changes that need human judgement to apply correctly. – jthill Jan 31 '20 at 15:34
  • In this case, the whole merge is a conflict. There's no point in it. – JoelFan Jan 31 '20 at 19:28

One easy way to do it is git checkout -p @^ file1.txt. That will find every difference between your work-tree version and the version in @^ (the commit before the current one) and offer to apply it, and you can edit the offered hunks.

Cherry-pick is generally just shorthand for diff | apply -3. If you want to get all of @^'s changes back, you can also try git diff @^! | git apply -3. This might leave you with some conflicts to resolve, but don't be even a little afraid of those: they're rare but normal. Practice with a good merge/diff tool. I like vimdiff; resolving trivial conflicts is freaking fast. Something like fighting over a new flag bit generally takes seconds to resolve.

jthill