
TL;DR:

I made a series of commits, including a merge commit in the middle; now I would like to edit the text of a commit that precedes the merge, but otherwise preserve the committed code and the commit-graph.

Long story:

I wanted to reword the messages of some commits I made locally, so I used git rebase -i.

I used r (reword) on the commits I wanted to rename and p (pick) on the ones I wanted to "preserve". The problem is that the commits I wanted to "preserve" were someone else's commits (before trying to rename the commit messages, I had merged the main branch into my branch).

So basically I had something like this:

aaaa My Commit
bbbb My commit
cccc Someone else's commit
dddd My commit

What I did was I ran git rebase -i and did this:

r aaaa My Commit 1
r bbbb My Commit 2
p cccc Someone else's commit
r dddd My Commit 3

Now, I think what happened is that those commits which were not mine were rewritten and now look like new commits: they seem to have a different ID than on the main branch. So on the main branch, the 4th commit does not have the cccc ID:

xxxx Someone else's commit

So my questions are:

  1. Is my understanding correct? Are those new commits now? Or maybe I am completely misunderstanding it.
  2. Did I proceed wrong with modifying the commit messages? What is the correct way to do it?
  3. What now? How can I fix this mess?

I could search for how to undo this myself, but I'd like to understand what happened.

Don Box
  • If you rebase, you always get new commits (if a commit has a different parent, the commit is also different). – Dietrich Epp Sep 18 '17 at 16:53
  • @DietrichEpp Can I please get some thoughts/explanations to understand what happened? – Don Box Sep 18 '17 at 16:57
  • @DietrichEpp Marking it as just a duplicate doesn't help me. I know how to Google. I couldn't find what I am looking for. I've only seen simple examples of how to use interactive rebase to rename commits, but not when you have previously done a merge – Don Box Sep 18 '17 at 16:57
  • @DietrichEpp Sorry if it looks too trivial, I am trying to get some help and understand – Don Box Sep 18 '17 at 16:58
  • The problem here is that you want to "fix" something, but you're also asking what the "correct" way to do something is. We don't have crystal balls; unless you describe exactly what you want, all I can do is point you at resources so you can figure things out. – Dietrich Epp Sep 18 '17 at 17:25
  • But it sounds like you want to undo a git rebase, which is something that has been answered before on this site. – Dietrich Epp Sep 18 '17 at 17:27
  • @DietrichEpp Thanks. I didn't realize it wasn't clear. I used 'rebase -i' with `r` for editing my commits and `p` on someone's commits – Don Box Sep 18 '17 at 17:47
  • @DietrichEpp Which part is not clear? So I can edit it and explain it in a better way. Thanks – Don Box Sep 18 '17 at 17:47
  • @DietrichEpp I edited it, hope it makes more sense. If not please let me know, thanks – Don Box Sep 18 '17 at 17:52
  • What happened is you rebased somebody else's commit. Rebasing produces a linear history, and creates new commits *unless* the new commit would be identical to the old one (including having the same parents). You're editing history here, and your history includes both private history (your presumably unpublished branch) and public history (the master branch). In general, you want to avoid editing public history. – Dietrich Epp Sep 18 '17 at 18:38
  • So what happened here was `git merge` followed by `git rebase`. What you wanted was either `git rebase` and *then* `git merge` (you can undo a merge with `git reset`), or to use `git merge` and then `git rebase --preserve-merges`. – Dietrich Epp Sep 18 '17 at 18:40
  • In any of these cases, the linked question about how to undo things is a good place to start, since it explains how to use the reflog. – Dietrich Epp Sep 18 '17 at 18:40
  • @DietrichEpp I will try to digest that info, thanks. But then, how can I edit the comments of commits without worrying about order of merge and rebase? – Don Box Sep 18 '17 at 18:51
  • @DietrichEpp What I'm asking, in my scenario, if I did a merge and then I realize I need to edit previous commits message, what is the correct way to handle this? – Don Box Sep 18 '17 at 18:53
  • This question has morphed somewhat substantially since it was first posted. It now seems that it should be phrased as: *I made a series of commits, including a merge commit in the middle; now I would like to edit the text of a commit that precedes the merge, but otherwise preserve the committed code and the commit-graph.* Is that accurate? – torek Sep 18 '17 at 19:03
  • @torek I think so, yes. Thank you very much for clearing that up. I am editing it now. Also, I need to know how to fix current situation. – Don Box Sep 18 '17 at 20:31
  • @torek I am still trying to understand what's going on though. When using rebase, does those commits (which were merged from main branch) become 'new' commits? – Don Box Sep 18 '17 at 20:33
  • Just a quick thought: For exactly those scenarios, I use [GitKraken](https://www.gitkraken.com/). It allows you to undo anything you did in Git (I don't work for Axosoft) – vatbub Sep 18 '17 at 21:01
  • @DonBox: "When using rebase, does those commits (which were merged from main branch) become 'new' commits?" I thought that I had answered this question (for the record, *yes* you get new commits) twice already, and you also include the answer in the body of the question (4th commit does not have cccc id). So the fact that you are asking the question again makes me wonder what is missing from my answer: **Yes, you get new commits when you rebase.** – Dietrich Epp Sep 18 '17 at 22:16

1 Answer


There are a bunch of somewhat tricky concepts all rolled into one tightly coiled ball of hair here. Let's tease them apart, starting with the "true name" of a commit. Each commit has just one of these,¹ and that is its hash ID, which is one of those big ugly 40-character things like 238e487ea943f80734cc6dad665e7238b8cbc7ff.


¹ Git's eventual transition from SHA-1 to something with more bits in it may result in invalidating this: commits will, at least temporarily, have two true names, which becomes awkward in the unlikely-but-necessarily-possible event that one of these new bigger-hash commits gets a collision in its smaller SHA-1 hash. But let's not worry about that here. :-)


Hash IDs are unique

Given a hash ID, Git can find the commit (or other object) and extract its contents. Given some contents, Git can compute the hash ID. So there's a one-to-one mapping between these: a hash key represents exactly one value, and that one particular value is always represented by that same single hash key. This is what allows Git to transfer commits (and other objects) between repositories via git fetch and git push.
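To make the "given some contents, Git can compute the hash ID" direction concrete, here is a minimal Python 3 sketch of the recipe: the ID is the SHA-1 of a short "<type> <size>\0" header followed by the raw content. The function name git_object_id is just an illustrative name, not part of Git or any library.

```python
import hashlib


def git_object_id(obj_type, data):
    """Return the SHA-1 ID Git assigns to an object of this type and content.

    Git hashes a short header, "<type> <size-in-bytes>\\0", followed by the
    raw bytes, so the ID is a pure function of type plus content.
    """
    header = "{} {}\0".format(obj_type, len(data)).encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


# Same bytes in, same ID out, in any repository.
# Compare with: printf 'hello\n' | git hash-object --stdin
print(git_object_id("blob", b"hello\n"))
# Change a single byte and the ID is completely different.
print(git_object_id("blob", b"hello!\n"))
```

Because the mapping runs both ways (ID to content via the object database, content to ID via this hash), two repositories can agree on which objects they already share just by comparing IDs.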

A commit's hash ID includes the author and message and a time stamp

Let's look at one of these commits:

$ git cat-file -p HEAD | sed 's/@/ /'
tree e97e9653eed972b4521e7f562e40f61f74eeb76c
parent 6e6ba65a7c8f8f9556ec42678f661794d47f7f98
author Junio C Hamano <gitster pobox.com> 1503813601 -0700
committer Junio C Hamano <gitster pobox.com> 1503813601 -0700

The fifth batch post 2.14

Signed-off-by: Junio C Hamano <gitster pobox.com>

This is the entire contents of commit 238e487ea943f80734cc6dad665e7238b8cbc7ff, and computing the SHA-1 checksum of the header commit 293\0 (293 is the length of the text in bytes) followed by the original text results in that hash:

$ python
...
>>> import hashlib
>>> import subprocess
>>> p = subprocess.Popen('git cat-file -p HEAD', stdout=subprocess.PIPE, shell=True)
>>> text = p.stdout.read()
>>> len(text)
293
>>> s = 'commit {}\0'.format(len(text)).encode('utf8')
>>> s += text
>>> hashlib.sha1(s).hexdigest()
'238e487ea943f80734cc6dad665e7238b8cbc7ff'

(The above should work in both Python 2 and Python 3, but I patched it up slightly on the fly, so it might have a glitch.)

Anyway, note in particular the parent line and the author and committer lines. The parent line gives the hash ID of the parent of this commit. The other two lines each have a name, an email address, a long decimal number (seconds since the Unix epoch), and a weird -0700 that is actually a time zone offset (7 hours west of GMT/Zulu time, in this case). The big decimal number plus this time zone offset is the time stamp of the commit.

The tree line gives the Git hash ID of the tree object that contains the source that goes with this commit. The rest of the text is, obviously, just the commit message itself. Having time stamps means that two otherwise identical commits, made by the same person, using the same source tree and same commit message, will generally result in two different commits because no one makes more than one commit per second.²


² Scripts can easily violate this rule and can produce surprises.
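The same recipe applies to commits. The Python 3 sketch below (commit_id is a made-up helper name) assembles the raw commit text from its fields and hashes it with the "commit <length>\0" header. It assumes an ordinary commit with no signature or other extra header lines. Fed the fields from the cat-file output above, with the @ signs that the sed command replaced put back, it should reproduce 238e487ea943f80734cc6dad665e7238b8cbc7ff; changing only the committer time stamp by one second yields an unrelated ID.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 ID of a plain commit built from these fields.

    author and committer are (ident, unix_seconds, tz_offset) tuples.
    Signed commits and other extra header lines are not handled.
    """
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # a merge commit has 2+ of these
    lines.append("author %s %d %s" % author)
    lines.append("committer %s %d %s" % committer)
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    header = ("commit %d\0" % len(body)).encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# The commit from the transcript above, with the "@" signs restored.
TREE = "e97e9653eed972b4521e7f562e40f61f74eeb76c"
PARENT = "6e6ba65a7c8f8f9556ec42678f661794d47f7f98"
JUNIO = "Junio C Hamano <gitster@pobox.com>"
MSG = "The fifth batch post 2.14\n\nSigned-off-by: " + JUNIO + "\n"
STAMP = (JUNIO, 1503813601, "-0700")

original = commit_id(TREE, [PARENT], STAMP, STAMP, MSG)
# Identical in every way except that it was committed one second later.
later = commit_id(TREE, [PARENT], STAMP, (JUNIO, 1503813602, "-0700"), MSG)
print(original)
print(later)
```

Every field, including the parent line, feeds into the hash, which is why a commit that is "the same but with a new parent" can never keep its old ID.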


Branch names simply point to commits, as do other commits

Since each commit has, as part of its core data, the hash ID of its parent commit, it suffices to store a single Git hash ID in a branch name like master or develop. This name maps to the hash ID, which identifies or "points to" the tip commit of the branch. That particular commit then has inside it the hash ID of its parent commit: the tip commit points to its parent. That parent commit points back to its own parent. It's this chain of backwards pointers, starting from branch tip commits as identified by branch names, that makes up a Git branch:

A <-B <-C   <-- master

Here, in this tiny 3-commit repository, the name master identifies commit C; C points back to B; and B points back to A. Since A is the very first commit ever made, it points nowhere at all. The technical term for this is a root commit, and when we (or Git) work with commits, we generally follow the backwards pointers until they run out at the root.
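As a toy Python model of this three-commit repository (the parents and branches dictionaries are hypothetical stand-ins for Git's object database and its branch names), a branch name stores one commit ID, and the branch's history is whatever you reach by following parent links backwards until they run out at the root:

```python
# Each commit maps to the list of its parent IDs.
parents = {
    "A": [],       # root commit: no parent at all
    "B": ["A"],
    "C": ["B"],
}
branches = {"master": "C"}  # a branch name stores just one commit ID


def log(commit_id):
    """Follow parent pointers backwards from commit_id until reaching a root."""
    history = []
    while commit_id is not None:
        history.append(commit_id)
        ps = parents[commit_id]
        commit_id = ps[0] if ps else None
    return history


print(log(branches["master"]))  # ['C', 'B', 'A']
```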

All of this means that no commit (nor any Git object) can ever change

We're given the claim that the hash ID of any Git object—commit, tree, annotated tag, or "blob" (file)—is unique, and that it strictly depends on the data inside the object. This claim is true; Git enforces it by refusing to add a new object that, by some chance or wicked purpose, has the same hash as some existing object. In practice, changing or adding or removing just one character inside a commit produces a whole new, different hash; and even just copying a commit tends to produce a new, different hash due to the time stamps.

This makes rebase impossible, in one sense. And yet, git rebase exists, so it must be possible somehow. The trick lies in the how.

The purpose of rebasing

There are several reasons one might use git rebase, but the most common is simply to do just that: "re-base" some commit(s). Let's draw another graph like the minimal repository, but add a branch:

A--B--C   <-- master
       \
        D--E   <-- develop

The arrows inside these commits all point backwards (by definition) and ASCII makes it hard to draw in the individual arrows well, so I've left them out here. But let's continue to emphasize that the name master points to commit C, and the name develop points to commit E, because we're about to make a new commit on master:

A--B--C--F   <-- master
       \
        D--E   <-- develop

Now we have a situation ripe for doing git rebase: we might like to have commits D and E come after commit F.

We've already seen, though, that we can't change anything about a commit. If we try, we get a new, different commit. But let's do that anyway: let's copy commit D to a new, different commit D', whose parent is commit F and whose message is the same as D's:

           D'  <-- [temporary]
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop

To make this really work, we'll start with F's source tree too, and make whatever changes we made earlier, to that tree. We'll do this by having Git compare commit D to its parent commit C:

git diff develop~2 develop~1

then apply that set of changes to commit F, and then make this new copy D' using git commit with the same message as the original D.
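The diff-then-apply idea can be sketched in Python with snapshots modeled as dictionaries mapping a path to file contents; all file names and contents here are hypothetical. This is a simplification: real cherry-pick performs a three-way merge using D's parent as the merge base and can stop with conflicts, whereas this sketch simply overwrites.

```python
def diff(old_tree, new_tree):
    """Return the changes that turn old_tree into new_tree.

    The result maps path -> new content, with None meaning "file deleted".
    """
    changes = {}
    for path in set(old_tree) | set(new_tree):
        if old_tree.get(path) != new_tree.get(path):
            changes[path] = new_tree.get(path)
    return changes


def apply_changes(tree, changes):
    """Return a new snapshot: tree with changes applied (no conflict handling)."""
    result = dict(tree)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


# Hypothetical snapshots for commit C, its child D, and F (C's child on master).
tree_C = {"README": "v1", "main.c": "int main() {}"}
tree_D = {"README": "v1", "main.c": "int main() { return 0; }"}  # D edits main.c
tree_F = {"README": "v2", "main.c": "int main() {}"}             # F edits README

# Like "git diff <hash-of-C> <hash-of-D>", applied on top of F's snapshot.
tree_D_prime = apply_changes(tree_F, diff(tree_C, tree_D))
print(tree_D_prime)  # README from F, main.c from D
```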

There is a Git command that does this kind of copying: git cherry-pick. If we check out commit F by its hash ID (as a detached HEAD), and cherry-pick commit D, we get commit D'. What changes are the tree and the parent lines, and almost certainly the committer time stamp (cherry-pick keeps D's author name, email and author time stamp). But commit D' is "just as good" as commit D, or maybe even better, if we just also copy commit E to E':

           D'--E'  <-- HEAD
          /
A--B--C--F   <-- master
       \
        D--E   <-- develop

Now that we've copied the two commits we care about, we can tell Git to rip the label develop away from commit E and make it point, instead, to our last copy, E':

           D'--E'  <-- develop
          /
A--B--C--F   <-- master
       \
        D--E   <-- [abandoned]

This is what git rebase does, in general: it's an automated series of git cherry-pick copy operations, followed by a label-move.
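Here is a toy Python model of "a series of cherry-picks followed by a label move". It is a sketch, not Git's implementation: make_commit, cherry_pick and rebase are hypothetical helpers, and the toy commit ID hashes only the parent ID and message, while real Git also hashes the tree, author, committer and time stamps.

```python
import hashlib

commits = {}  # commit ID -> (parent ID, message)


def make_commit(parent, message):
    """Store a new commit whose toy ID is derived from parent + message only."""
    cid = hashlib.sha1("{}|{}".format(parent, message).encode("utf-8")).hexdigest()
    commits[cid] = (parent, message)
    return cid


def cherry_pick(cid, onto):
    """Copy commit cid: same message, but with `onto` as its parent."""
    _, message = commits[cid]
    return make_commit(onto, message)


def rebase(branches, branch, to_copy, onto):
    """Cherry-pick each commit in to_copy (oldest first) onto `onto`,
    then move the branch label to the last copy."""
    tip = onto
    for cid in to_copy:
        tip = cherry_pick(cid, tip)
    branches[branch] = tip  # the originals still exist, just no longer labeled
    return tip


# A--B--C--F <-- master, with D--E <-- develop forking off at C
A = make_commit(None, "A")
B = make_commit(A, "B")
C = make_commit(B, "C")
F = make_commit(C, "F")
D = make_commit(C, "D")
E = make_commit(D, "E")
branches = {"master": F, "develop": E}

E_prime = rebase(branches, "develop", [D, E], onto=F)
D_prime = commits[E_prime][0]
print(commits[D_prime][1], "copied; new parent is F:", commits[D_prime][0] == F)
print("old D still exists:", D in commits)
```

Nothing in the model ever modifies an existing entry of commits; the rebase only adds new ones and repoints one label, which mirrors the "no commit can ever change" rule.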

Choosing what to copy, to where, and other refinements

There's a very tricky bit here, disguised by the way we've been drawing these commit graphs. How does Git know which commits to copy, and where to put the copies?

The usual answer, in Git, is taken from the (single) argument to git rebase. If we run git rebase master, we are telling Git:

  • copy commits that are on the current branch (develop) and not on master;
  • copy them to the point that comes after the tip of master.

If you look at the graph, it's obvious that the commits that are on develop are D-E. But this is wrong! The commits that are on develop are actually A-B-C-D-E. The commits that are on master are A-B-C-F. Three of these commits, A-B-C, are on both branches.

This is why the phrase above is "commits that are on the current branch, and not on the other one." Since A-B-C are on both, that knocks them out of the list, leaving just D-E to be copied.

Note that our single argument, master, is used both as "what not to copy" and "where to copy". The rebase command has a way to split these apart, "stop copying at commit S" (S for stop) and "put the copies after commit T" (T for target), but you still only get one "stop" point. The default is that you name both S and T with one name. The --onto flag, git rebase --onto T S, is what lets you split them up.
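The selection rule ("on the current branch and not on the other one", roughly what git rev-list master..develop lists) and the stop/target split can be sketched in Python on the same small graph. The helpers reachable, commits_to_copy and rebase_plan are hypothetical, and commits_to_copy only walks first parents, so it assumes a linear run of commits to copy.

```python
# Parent links for A--B--C--F <-- master and C--D--E <-- develop.
parents = {
    "A": [], "B": ["A"], "C": ["B"], "F": ["C"],
    "D": ["C"], "E": ["D"],
}
branches = {"master": "F", "develop": "E"}


def reachable(start):
    """Every commit reachable from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def commits_to_copy(head, stop):
    """Commits reachable from head but not from stop, oldest first.

    Walks first parents only, so it assumes the commits form a straight line.
    """
    excluded = reachable(stop)
    chain, c = [], head
    while c is not None and c not in excluded:
        chain.append(c)
        c = parents[c][0] if parents[c] else None
    return chain[::-1]


def rebase_plan(head, upstream, onto=None):
    """Return (commits to copy, commit to put them after), mimicking how
    `git rebase upstream` and `git rebase --onto onto upstream` use their arguments."""
    target = upstream if onto is None else onto
    return commits_to_copy(head, upstream), target


print(sorted(reachable("E") & reachable("F")))                 # ['A', 'B', 'C']
print(rebase_plan(branches["develop"], branches["master"]))    # (['D', 'E'], 'F')
print(rebase_plan(branches["develop"], "D", onto="F"))         # (['E'], 'F')
```

The second call shows the split: stop at D (so only E is copied) but put the copy after F.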

Besides just copying commits, you can use a special variety of rebase (the "interactive" one) to make changes just before³ it makes the new copy of an existing commit. That is, you can think of this as "copy commit D as if via cherry-pick, but let me make some minor changes just before committing the new D'."


³ In fact, these changes are usually made using git commit --amend, which means that you wind up making two copies: one in the new place, and then the amended copy, shoving the first copy aside, to really use. But this all happens behind the scenes and is more efficient than it sounds anyway, so it doesn't really hurt to just pretend it's "just before", at least for learning purposes.


Merges make everything trickier

Now let's look at merges. A merge commit—this is an actual thing, separate from the process by which we make the merge commit, but both are called "merge"—is any commit with at least two parent commits. We draw them by having the merge "point back" to each of its parents:

...--H--I--J---M   <-- br1
         \    /
          K--L   <-- br2

Here merge commit M has two parents, J and L. We probably made it by doing git checkout br1; git merge br2. (This means that M's first parent is J. This does not matter right here, but it's useful later on. The first parent of any merge is the commit that was HEAD at the time you ran git merge. This often does not get drawn in graphs, which don't generally care about the order. Git mostly doesn't care either, except for this first-vs-second thing, and then only if you use --first-parent.)
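In a toy Python model, a merge is simply a commit whose parent list has two or more entries, and the order of that list is what makes "first parent" meaningful. The names below follow the drawing; H is treated as a root standing in for the earlier history, and first_parent_log is a hypothetical helper that imitates git log --first-parent.

```python
# M's first parent is J (br1 was HEAD during `git merge br2`); its second is L.
parents = {
    "H": [],  # stands in for the earlier "..." history
    "I": ["H"], "J": ["I"],
    "K": ["I"], "L": ["K"],
    "M": ["J", "L"],
}


def first_parent_log(c):
    """Follow only first parents back from c, like `git log --first-parent`."""
    out = []
    while c is not None:
        out.append(c)
        c = parents[c][0] if parents[c] else None
    return out


merges = [c for c, ps in parents.items() if len(ps) >= 2]
print(merges)                 # ['M']
print(first_parent_log("M"))  # ['M', 'J', 'I', 'H']
```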

Let's add a few more commits beyond M, all on br1 (which will be our current branch; let's label that too, by adding (HEAD)):

...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

Now let's imagine we are trying to use git rebase to copy, say, J-M-N-O.

We can tell Git to stop copying at (and before) L. But then the copies go at the wrong place, i.e., just after L.

We can tell Git to stop copying at (and before) I. But then Git insists on copying K and L.

The merge, in other words, throws a monkey wrench into the idea of using just one "stop point": the only stop point that puts the copies in the right place is I, and then we also copy someone else's commits.

It also adds one really big monkey wrench: Git cannot copy a merge. The cherry-pick command insists that you pick one "side" of the merge, and copies the commit into a new non-merge commit that does what that "side" did, rather than actually merging. Worse, the rebase command, by default, simply skips merges entirely!
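Both wrenches show up in a small Python model of the graph above (hypothetical helper names; no real Git calls). The set difference shows what each choice of stop point selects, and plain_rebase_copies additionally drops merge commits, as a plain rebase does by default.

```python
parents = {
    "H": [], "I": ["H"], "J": ["I"],          # H stands in for earlier history
    "K": ["I"], "L": ["K"],                   # someone else's commits (br2)
    "M": ["J", "L"], "N": ["M"], "O": ["N"],  # br1
}


def reachable(start):
    """Every commit reachable from start by following parent links."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def plain_rebase_copies(head, stop):
    """Commits reachable from head but not from stop, minus merge commits.

    Sorted by name only for display; real rebase orders copies topologically.
    """
    return sorted(c for c in reachable(head) - reachable(stop)
                  if len(parents[c]) < 2)


print(sorted(reachable("O") - reachable("L")))  # ['J', 'M', 'N', 'O']
print(plain_rebase_copies("O", "L"))            # ['J', 'N', 'O']: merge M dropped
print(plain_rebase_copies("O", "I"))            # ['J', 'K', 'L', 'N', 'O']: K, L copied
```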

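Here's where things get particularly tricky. Git will sometimes re-use an existing commit in place, especially during an interactive rebase; and git rebase -p claims to preserve merges, which it doesn't really do, because it can't. What it will do is re-perform a merge, i.e., run git merge again. (In later Git versions, -p/--preserve-merges was deprecated and eventually removed in favor of --rebase-merges, which likewise re-performs merges rather than copying them.)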

Hence, given the above graph, we can try running:

git rebase -i -p <hash-of-I>

Git will, we hope, re-use K and L in place, and maybe even re-use J as well if we don't propose to change it at all. Of course, we do intend to change J (by using reword or edit on it). So now Git will copy J, let us tweak J', and then re-run the merge command to make a new merge, M', between J' and L, which we hope it re-used in place.

Git will then have to go on to copy N and O. The new M' has a different hash ID than the original M, so even if N itself needs no other changes, its parent line has to change. Since N changed to become N', O likewise must change to become O' pointing back to N'.
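The cascade can be checked with a toy Python hash that covers only parent IDs and messages (toy_id and build are hypothetical helpers). Rewording J changes J's ID, and because M, N and O each embed their parent's ID, their IDs change too, while I, K and L keep theirs. This model deliberately leaves out time stamps, so it corresponds to the case where Git re-uses K and L in place.

```python
import hashlib


def toy_id(parent_ids, message):
    """Stand-in for a commit hash covering only the parent IDs and message
    (real commits also cover the tree, author, committer and time stamps)."""
    data = " ".join(parent_ids) + "\n" + message
    return hashlib.sha1(data.encode("utf-8")).hexdigest()


def build(messages):
    """Compute IDs for I..O in the example graph, parents before children."""
    ids = {"I": toy_id([], messages["I"])}  # I stands in for ...--H--I
    ids["J"] = toy_id([ids["I"]], messages["J"])
    ids["K"] = toy_id([ids["I"]], messages["K"])
    ids["L"] = toy_id([ids["K"]], messages["L"])
    ids["M"] = toy_id([ids["J"], ids["L"]], messages["M"])  # the merge
    ids["N"] = toy_id([ids["M"]], messages["N"])
    ids["O"] = toy_id([ids["N"]], messages["O"])
    return ids


messages = {name: "commit " + name for name in "IJKLMNO"}
before = build(messages)
after = build(dict(messages, J="commit J, reworded"))  # reword J only

changed = [name for name in "IJKLMNO" if before[name] != after[name]]
print(changed)  # ['J', 'M', 'N', 'O']; I, K and L keep their IDs
```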

Whether all of this works depends on whether Git preserves the original K and L commits. If Git chooses to copy them, you'll become the committer (the author generally stays the same) and the time stamps will change, and hence you will copy K and L to K' and L'. The existing branch will continue to point to the originals, not to the copies.

If the copying is too complicated for Git, you can do it manually

Suppose that, for whatever reason, git rebase -i -p <hash-of-I> does not do what we want. We undo the rebase immediately afterward using git reset --hard ORIG_HEAD or similar, so that we are back to this graph:

...--H--I--J---M--N--O   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

We now wish to make a new commit J' that is like J but different, so we can do this manually. Everything is all clean—there are no changes to worry about staging or whatever at this point—so we just run:

$ git checkout -b newbr1 <hash-of-I>
$ git cherry-pick -n <hash-of-J>

The -n (or --no-commit) tells Git that, yes, we're copying J here, but don't commit the copy just yet. Now we can fiddle as much as we like with commit contents (edit files and git add them), and then run git commit to make the new commit and edit the commit message. (If you don't need to change the tree any, you can leave out the -n and just edit the message.)

Now we have this:

          J'   <-- newbr1 (HEAD)
         /
...--H--I--J---M--N--O   <-- br1
         \    /
          K--L   <-- br2

We're now ready to merge commit L:

$ git merge br2

This produces commit M'. We're now ready to cherry-pick N:

$ git cherry-pick -n <hash-of-N>

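which we can tweak as much as we like before running git commit to make N' (git commit -C <hash-of-N> reuses N's message and authorship unchanged); then: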

$ git cherry-pick -n br1

to copy O (again followed by git commit; we don't need to know or find O's hash, because the name br1 points to O).

Once we're all done we just have to force the name br1 to point to the new O' copy we made, for which we can use any of several Git commands, such as:

git branch -f br1 newbr1

as long as we're still on branch newbr1.
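If you need to repeat this recipe, it may help to have the sequence written out. The hypothetical Python helper below only builds the command strings and runs nothing. It assumes a single merge right after the reworded commit, as in the example, and uses git commit -c (start from an existing commit's message in the editor) and git commit -C (reuse it as-is) as one way to do the "then run git commit" steps after each cherry-pick -n.

```python
def manual_reword_plan(base, to_reword, other_branch, later_commits, branch,
                       temp="newbr1"):
    """Return, as strings, the commands of the manual recipe, in order.

    Every argument is a placeholder you supply (hash IDs or branch names);
    nothing is executed. Assumes one merge, right after the reworded commit.
    """
    cmds = [
        "git checkout -b {} {}".format(temp, base),
        "git cherry-pick -n {}".format(to_reword),
        "git commit -c {}".format(to_reword),  # edit the old message in the editor
        "git merge {}".format(other_branch),   # re-perform the merge, making M'
    ]
    for c in later_commits:  # oldest first
        cmds.append("git cherry-pick -n {}".format(c))
        cmds.append("git commit -C {}".format(c))  # reuse message and authorship
    cmds.append("git branch -f {} {}".format(branch, temp))
    return cmds


for cmd in manual_reword_plan("<hash-of-I>", "<hash-of-J>", "br2",
                              ["<hash-of-N>", "<hash-of-O>"], "br1"):
    print(cmd)
```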

torek