So even after doing it plenty of times, I am still very scared of rebasing, and I think one of the problems I have is that I don't deeply understand what it does.

So I have a branch develop and my own branch, which starts from develop. To avoid or resolve conflicts, I want to update the commit that my branch starts from. For that reason, on my branch I run git rebase develop.

My question is: let's say that, during the rebase, I decide to delete or modify every single change. Once I push, will only my branch be modified, or will my rebase also have modified the actual commits from develop?

Djoby
  • First of all, don't be scared about git. If you don't have uncommitted changes, there is [git reflog](https://git-scm.com/docs/git-reflog). `git rebase develop` basically means: "rebase my current branch on develop" - only your current branch is going to be changed – kapsiR Dec 07 '21 at 21:29
  • "my rebase did also modified the actual commit from develop" You cannot modify _any_ commits in Git. Rebasing makes _new_ commits. – matt Dec 07 '21 at 22:46
  • Rebasing can involve rewriting whatever commits _you_ specify. You do not have to guess. You are the one giving the orders! You might read https://stackoverflow.com/a/68636306/341994 for more info. – matt Dec 07 '21 at 23:21

2 Answers

Only the branch you're currently on will be modified. The branch you're rebasing onto will not be touched: it's only used as a starting point for the work.

If you want to understand Git in more depth, I recommend reading A Hacker's Guide to Git, a really good article that goes into detail. It really improved my understanding of how Git works and what it does. The original post showed an excerpt of the article here as an image:

[image: excerpt from A Hacker's Guide to Git]

Alecto Irene Perez

As kapsiR commented, have no fear. Well, maybe have a little, just enough to take a few precautions, like running git status often.

What git rebase is really about is copying (some) commits to new-and-improved commits. To understand how and why this works, you first need to understand commits:

  • What is a commit?
  • How do we find commits?
  • How do we make new commits?

These are the first few crucial questions. When you know the answers, you will understand Git itself. Then git rebase just requires one more item:

  • How does Git copy a commit?

(although there will always be more to learn).

What, exactly, is a commit (and why do we care)

A commit in Git:

  • Is numbered. Every commit has a unique hash ID. This is a very large, impossible-to-remember number expressed in hexadecimal. Whenever you make a new commit, that commit gets a unique number. By unique I don't mean "maybe unique" or "sort of unique", but rather absolutely unique. No other Git anywhere in the universe is allowed to have or use that number now, unless you give that other Git this commit that you just made. Then they'll use this number for this commit and both repositories will have the same commit.

    This means that the commit's number–its hash ID–is the commit, in a sense. When you connect two Git versions to each other, to exchange commits from one repository to another, they just look at the numbers. If they have the same number, they have the same thing. If not, whoever's missing the number can get the thing from the other Git, and now they have the same thing with the same number.

  • Stores two things:

    • Each commit stores a full snapshot of every file. The files in the commit are stored in a special, compressed and de-duplicated, Git-only form: only Git can actually read these things, and nothing—not even Git itself—can overwrite them. The fact that they can't change allows the de-duplication, and keeps the repository from getting tremendously fat quickly, even though most commits mostly re-use most of the files from some previous commit(s).

    • Each commit also stores some metadata, or information about the commit. This includes the name and email address of the person who made the commit, for instance. It includes some date-and-time stamps. It includes your log message, so you can look back later and remember why you made the commit, or read someone else's text explaining why they made their commit. (Good commit messages are important.)

    There's a part of this metadata that Git uses for its own purposes, that is crucial here. When you make a new commit, Git stores, in that commit's metadata, the hash ID of some previous commit or commits. Most commits get exactly one hash ID stored here; we'll see more about this in a moment.

  • Is read-only: no part of any Git commit can ever change.

    In fact, Git's magic hash technique means that no internal object of any kind can ever change. This provides the read-only nature of commits, the read-only nature of the de-duplicated files, and the ability to de-duplicate files.
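As a rough illustration of the points above, this sketch builds a throwaway repository and prints one raw commit object. The file name, commit messages and identity are placeholders chosen for the example.

```shell
# Scratch repository; the identity below is a placeholder.
cd "$(mktemp -d)" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)        # hash ID of the first commit
echo two >> file.txt && git commit -q -am "second"

# Print the raw commit object: a tree line (the snapshot), a parent line,
# author and committer lines, then the log message.
git cat-file -p HEAD
parent=$(git rev-parse HEAD^)      # the parent hash stored in the metadata
```

The `parent` line in the output is the stored hash ID of the previous commit, which is the piece of metadata the rest of this answer relies on.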

Why we care about this is simple. Git is all about the commits. Git isn't about files or branches (branch names, that is). It's about the commits. The commits are the currency of exchange between Git repositories. We connect one Git to another, with git fetch or git push,[1] and we transfer commits from one Git to another.

A Git repository is, to a first approximation, just a big database of commits. The commits hold files and metadata. Our own personal goals may well have to do with files, but Git doesn't deal in files at this level. It deals only in commits. So we have to use commits to get anything done with files.


[1] The git pull command is a kind of convenience wrapper that first runs git fetch, to get new commits from somewhere, then runs a second Git command to do something with the commits just fetched. This is a bit of a trap: it leads people to believe that git pull is the right command to use, when actually the two separate commands are just as often right, especially because you can then squeeze a command between them. That is, you can get new commits and look at them before you decide how, or even if, you want to incorporate them. When you use git pull you don't get this choice.
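The two-step approach from this footnote might look like the following sketch. It assumes two scratch repositories named upstream and clone (both names are made up for the example), with the clone fetching from the other over the local file system.

```shell
# Two scratch repositories: "upstream" stands in for the remote.
work=$(mktemp -d) && cd "$work"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream config user.name "Example"
git -C upstream config user.email "example@example.com"
echo one > upstream/file.txt
git -C upstream add file.txt && git -C upstream commit -q -m "first"
git clone -q upstream clone

# Someone adds a commit upstream after we cloned.
echo two >> upstream/file.txt
git -C upstream commit -q -am "second"

cd clone
git fetch -q origin                            # step 1: obtain the new commits only
incoming=$(git rev-list --count HEAD..origin/main)
git log --oneline HEAD..origin/main            # look before deciding what to do
git merge -q --ff-only origin/main             # step 2: incorporate them
```

The `git log` line in the middle is the command "squeezed between" the two steps; with git pull there is no such pause.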


How we find commits

I mentioned above that each commit stores the hash ID of a previous commit. Or, more precisely, each commit has a list of previous commits: the list can be empty ("I have no parent, I am an orphan"), or just one entry long ("my dad/mom is ________"—fill in the blank with a hash ID), or two or more entries long ("I am a merge! My parents are ________ and ________"—fill in the blanks again).

Note that no parent knows its children. When the commit is "born", it knows who its parent is, but from that point on it's frozen for all time. It cannot learn the "names" (hash IDs) of its children. So the history, in a repository, works backwards.

We find commits by starting from a later commit. Each later commit points backwards to its parent. As long as we have no merge commits, we have a simple linear chain, like this:

... <-F <-G <-H

Here H stands in for the real (big and ugly) hash ID of the last commit in the chain. We have to know its hash ID somehow—and we'll come back to this in a moment—but assuming that we do know its hash ID, that's sufficient for Git to retrieve the commit from its big database of all the commits.

Having retrieved H, Git now has a snapshot—all the files that go with H—and some metadata. The metadata include the hash ID of earlier commit G. So Git can now reach back into its database and pull commit G out, and now it has another snapshot and more metadata.

By comparing the files in the two snapshots, G and H, Git can tell us which files did change, and which ones did not. (This goes pretty fast, too, because of the de-duplication. Files that aren't changed are shared, i.e., the two commits refer to the same underlying file.) Git can look more closely at the files that did change, and show us what changed in those files. That's how we normally view a commit: as changes since the previous commit. But Git doesn't store changes; it stores whole files, as a snapshot.
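Here is a small sketch of that comparison, assuming a scratch repository with two placeholder files, a.txt and b.txt. Commit G is HEAD^ and commit H is HEAD.

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

echo "stays the same" > a.txt
echo "version 1" > b.txt
git add . && git commit -q -m "G"
echo "version 2" > b.txt
git commit -q -am "H"

# Files that differ between the two snapshots G (HEAD^) and H (HEAD).
changed=$(git diff --name-only HEAD^ HEAD)
echo "changed: $changed"

# a.txt resolves to the same blob hash in both commits:
# one stored object, shared by both snapshots.
blob_g=$(git rev-parse HEAD^:a.txt)
blob_h=$(git rev-parse HEAD:a.txt)
echo "$blob_g $blob_h"
```

Both commits hold a full snapshot, yet the unchanged file is stored only once because its content hashes to the same ID.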

Having shown us H's metadata and changes, Git can now step back one hop to commit G (whose hash ID it already retrieved). This of course is a commit, with snapshot and metadata. Its metadata refers back to earlier commit F. So Git can now repeat what it just did with H, to show us commit G.

Having shown commit G, Git can now move back one hop to commit F. It can show us F, and then move back one hop again. This continues until we get to the very first commit. That first commit is special in exactly one way: it has no parent. Its list of "previous commits" is empty. This is how Git knows when to stop going backwards. (In a big repository, you'll probably quit out of git log long before Git gets all the way back to the start, but that's fine too.)
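The backwards walk can be watched directly. This is only a sketch in a scratch repository whose three commits are named F, G and H to match the drawing.

```shell
cd "$(mktemp -d)" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

for name in F G H; do
    echo "$name" >> file.txt
    git add file.txt && git commit -q -m "$name"
done

# %h = abbreviated hash, %p = abbreviated parent hash(es), %s = subject.
# Output starts at H and works backwards; F shows an empty parent.
git log --format='%h <- parent: %p  (%s)'

count=$(git rev-list --count HEAD)                          # commits reachable from H
root_parents=$(git log --format=%p --max-parents=0 HEAD)    # parents of the root commit
```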

There's one big problem here though. How did we find commit H? We said above that we just assume we have the hash ID saved somewhere. Maybe we wrote it down on a scrap of paper, or on the office whiteboard, or whatever. But here's a better idea: we have a computer, running software with a database of commits. Let's have a database of latest hash IDs too. We can call these branch names.

Branch names find commits and are updated by new commits

A branch name like master or main, develop, feature, and so on simply holds one commit hash ID. The one hash ID stored in the branch name is the hash ID of the last commit in the chain. So if we have:

...--F--G--H   <-- main

then commit H is, by definition, the latest commit on branch main.

We can make more branch names. Let's add the name develop, also pointing to commit H:

...--F--G--H   <-- develop, main

We need some way, now, to know which name we're using—although whichever name we're using, we'll be working with commit H—so let's add the special all-caps name HEAD:

...--F--G--H   <-- develop, main (HEAD)

We're currently on branch main, because HEAD is attached to (or next to) main. If we git checkout develop or git switch develop, we get:

...--F--G--H   <-- develop (HEAD), main

We're still using commit H, so nothing else has changed, but we're using it through the name develop.
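To see that a branch name is just a stored hash ID and that HEAD is just attached to a name, one could try something like this in a scratch repository (the single commit stands in for H):

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is called main
git config user.name "Example" && git config user.email "example@example.com"

echo h > file.txt && git add file.txt && git commit -q -m "H"
git branch develop                 # a second name for the same commit

git rev-parse main develop         # prints the same hash ID twice
git symbolic-ref HEAD              # refs/heads/main: HEAD is attached to main

git checkout -q develop            # same commit, different name
head_ref=$(git symbolic-ref HEAD)
```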

When we make a new commit—I'll skip entirely over most of the details of Git's index aka staging area here—we run git commit and Git:

  • gathers metadata, such as your name and email address and a log message;
  • uses HEAD to find the current commit's hash ID H;
  • writes out a snapshot of all the files (de-duplicated) to go into the new commit;
  • writes out the metadata, creating the new commit itself and getting a new unique hash ID; and
  • writes the hash ID into the name to which HEAD is attached.

So now that we've made a new commit I, new commit I points back to existing commit H. And because HEAD is attached to develop, the name develop now points to the new commit I. The name main still points to commit H:

...--F--G--H   <-- main
            \
             I   <-- develop (HEAD)

Nothing else changes: commit H doesn't change (it does not point forwards to I; I points backwards to H instead). HEAD is still attached to develop. The only changes are that we have a new commit I, with new metadata and snapshot, and that new commit I's hash ID is stored in develop now.
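A quick check of this, again in a scratch repository with placeholder content:

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

echo h > file.txt && git add file.txt && git commit -q -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b develop         # new name develop, HEAD attached to it
echo i >> file.txt && git commit -q -am "I"

git rev-parse main                 # still H: the name main did not move
git rev-parse develop              # the new commit I
git rev-parse develop^             # I's parent, which is H
```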

Suppose you now make your own new branch name fix-123, and switch to that branch:

...--F--G--H   <-- main
            \
             I   <-- develop, fix-123 (HEAD)

Now you make two new commits J and K:

...--F--G--H   <-- main
            \
             I   <-- develop
              \
               J--K   <-- fix-123 (HEAD)

Now let's say that someone else has made a new commit for develop. You git checkout develop or git switch develop to get:

...--F--G--H   <-- main
            \
             I   <-- develop (HEAD)
              \
               J--K   <-- fix-123

Then you obtain their new commit (git fetch + git merge, perhaps, or git pull if you use the shorthand all-in-one command that works about as well as all-in-one washer/dryer combinations for instance), so now you have:

...--F--G--H   <-- main
            \
             I--L   <-- develop (HEAD)
              \
               J--K   <-- fix-123

It's now time for rebase.

Rebasing is copying

We'll stop drawing in main now, and just use:

...--I--L   <-- develop (HEAD)
      \
       J--K   <-- fix-123

to make things a bit more compact. Our problem now is commits J-K. There's nothing really wrong with them, except ... well, the problem is that commit J has, as its parent, commit I. We'd like a commit that has commit L as its parent.

Nothing about any existing commit can ever change. We can't fix commit J. But we can make a new commit that is very much like J, only different. We can do the same with commit K, too. What we want to get is this:

          J'-K'  <-- new-and-improved-fix-123 (HEAD)
         /
...--I--L   <-- develop
      \
       J--K   <-- old-and-lousy-fix-123

Here J' is our copy of J, and K' is our copy of K. There are two things different in J vs J'. There are two things different in K vs K' too. In particular, both copies have a different parent commit, which we can see from the drawing. Both copies also have any differences needed in their snapshots, being based on the snapshot that's in L instead of the snapshot that is in I.

To get here, from where we are, we need to:

  • list out the hash IDs of the commits that we want to copy, J and K;
  • make a new branch name at commit L;
  • check out that new branch name;
  • copy the first commit J; and
  • copy the second commit K.

When we're done with all of that, we have the new diagram, and all we have to do now is fix up the branch names.

There's a way to do this manually, and it's not even all that hard:

git checkout -b new-fix-123 develop
git cherry-pick <hash-of-J>
git cherry-pick <hash-of-K>

We can make this even shorter:

git checkout -b new-fix-123 develop
git cherry-pick <hash-of-J> <hash-of-K>

reduces this to two commands. But then we still need to move the fix-123 name to where we are now, and check out fix-123 again:

git checkout -B fix-123

would do that, and then we can delete new-fix-123 and we'll have:

          J'-K'  <-- fix-123 (HEAD)
         /
...--I--L   <-- develop
      \
       J--K   <-- ???

Note that there is no longer any name by which to find commit K. We forced Git to move the name fix-123 to point to K'. The old commits still exist. We just can't find them any more.
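The manual steps can be tried end to end in a scratch repository. In this sketch each commit touches its own placeholder file (I.txt, J.txt and so on) so that nothing conflicts, and `commit` is a helper function defined only for the example.

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit I
git checkout -q -b fix-123
commit J; commit K
old_j=$(git rev-parse fix-123~1); old_k=$(git rev-parse fix-123)
git checkout -q develop
commit L

# The manual "rebase": copy J and K on top of L, then move the name.
git checkout -q -b new-fix-123 develop
git cherry-pick "$old_j" "$old_k" >/dev/null
git checkout -q -B fix-123
git branch -q -D new-fix-123

git log --oneline --graph fix-123  # I, L, J', K' in a straight line
git cat-file -t "$old_k"           # "commit": the original K is still in the database
```

The last command shows that the original K was not destroyed. It simply has no branch name pointing to it any more.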

The git rebase command does this all in one step

Again, we're starting with:

...--I--L   <-- develop (HEAD)
      \
       J--K   <-- fix-123

Running:

git checkout fix-123
git rebase develop

has Git:

  • list out commits that are "on" fix-123, but not "on" develop: that's hash IDs J and K. Git actually generates this list backwards—because Git always works backwards—but git rebase then reverses the backwards list, so that it's forward.

  • Create a temporary "branch" at commit L. Git uses detached HEAD mode for this, rather than bothering with a branch name. We'll skip the details here, but they matter if something goes wrong.

  • Run a git cherry-pick for each commit in the list.

  • Force the branch name we were on—fix-123—to here and check it out again.

That's exactly what we'd do manually, but Git does it all automatically. As long as nothing goes wrong, we end up with what we wanted. The real trick here is recovering from failures.
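This also answers the original question: only the branch being rebased moves. A sketch that checks it, using the same made-up layout (one placeholder file per commit, and a `commit` helper that exists only in this example):

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit I
git checkout -q -b fix-123
commit J; commit K
old_k=$(git rev-parse fix-123)
git checkout -q develop
commit L
develop_before=$(git rev-parse develop)

git checkout -q fix-123
git rebase -q develop

develop_after=$(git rev-parse develop)
echo "develop: $develop_before -> $develop_after"          # identical
echo "fix-123: $old_k -> $(git rev-parse fix-123)"         # different: K became K'
```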

What could go wrong?

"Copying" a commit—with cherry-pick, or with more primitive methods that git rebase used to use in older versions of Git—can fail. In particular, each git cherry-pick operation:

  • has to figure out what changed, then
  • has to figure out how to apply those changes here, to the other commit.

Technically speaking, what Git does to achieve this is to use its merge engine. This usually works pretty well, but it can stop with a merge conflict. When it does, you must resolve the conflict, then continue the rebase.
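A conflict is easy to provoke on purpose. In this sketch both branches rewrite the same line of a placeholder file, the rebase stops, and the conflict is resolved by keeping both lines. How to resolve a real conflict is of course a judgment call. `GIT_EDITOR=true` is set only so that the example does not open an editor for the commit message.

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example" && git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "I"
git checkout -q -b fix-123
echo "from fix-123" > file.txt && git commit -q -am "J"
git checkout -q develop
echo "from develop" > file.txt && git commit -q -am "L"

git checkout -q fix-123
git rebase develop >/dev/null 2>&1 || echo "rebase stopped with a conflict"
conflicts=$(git diff --name-only --diff-filter=U)   # files still unmerged
echo "conflicted: $conflicts"

# Resolve by hand (here: keep both lines), stage the result, then continue.
printf 'from develop\nfrom fix-123\n' > file.txt
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
```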

When Git goes to figure out which commits to copy, sometimes you don't get the set of commits you expected. You can use git rebase --onto to help out here, but I'm not going to say anything more in this answer. If you have a set of commits that includes any merge commits, they complicate matters. I'm not going to cover them at all here. Git does now (as of 2.18) have a --rebase-merges option that can make this work, but it's a bit tricky.

Finally, if a rebase does go wrong and you wish you hadn't started it in the first place ... well, if you're stuck in the middle of a failing rebase, you can use git rebase --abort:

          J'  <-- HEAD
         /
...--I--L   <-- develop
      \
       J--K   <-- fix-123

When the copying of K fails, perhaps because there's a conflict with L, and you decide you'd prefer to be back to this:

          J'  [abandoned]
         /
...--I--L   <-- develop
      \
       J--K   <-- fix-123 (HEAD)

a simple git rebase --abort suffices. But if you've let the rebase finish, or worked on the conflicts and continued the rebase and are now at:

          J'-K'  <-- fix-123 (HEAD)
         /
...--I--L   <-- develop
      \
       J--K   [abandoned]

and decide that the whole thing was a mistake, you need to find the hash ID of original commit K in order to recover.
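The git rebase --abort case described above can be rehearsed safely. This sketch sets up a scratch repository in which J copies cleanly but K conflicts with L, then backs out. File names and contents are placeholders.

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example" && git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m "I"
git checkout -q -b fix-123
echo j > j.txt && git add j.txt && git commit -q -m "J"
echo "from fix-123" > file.txt && git commit -q -am "K"
old_k=$(git rev-parse HEAD)
git checkout -q develop
echo "from develop" > file.txt && git commit -q -am "L"

git checkout -q fix-123
git rebase develop >/dev/null 2>&1 || echo "stopped while copying K"

# Mid-rebase, HEAD is detached at J' rather than attached to a branch name.
mid=$(git rev-parse --abbrev-ref HEAD)     # prints "HEAD" when detached

git rebase --abort                         # back to fix-123 pointing at the original K
```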

There is a way to do this, using git reflog. But it's a pain. Fortunately, there's a much easier way. If you're about to start a rebase, and are not sure you want to use the result or stick with the original, just create a new branch before you start:

...--I--L   <-- develop
      \
       J--K   <-- fix-123.0, fix-123 (HEAD)

Now after your rebase, you will have:

          J'-K'  <-- fix-123 (HEAD)
         /
...--I--L   <-- develop
      \
       J--K   <-- fix-123.0

The old series of commits ending at K are still easy to find, using branch name fix-123.0.[2] If I find myself rebasing again, I make a new fix-123.1 before I start. So fix-123 is the current one and the numbered ones are the previous ones, there if I want them, delete-able once I'm sure I am done with them.
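A sketch of the backup-branch habit, in a scratch repository with one placeholder file per commit and a `commit` helper defined just for the example. The final `git reset --hard` is one way to go back to the originals. It throws away uncommitted changes, so it is only safe with a clean working tree.

```shell
cd "$(mktemp -d)" && git init -q .
git symbolic-ref HEAD refs/heads/develop
git config user.name "Example" && git config user.email "example@example.com"
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

commit I
git checkout -q -b fix-123
commit J; commit K
old_k=$(git rev-parse fix-123)
git checkout -q develop
commit L

git checkout -q fix-123
git branch fix-123.0               # backup name for the pre-rebase tip K
git rebase -q develop
rebased=$(git rev-parse fix-123)   # K', the copy

# Changed our mind: point fix-123 back at the original commits.
git reset -q --hard fix-123.0
```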


[2] With older computer folks like me, you can often tell whether we came out of the maths department or the physics / engineering department by whether we start counting from zero. I kind of did both, but my heart was a little more with the mathematicians. Sometimes I think I should make these names more detailed, e.g., have dates on them, but simple numbering seems to work pretty well. I rarely get higher than about 3 or 4.

torek