I have a PR with two commits. I know that I can squash and merge the PR, but when I want to cherry-pick these changes to another branch, I still have to cherry-pick the two commits. Is there any way to squash these two commits so I don't have to make two cherry-picks? Or can I cherry-pick the merge commit into the original branch?

nick

2 Answers

git cherry-pick doesn't support squashing while cherry-picking. You need to:

  1. Cherry-pick those two commits.
  2. Run an interactive rebase, git rebase -i "HEAD~2", on the branch that received the two cherry-picked commits, and squash these two commits into one.

"HEAD~2" means doing interactive rebase on latest 2 commits.

Pro Git has a good tutorial about squashing with interactive rebase.
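The two steps above can be sketched as a throwaway script. The branch names (feature, release) and file names are hypothetical, and the sed call assumes GNU sed. It only stands in for the manual edit you would make in the todo list (changing the second "pick" to "squash") so that the script runs unattended.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the initial branch "main"
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git branch release                          # hypothetical target branch

# Two commits on a hypothetical feature branch, standing in for the PR.
git checkout -q -b feature
echo one > one.txt && git add one.txt && git commit -q -m "feature: part 1"
echo two > two.txt && git add two.txt && git commit -q -m "feature: part 2"

# Step 1: cherry-pick both commits onto the target branch.
git checkout -q release
git cherry-pick feature~1 feature >/dev/null

# Step 2: squash the latest two commits. Done by hand, this is
# `git rebase -i HEAD~2` plus editing the second "pick" into "squash".
GIT_SEQUENCE_EDITOR="sed -i -e '2s/^pick/squash/'" GIT_EDITOR=true \
    git rebase -i HEAD~2 >/dev/null 2>&1

# One commit now sits on top of "base" and carries both changes.
git log --oneline main..release
```

Setting GIT_EDITOR=true simply accepts the combined commit message that Git proposes for the squashed commit.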

Also, if you prefer doing it with a GUI, git-fork is the only GUI I know of that supports interactive rebase.

Simba

Git, in the end, is all—and almost only—about commits. PRs are a GitHub feature and not really on topic for Git itself, but a GitHub PR uses commits (since that's pretty much all there is, in Git).

The idea of "squashing" is basically to take some existing commits, each of which does something you like on its own but which you don't like as stand-alone commits, and using those to make one new commit that you like even better. This is required, because no existing commit can ever be changed. Your question therefore really comes down to this: What are the ways of taking some existing commits and making new and improved commits from them?

Git itself has three primary answers:

  1. git cherry-pick.
  2. git rebase.
  3. git reset, usually with --soft.

The basic function of git cherry-pick is to copy one commit. This, clearly, isn't what you want. It's still very useful to know. Obviously you do know about it.
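As an aside, git cherry-pick has a --no-commit (-n) option that applies the picked changes to the index and working tree without committing, so several commits can be applied in a row and then committed once. A minimal sketch, with hypothetical branch and file names, and not something either answer here relies on:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt && git add base.txt && git commit -q -m "base"
git branch release                          # hypothetical target branch

git checkout -q -b feature                  # hypothetical source branch
echo one > one.txt && git add one.txt && git commit -q -m "feature: part 1"
echo two > two.txt && git add two.txt && git commit -q -m "feature: part 2"

git checkout -q release
# Apply both commits to the index and working tree, but do not commit yet.
git cherry-pick --no-commit feature~1 feature
# Record the combined result as a single commit.
git commit -q -m "feature: parts 1 and 2"

git log --oneline main..release
```

The result is one new commit on release, at the cost of writing a fresh commit message by hand.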

The basic function of git rebase is to copy some arbitrary set of commits (one or more commits), while optionally making changes along the way. The interactive form of git rebase, git rebase --interactive or git rebase -i for short, is particularly good at this. It has a ton of features, though, which means there is a lot to learn here!

The basic function of git reset (well, one of several basic functions) is to move a branch name, which is itself complicated. We can use this to do exactly what you want in fewer steps than with git rebase -i. Simba's answer has a good short description of using git rebase -i, though it offloads all the hard work to the Pro Git book. Remember, as you learn it, that git rebase consists of a lot of cherry-picking, followed by one git reset-like operation.

Let's look here, instead, at just squashing with git reset.

Commits

Let's start with how Git makes commits. This is pretty basic stuff, but a lot of tutorials skimp on it and leave you in the dark. Git becomes very scary and confusing. If we go over these basic items, Git will be a lot clearer and brighter and less scary.

As I said at the top, Git is really all about commits, so we need to know exactly what a commit is, and does for you. It is:

  • Numbered. Every Git commit has a unique number, usually expressed in hexadecimal as a hash ID. These things look random, but are actually not random at all: they're computed from the commit's content, using a cryptographic hash function. This hashing trick is why no part of any existing commit can ever change.

  • A container for a snapshot of all of your files, as of the time you—or whoever—made the commit. These files come from a somewhat unexpected source though, as we'll see in a moment. They are stored in a weird, Git-only, compressed and de-duplicated form, that only Git can read (and nothing—not even Git itself—can overwrite, due to that hashing trick again).

  • A container for some metadata, or information about the commit itself: who made it and when, for instance. The metadata include a log message, too, and—crucially for Git's own internal operation—the metadata in any one commit can include a list of previous commit hash IDs.

So, commits have numbers—random looking hash IDs—and hold a snapshot and some metadata. Moreover, most commits—the ones we call ordinary commits—hold exactly one previous commit hash ID, for the commit we call the parent of the commit.
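To see these pieces in a real repository, git cat-file -p prints a raw commit object. The sketch below builds a scratch repository (the file name and messages are made up) and checks that the second commit records the first commit's hash ID as its parent.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo hello > file.txt && git add file.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo world >> file.txt && git add file.txt && git commit -q -m "second"

# Raw commit object: a tree line (the snapshot), a parent line,
# author and committer lines (metadata), then the log message.
git cat-file -p HEAD

# The recorded parent is the hash ID of the first commit.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent: $parent"
echo "first:  $first"
```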

These parent linkages form backwards-looking chains. That is, if we use single uppercase letters to stand in for real, actual commit hash IDs, we can draw a commit chain like this:

... <-F <-G <-H

Here H is the last commit in the chain. It points to (stores the hash ID of) its parent commit G, which points to its parent F, and so on.

In this way, Git can find every commit if we just tell it which commit is the last commit. This is where branch names come into the picture. To make it fast and easy to find H's actual hash ID, Git will store this hash ID in an easy to look up, memorable name that doesn't change (unless you change it anyway):

...--F--G--H   <-- main

Git simply defines a branch name as a name that holds one commit's hash ID. That commit has to exist in the repository: if it doesn't, the repository is damaged. That commit is then the last commit that we say is "on" or "contained in" the branch.

Adding new commits to a branch

Git in general likes to add new commits. We start by picking out one of our branch names, or creating a new one, so as to pick out one of our existing commits. We ask Git to "attach its HEAD" to this name:

...--F--G--H   <-- dev (HEAD), main

Here, we now have two names, both of which select commit H as their last commit. All the commits up through and including H are on both branches. But we've told Git to git checkout dev or git switch dev: to attach HEAD to dev.
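A small sketch of this state, using a scratch repository with a made-up file: both branch names resolve to the same commit, and HEAD is a symbolic reference to dev.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > file.txt && git add file.txt && git commit -q -m "H"

# Create dev pointing at the current commit and attach HEAD to it.
git checkout -q -b dev

# Both names hold the same hash ID.
git rev-parse main dev

# HEAD is attached to dev: this prints refs/heads/dev
git symbolic-ref HEAD
```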

This:

  • makes commit H the current commit;
  • cleans out our working tree of any files from the previously-current commit, and populates the working tree with files from commit H instead;
  • does the same with Git's index, which Git also calls the staging area or—rarely these days—the cache.

Because the committed files are in a weird, Git-only format (with compression and de-duplication), Git has to extract the files. It first "copies" the Git-ified version of the file to its own index. Since this file format de-duplicates, this "copy" doesn't really take any space.[1] Then Git expands the file out into a usable form, in your working tree. This lets you see and work with the file (hence the name "working tree").

You can now edit the working tree copy. If you do, you must tell Git: fix up the index copy. You do this with git add, which makes Git make the index copy of the file match the working tree copy. Git will read the working tree file, compress the data, de-duplicate it against any existing copy, and update Git's index copy as needed.

Eventually, you run git commit. This has Git package up whatever is in the index right now. Git doesn't really look at your working tree at this point. What it cares about, right now, is what's in the index right now.[2] These index contents become the snapshot for the new commit. Git also gathers up all the metadata for the new commit, which might involve running your editor on a commit message. In any case, if the commit is to proceed, Git will get all this stuff together, one way or another. Now it has everything it needs to write out the commit:

  • Git sets the parent hash ID, in the new commit-to-be-made, to the current commit hash ID. That is, the new commit will point back to the current commit H.
  • Git writes out the snapshot and metadata, gaining a new, unique hash ID for the new commit. We'll call this commit I.
  • Now, as its last little trick, git commit writes I's hash ID into the current branch name.

It's this last trick that "grows the branch", by one commit, like this:

...--F--G--H   <-- main
            \
             I   <-- dev (HEAD)

Once we repeat this to make a second commit, we have:

...--F--G--H   <-- main
            \
             I--J   <-- dev (HEAD)

Note how HEAD stays attached to the branch name, but the name itself moves, to point to the latest commit.
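The same growth can be reproduced in a scratch repository. The commit messages H, I, and J below only mirror the drawing; the real hash IDs will differ on every run.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > h.txt && git add h.txt && git commit -q -m "H"

git checkout -q -b dev
echo i > i.txt && git add i.txt && git commit -q -m "I"
echo j > j.txt && git add j.txt && git commit -q -m "J"

# main still names H; dev has moved forward twice, with HEAD attached.
git log --oneline --graph --decorate --all
```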

In each case, whatever is in Git's index is what goes into the commit. In other words, the index is acting as the proposed next commit. When we run git add, we update this proposed next commit, using whatever files we have in our working tree.
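A quick way to convince yourself of this is to let the index and the working tree disagree at commit time. In this sketch (the file name is made up), the committed snapshot holds the staged content, not the later working-tree edit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo v1 > file.txt && git add file.txt && git commit -q -m "first"

echo v2 > file.txt
git add file.txt          # the index (proposed next commit) now holds v2
echo v3 > file.txt        # the working tree holds v3; the index still holds v2

git commit -q -m "second"

git show HEAD:file.txt    # prints v2: the commit came from the index
cat file.txt              # prints v3: the working tree was left alone
```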


[1] An index entry does take a little space: it's a cache entry in .git/index or some other file, and it needs space for its cache data, a blob hash ID, and its path name. But this doesn't depend on how big the file itself is, just the size of a Git index entry.

[2] Note that git commit -a, and other similar operations, work by first updating the index. This part is especially tricky because these kinds of git commit operations need to make one or two extra index files, in case of a commit failure or abort, so that they can roll everything back. If all goes well, they use the updated index to make the new commit, and shrink everything back to just the one normal, everyday index when they're done. In the end, though, the important thing for us right here is that Git is still doing the commit from its index. It's just that git commit -a updates Git's index first, then commits. You might as well use git add.


git reset

We're now ready to look at git reset. This is a big, complicated command, with many modes of operation. We're only going to look at the three main ones here:

  • git reset --soft takes, as an argument, a commit hash ID. It moves the current branch name—the one HEAD is attached to—to point to that particular commit. Then it stops. We have a new current commit, but nothing else has changed.

  • git reset --mixed, or git reset without a flag, does the same thing but then resets Git's index by reading out the files in the commit we just moved to. That is, Git's proposed next commit now matches the new current commit.

  • git reset --hard does the same thing—including resetting Git's index—but also resets our working tree by replacing the files there. That is, not only is Git's proposed next commit re-set, so is our working tree.
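The three modes can be compared side by side with git status --short, whose first column describes the index and whose second column describes the working tree. This is a sketch in a scratch repository with a made-up file; between modes it moves the branch back to the original tip so each reset starts from the same state.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > file.txt && git add file.txt && git commit -q -m "H"
git checkout -q -b dev
echo j > file.txt && git add file.txt && git commit -q -m "J"
tip=$(git rev-parse HEAD)

# --soft: only the branch name moves; index and working tree keep J's content.
git reset -q --soft main
soft=$(git status --short)
echo "soft:  [$soft]"       # [M  file.txt]  (change is staged)
git reset -q --soft "$tip"  # put the branch name back on J

# --mixed: the index is reset to H too; the working tree keeps J's content.
git reset -q --mixed main
mixed=$(git status --short)
echo "mixed: [$mixed]"      # [ M file.txt]  (change is unstaged)
git reset -q --hard "$tip"  # restore everything to J

# --hard: the index and working tree are both reset to H.
git reset -q --hard main
hard=$(git status --short)
echo "hard:  [$hard]"       # []  (nothing left to commit)
```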

So this is why—and how—git reset --hard throws away uncommitted work. We have a situation like this:

...--F--G--H   <-- main
            \
             I--J   <-- dev (HEAD)

and we change our minds about commits I and J, deciding they're worthless trash, so we run:

git reset --hard main

Git uses the name main to locate commit H. Then it wipes out both its own index (proposed next commit) and our working tree (the files we can see and work with), replacing them with the files from commit H, and leaving us with:

...--F--G--H   <-- dev (HEAD), main
            \
             I--J   [abandoned]

But, what if we run:

git reset --soft main

here? Then Git will find H as before, and make the name dev point to H as before:

...--F--G--H   <-- dev (HEAD), main
            \
             I--J   [abandoned]

The key difference is that with --soft, we told Git to leave our working tree and Git's index alone. So we can now run:

git commit

which uses Git's index, which matches our working tree, which matches commit J's snapshot. We have to write a new commit log message, but Git then makes a new commit K whose snapshot matches the one in commit J:

             K   <-- dev (HEAD)
            /
...--F--G--H   <-- main
            \
             I--J   [abandoned]

We have now "squashed" commits I-J into one commit K. This needed just two Git commands—git reset --soft and git commit—plus a new commit message.
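Tying this back to the question, the sketch below squashes I and J into K with git reset --soft, then brings the result to another branch with a single cherry-pick. The branch name release and all file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > h.txt && git add h.txt && git commit -q -m "H"

# Hypothetical branch that should receive the change later.
git checkout -q -b release
echo r > r.txt && git add r.txt && git commit -q -m "R"

# The two commits we want to squash.
git checkout -q main
git checkout -q -b dev
echo i > i.txt && git add i.txt && git commit -q -m "I"
echo j > j.txt && git add j.txt && git commit -q -m "J"
old_tree=$(git rev-parse 'HEAD^{tree}')

# Move dev back to H, keeping the index and working tree from J,
# then commit: the new commit K has H as its parent and J's snapshot.
git reset --soft main
git commit -q -m "K: I and J squashed"
new_tree=$(git rev-parse 'HEAD^{tree}')

# One cherry-pick now carries both changes to the other branch.
git checkout -q release
git cherry-pick dev >/dev/null

git log --oneline --graph --all
```

The old commits I and J are not deleted right away. They are merely no longer reachable from any branch name.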

You understand git commit --amend now too

Now that you know how git reset --soft works (it moves a branch name while keeping the index and working tree), and how a git commit afterward makes a new commit whose parent is the older commit that we just reset our branch name to, you can also see how git commit --amend really works.

Suppose we have this:

...--G--H   <-- main
         \
          I   <-- dev (HEAD)

We run a test on our new commit I and, oops, we forgot one small thing. So we fix up the file and git add as usual. If we run git commit now, this will make a new commit J whose parent is I:

...--G--H   <-- main
         \
          I--J   <-- dev (HEAD)

If we're really sure we don't need to preserve everything, though, we can run git commit --amend. This makes a new commit whose parent is not I, but rather is I's parent. In this case, that's commit H:

          J   <-- dev (HEAD)
         /
...--G--H   <-- main
         \
          I   [abandoned]

The drawback is that it becomes very hard to find commit I. (Hash IDs look random, after all. Will you remember the old one?) But if we're really sure about this, it's a quick and easy way to avoid having to squash later.
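In a normal (non-bare) repository, Git's reflog usually still remembers where HEAD pointed before the amend, which offers one way back to the old commit for a while. A sketch, with a made-up file and messages:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > h.txt && git add h.txt && git commit -q -m "H"
git checkout -q -b dev
echo i > i.txt && git add i.txt && git commit -q -m "I"
old=$(git rev-parse HEAD)

# Oops, we forgot something: fix it, stage it, and amend.
echo fix >> i.txt && git add i.txt
git commit -q --amend -m "I, with the fix"

# The replacement commit's parent is H, not the old I.
git rev-parse HEAD~1 main

# The previous value of HEAD (the old I) is still in the reflog.
git rev-parse 'HEAD@{1}'
```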

There's another handy trick: we can run git commit --amend -C HEAD. The -C flag means grab the commit message from the named commit. Since the HEAD commit is I at the time we start this whole process, this grabs our old commit message. Uppercase -C also means don't bother editing it (lowercase -c means grab it, but do let me edit it).
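A short sketch of that trick in a scratch repository (the file name and message are made up): the amended commit keeps the old message word for word, and its parent is still H.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo h > h.txt && git add h.txt && git commit -q -m "H"
git checkout -q -b dev
echo i > i.txt && git add i.txt && git commit -q -m "Add the i file"
msg_before=$(git log -1 --format=%s)

# Stage the forgotten change, then amend while reusing the old message.
echo fix >> i.txt && git add i.txt
git commit -q --amend -C HEAD
msg_after=$(git log -1 --format=%s)

echo "before: $msg_before"
echo "after:  $msg_after"
```

Because -C never opens an editor, this form is convenient in scripts as well.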

torek