The main branch includes commits from a feature branch, feature-A. I want to revert main to the commit before the feature branch changes were merged, and deploy that code for testing. Let's say that during testing multiple bugs are reported and I push commits to main to fix them. Once main has no more bugs, I want to re-merge the code changes from feature-A back into main.

Note: main is protected, so a pull request is needed to apply new changes.

Example:

(main) 1 - 2 - 3 - 4 - 5

Commits 3, 4, and 5 contain the feature-A changes

Now rollback main, so it becomes:

(main) 1 - 2

Apply bug fixes as part of commits 6 and 7

(main) 1 - 2 - 6 - 7

Now merge feature-A back to latest state of main

final state - (main) 1 - 2 - 6 - 7 - 3 - 4 - 5

What I have tried: I first created a backup of my main branch and called it main-backup

From this post: "Want to change my master to an older commit, how can I do this?" I tried the steps below, which give me a new branch at the commit before feature-A that can be merged into main.

If you want to avoid force pushing, here's how to revert your repo to an older commit and preserve all intervening work:

git checkout 307a5cd        # check out the commit that you want to reset to 
git checkout -b fixy        # create a branch named fixy to do the work
git merge -s ours master    # merge master's history without changing any files
git checkout master         # switch back to master
git merge fixy              # and merge in the fixed branch
git push                    # done, no need to force push!
Done! Replace 307a5cd with whatever commit you want in your repo.

(I know the first two lines can be combined, but I think that makes it less clear what's going on)

Here it is graphically:

c1 -- c2 -- c3 -- c4 -- c2' -- c5 ...
        \              /
         '------------'
You effectively remove c3 and c4 and set your project back to c2. However, c3 and c4 are still available in your project's history if you ever want to see them again.

So I create the PR and merge. Now main is at the commit before feature-A.

Now if I create a PR to merge main-backup into main it should show me all the code commits/changes related to feature-A but GitHub says there are no code changes.

So my understanding is not right here. What is the best (and possibly the safest) way to do this?

John Kugelman
mosranna

1 Answer

Before I dive into the answer, I need to address this part, because I have to draw some diagrams and I like to draw correct and unambiguous ones:

Example:

(main) 1 - 2 - 3 - 4 - 5

Here, you have drawn a repository containing five commits total, all on a branch named main. There are a couple of issues I have with this particular drawing. One is quite minor, but might cause confusion later. The other is also minor but important.

The first is that while Git commits are numbered, they don't have simple sequential counting numbers. Their actual numbers are enormous (up to 2^160 - 1 at the moment, and in the future, even bigger) and look random. As a result, I think it's better to use letter codes for the commits: here I'd use A through E, or F through J, or some such.

The second issue—important, though still minor—is that a branch name like main isn't a label you should stick in front of commits. It's a label you should paste on one single commit, in this case, commit #5 or—as I will call it—commit E:

               main
                 |
                 v

 A <-B <-C <-D <-E

Note how each commit then points backwards to the previous commit.

For a somewhat more compact representation, I tend to use this:

A--B--C--D--E   <-- main

on StackOverflow. This drops the internal arrows (which isn't great, but is barely tolerable as we'll see).

The reason to draw them like this is that a branch name really does point to exactly one commit. That's the first crucial item to understanding what we can do with your existing repository and its commits. And, each commit in a chain of commits like this really does point backwards to the previous commit.

More precisely, the commits are, as we noted earlier, numbered. The actual numbers—expressed in hexadecimal—look bizarre and random, like f6b272b0c674c2d5022e90c3dec868af4ea26522 for instance. They're too difficult for humans to bother with. That's why we have the computer remember the last one in a chain using a branch name like main.

Each commit contains two things:

  • A commit has a full snapshot of every file. The files stored inside each commit are stored in a special, Git-only, compressed and de-duplicated format. This is something your computer cannot read in general; only Git can read these files. So you don't actually use these files: they are stored only for history purposes. They act like a permanent archive, like a zip or tar archive of every file.

  • And, each commit contains some metadata, or information about the commit itself. This includes the name of the person who made the commit, their email address, and a date-and-time stamp, for instance. But it also includes the raw hash ID of the earlier commit. Hence, commit E, whatever its hash ID is, literally contains the hash ID of earlier commit D (whatever that hash ID is).

The fact that a commit points backwards to its parent is what allows Git to get away with storing just one commit hash ID in a name like main. When main points to E, that's sufficient, because E points to D. Git can find D from main by going through E. Since D points backwards to C, Git can find C from main, by going back two hops ... and since C points backwards to B, Git can find B here as well, and that finds A (and then we run out of commits and git log stops here).
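To watch this backwards walk happen, here is a minimal sketch in a throwaway repository. The temporary directory, the "Demo" identity, and the empty commits named A through E are all assumptions made up for illustration; real hash IDs will differ on every run.

```shell
# Build a scratch repository with five commits A..E on main (all names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main        # call the first branch main, whatever the default is
git config user.name "Demo"
git config user.email "demo@example.com"
for name in A B C D E; do
  git commit -q --allow-empty -m "$name"
done

tip=$(git rev-parse main)                    # the single hash ID that the name main stores (E)
parent=$(git rev-parse main~1)               # one hop back from E (D)
git cat-file -p "$tip" | grep '^parent'      # E's metadata contains D's hash ID
count=$(git rev-list --count main)           # how many commits Git can reach from main
git log --format=%s main                     # subjects, newest first, ending at A
```

The name main holds only the hash ID in tip; everything else is found by following the parent lines.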

Note: main is protected, so a pull request is needed to apply new changes.

We now need one other fact about Git. I mentioned above that Git doesn't store changes, but rather snapshots called commits, and that Git finds commits by starting from a branch name, then working backwards as needed. The last thing you need to know is this: No commit can ever be changed.

But: if Git stores only snapshots, and no commit can ever be changed—and both of these are true—then how do we ever get any changes into a Git repository? The full and correct answer is long, so I'll take some extra time writing this up and cut it short:

  • We extract the last commit on some branch, using the branch name.
  • We then work with the extracted commit for a while, and eventually make a new commit: a new snapshot.

That is, given:

A--B--C--D--E   <-- main

we run git checkout main or git switch main to extract commit E, then we do our work, and then we run git commit to make a new commit F:

A--B--C--D--E   <-- main
             \
              F

In order to find commit F, we need a name—typically a branch name—so when we're using GitHub and protected branches, we make a new branch name along the way so that commit F is found by the new name:

A--B--C--D--E   <-- main
             \
              F   <-- feature

Then we push our own Git repository's new branch to GitHub, make a "pull request", and use the GitHub-specific pull request machinery to eventually incorporate either F or some copy of F (F') on GitHub. Again, there are a lot of details and normally I would go into all of them here, but I'm taking time to hold myself back. If we do everything Just Right, we end up with:

A--B--C--D--E--F   <-- main, feature

and we can now delete the name feature. Note that commit F is not changed at all. It still points back to commit E. What has changed is the name main itself, which now points to F.
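As a purely local stand-in for that last step, the sketch below uses git merge --ff-only to slide main forward. This is an assumption for illustration: the GitHub buttons may instead add a merge commit or copy F, as noted above. The scratch directory, the "Demo" identity, and the empty commits are likewise made up.

```shell
# Scratch repository with one commit E on main (illustrative names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "E"

git checkout -q -b feature               # new name, still pointing to E
git commit -q --allow-empty -m "F"       # new commit F, found only through feature
f_before=$(git rev-parse feature)

git checkout -q main
git merge -q --ff-only feature           # move the name main forward to F; no new commit is made
f_after=$(git rev-parse main)
git log -1 --format=%s main~1            # F still points back to E
```

Comparing f_before with f_after shows that the commit itself is untouched: only the name moved.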

This gives us a simple set of rules about Git:

  • We can always add more commits.
  • We can move branch names around. With a GitHub "protected branch", the names in the GitHub repositories (but not in other clones!) are protected from being moved around. Only some special GitHub-only cases are allowed. You will probably want to override this (e.g., by temporarily de-protecting main), but see below.
  • We can't change any existing commit.
  • We can't really delete commits either.
  • But people find commits using the names, which we can move.

On to your first problem

Now rollback main, so it becomes:

(main) 1 - 2

This would be:

A--B   <-- main
    \
     C--D--E

No commits have changed. We've merely used our ability to move a name. In this case we moved the name main to make it point to B instead of E.

This immediately runs into two problems:

  1. You set main on GitHub to be protected. So you have disallowed yourself from moving it yourself.
  2. Even if we work around problem #1, what happens to commits C-D-E?

While commits C-D-E will stick around (for some indefinite amount of time), we only find commits by using a name and then working backwards. We need a name by which we can find commit E.

There's a secondary problem as well, although how important it is depends on how you and your co-workers / friends / colleagues use Git and GitHub. Specifically, anyone who cloned the GitHub repository probably now thinks that branch main should end at commit E too.

Git likes to add new commits to the end of a branch, as we saw above. It's not very willing to "lose" commits off the end of a branch. Git knows that if you just add a commit to a branch (whether by making an all-new commit that comes after E, or by taking some commit you already have that, like F, comes after E, and adding it to main), all the old commits are still there, and easy to find in Git's usual backwards fashion. When you move a branch name in such a way that it only adds commits, Git generally says OK! DONE! (GitHub's branch protection, which you set up on the GitHub side in the copy of the repository that lives over on GitHub, makes it say no instead.)

But if you ask Git to move a branch name backwards, so that it will lose commits off the end of a branch, Git will generally be horrified and say No! That would lose some commits! (Sure, they're still in there, but you would have to know the big ugly random-looking hash ID to find them.)

You can get around these:

  1. To stop main from being protected, remove the protections: either give yourself full power (make main only partly protected), or unprotect it entirely so that everyone has full power. Then use git push --force to update it so that Git's own normal no answer is turned into okay, but under protest.

  2. To keep commits C-D-E, just make a new name that remembers E for you.

So once you do that, you will have:

A--B   <-- main
    \
     C--D--E   <-- saved

for instance. After fixing up the protections on GitHub (repeating as needed until the git push -f below works), run:

git switch main          # switch to main
git status               # make sure that this worked and all is clean and good
git branch saved         # make new branch name to remember E
git reset --hard HEAD~3  # force `main` back three steps
git push -f origin main  # force GitHub to drop 3 commits too

Whether you git push origin saved or not is up to you: that lets you give a name to the three commits over on GitHub's copy as well. (Branch names are local to each Git.)
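If you'd like to rehearse the roll-back before touching the real repository, here is a sketch in a scratch repository. It has no remote, so the push step is left out; the directory, the "Demo" identity, and the empty commits A through E are assumptions for illustration only.

```shell
# Scratch repository: A--B--C--D--E on main (illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
for name in A B C D E; do
  git commit -q --allow-empty -m "$name"
done
e_hash=$(git rev-parse main)

git branch saved                              # new name remembering E
git reset -q --hard HEAD~3                    # move main back three steps, to B
git log --oneline --graph main saved          # C-D-E are now reachable only through saved
dropped=$(git rev-list --count main..saved)   # commits on saved that main no longer reaches
```

If dropped is not the number you expected, the HEAD~3 count is wrong for your history and should be fixed before any force push.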

Apply bug fixes as part of commits 6 and 7

(main) 1 - 2 - 6 - 7

Let's draw this properly:

A--B--F--G   <-- main
    \
     C--D--E   <-- saved

You get to this state by, while being on main, creating the two new commits F and G. Since these commits merely add on to main, you can push these directly to GitHub now (you have the ability to push directly to GitHub: main is not protected for you). Once you've finished this step you can re-protect main on GitHub if you like.

Now for your main (no pun intended) problem

Now merge feature-A back to latest state of main

final state - (main) 1 - 2 - 6 - 7 - 3 - 4 - 5

Whoa, hold on a second: where did feature-A come from?

I know that you mentioned it earlier. But branch names in a Git repository only matter in terms of finding one commit. That one commit is the one to which the branch name points. So where in all this was feature-A? We never drew it in.

Before you attempt to solve this problem for real—and hence before you use any of what I've mentioned so far—you need to draw in feature-A, properly. Here, you need to be careful, because when you use the GitHub protected branch feature and the merge buttons there, you can get surprises.

In particular, the green MERGE button has a drop-down arrow on it. Using the arrow causes the button to change to one of three options:

  • MERGE means just that: do a full merge. GitHub will include a merge commit (see below).

  • REBASE AND MERGE means copy some original commits to new-and-improved ones. This may be what you did earlier, in which case the three saved-branch commits are different commits from the feature-A commits.

  • SQUASH AND MERGE means take a whole chain of commits, and turn them into a single ordinary commit and add that on to the end of the current branch. In this case main would not have three commits from some feature-A branch, nor a merge commit. It would instead have one ordinary commit. That doesn't fit with C-D-E at all.

So we've been working with a false picture. Only you have the repository here, so only you can draw a true picture. We cannot come to any solid conclusions without one. However, if the picture we've been drawing here is close enough, this whole reset-and-rebuild method will let you proceed. You can now simply merge saved with main:

git switch main    # if needed
git merge saved

This uses Git's merge machinery to combine work, in the usual way.

Let's redraw the inputs a bit:

     F-----G   <-- main (HEAD)
    /
A--B
    \
     C--D--E   <-- saved

This is the same drawing as before, but I've moved F-G up to a separate line and slid G over a bit. I also added the special name HEAD, in parentheses, as being "attached to" main, so that we know which branch we are "on" when we run git merge.

The way merge works is to locate the best shared commit—a commit that's on both branches—as the thing that Git calls the merge base commit. Here, the best shared commit, between main (commit G) and saved (commit E) is clearly commit B. At least, it should be clear in the updated drawing.

So, having located commit B on its own, Git will now run two git diff commands, to see what's different in three snapshots:

  • git diff --find-renames hash-of-B hash-of-G: this finds out what "we" changed on main since B. That is, whatever comes out of this diff is the sum of whatever changes we made in commits F and G.

  • git diff --find-renames hash-of-B hash-of-E: this finds out what "they" (we, really) changed on saved since B.

Git can now combine these two sets of changes. This is a regular, full-blown merge, of the sort that Git tries to do on its own. If we and they changed different files, or different (non-overlapping) lines of the same files, Git will be able to combine the changes. Git can then apply the combined changes to the snapshot found in commit B. This:

  • keeps our changes, but also
  • adds their changes.
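Here is a small sketch of the merge base and the two diffs, again in a scratch repository. The commit helper, the one-file-per-commit layout (A.txt, B.txt, and so on), and the "Demo" identity are hypothetical, chosen only so the two sides touch different files.

```shell
# Scratch repository shaped like the drawing: A--B, then F--G on main and C--D--E on saved.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git branch saved                       # saved starts at B
commit F; commit G                     # "our" bug fixes on main
git checkout -q saved
commit C; commit D; commit E           # "their" feature work on saved
git checkout -q main

base=$(git merge-base main saved)      # best shared commit
git log -1 --format=%s "$base"         # shows the subject of the merge base
ours=$(git diff --find-renames --name-only "$base" main)     # what changed on main since the base
theirs=$(git diff --find-renames --name-only "$base" saved)  # what changed on saved since the base
printf 'ours:\n%s\ntheirs:\n%s\n' "$ours" "$theirs"
```

Because the two lists of files do not overlap here, combining them is trivial; overlapping edits to the same lines are what produce merge conflicts.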

The result, then, is ready to be put into a new snapshot. The next letter is H, but let's call this commit M, and draw it in:

A--B--F--G---M   <-- main (HEAD)
    \       /
     C--D--E   <-- saved

The only thing special about commit M is that instead of pointing back just to commit G, the way an ordinary commit would, it points back to two commits—two parents, in Git's terminology—G and E.

GitHub can make this commit M if Git can make this commit M. To do that you'd just push saved to GitHub and issue a pull request as usual. Note that there is no feature-A branch involved here: we've simply made two commits on main, after rolling main back, and then used git merge. The only thing special here was the roll-back part, which required undoing branch protection and using git push --force. Once commit M is done, we can safely delete the name saved because commit E can be found by moving to commit M (however we find M), then stepping back to the second parent.
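The two-parent claim is easy to check locally. The sketch below runs git merge in a scratch repository as a stand-in for the pull request merge; the commit helper, file names, and "Demo" identity are assumptions for illustration.

```shell
# Scratch repository: A--B, then F--G on main and C--D--E on saved (illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git branch saved
commit F; commit G
git checkout -q saved
commit C; commit D; commit E
git checkout -q main

git merge -q -m "M" saved                       # make merge commit M on main
first=$(git log -1 --format=%s 'main^1')        # first parent: where main was before the merge
second=$(git log -1 --format=%s 'main^2')       # second parent: the commit that was merged in
git branch -q -d saved                          # safe: E is reachable from M
still=$(git log -1 --format=%s 'main^2')        # E can still be found without the name saved
echo "parents of M: $first and $second; after deleting saved: $still"
```

git branch -d refuses to delete a name whose commits are not merged into the current branch, so its success here is itself a check that nothing is being lost.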

This does also potentially make a small mess for anyone else who has a clone of the original GitHub repository. Here's what they might have, in their Git repository:

A--B--C--D--E   <-- main, origin/main
             \
              H--I   <-- my-feature (HEAD)

If they now pick up new commits from GitHub, their repository becomes:

A--B--F--G---M   <-- origin/main
    \       /
     C--D--E   <-- main
            \
             H--I   <-- my-feature (HEAD)

That is, their main used to be exactly level with their origin/main, which said commit E. That's their main so it has not yet changed. If they re-synchronize their main right now—which will go fine as M is "ahead of" E—they will now have:

A--B--F--G---M   <-- main (HEAD), origin/main
    \       /
     C--D--E
            \
             H--I   <-- my-feature

There isn't any real problem here, but they might want to have their my-feature be "based" on origin/main or main. That is, they might want to get this now:

               H'-I'  <-- my-feature (HEAD)
              /
A--B--F--G---M   <-- main, origin/main
    \       /
     C--D--E
            \
             H--I   [abandoned]

To do that, they will want to use git rebase. They will need to understand how this rebase works, in case anything goes wrong.
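As a rough sketch of that rebase, the scratch repository below uses a local main in place of origin/main. In a real clone they would more likely run git fetch followed by git rebase origin/main. The commit helper, file names, and "Demo" identity are made up for illustration.

```shell
# Scratch repository standing in for the co-worker's clone (illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
# Hypothetical helper: each commit adds one file named after the commit.
commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

commit A; commit B
git branch saved
commit F; commit G
git checkout -q saved
commit C; commit D; commit E
git checkout -q -b my-feature          # their work, based on the old tip E
commit H; commit I
old_tip=$(git rev-parse my-feature)

git checkout -q main
git merge -q -m "M" saved              # main now looks like the updated origin/main
git checkout -q my-feature
git rebase -q main                     # copy H and I to new commits on top of M
new_tip=$(git rev-parse my-feature)
git log --format=%s main..my-feature   # the copied commits, newest first
```

The old H and I are not changed; the rebase makes new commits with new hash IDs (old_tip differs from new_tip) and moves the name my-feature to the copies.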

Conclusion

You may have led us all down the garden path with an improper drawing of what's actually in the repository, but if not, your specific request is pretty straightforward. You can't do it unless and until you de-protect main, at least for yourself, over on GitHub, because the GitHub tools will protect you from yourself here. Git is perfectly willing to do this though.

Whether the minor problems this could create for others are something you should worry about is only something you and they can answer.

It's important to understand the mechanisms that Git uses when looking at goals. There are other ways to achieve various goals that don't involve git push --force, but without complete and accurate drawings and access to your entire team, it's hard to provide particular recommendations.

torek
  • Torek: "I'm taking time to hold myself back". Also Torek: write a detailed 16900 characters long answer. Considering the limit is 30K chars, I'd say: "holding oneself back achieved". – VonC Oct 18 '21 at 20:16
  • @torek that must have been the most detailed and in-depth answer I could have ever imagined. I expected something along the lines of "run these commands and you're good to go". I apologize for the horrible illustration in my example, ha. I am just going to use your diagram as a reference in the future. As far as the question is concerned, your answer solved the issue I was facing and learnt a lot in the process. Thank you for spending your time in educating and diving into the details. – mosranna Oct 20 '21 at 21:47