
I am having this problem.

I have a codebase in which I have made about 30 commits.

At commit 28 I realised I had mistyped my username, so I corrected it with git config --global user.name.

I am trying to avoid rebasing all the previous commits. If I do a squash and merge from the github.com UI, will all 30 of my previous commits become one commit, and will that commit be attributed to my corrected username?

What if someone uses the annotate command? Will it show my username with the typo or my corrected username?

jpganz18

2 Answers


TL;DR

As Tim Biegeleisen noted, the author and committer of a Git commit are part of the commit itself, and the existing wrongly-addressed commits won't change. You can, however, stop using them, and that's what git merge --squash enables. Hence the answer is both: yes, this can fix the problem and no, this won't fix the problem, depending on how you look at it and what you do during and after the point at which you make the git merge --squash commit.

Basically, after the squash-merge, you'll just delete the branch on which you made these 30 commits. You might have to delete it from two places, though: your Git on your machine, and the Git over on GitHub (which is, presumably, sort of yours, but also sort of not—after all, GitHub are the ones controlling it).

Long

You mentioned:

using github.com from their UI

When you work with Git from the command line (on Windows or Linux or MacOS or whatever), you generally start by cloning a repository from somewhere:

$ git clone https://github.com/path/to/repo.git
$ cd repo

for instance (though I prefer ssh://git@github.com/... URLs). There are now two Git repositories: one over on GitHub, and one locally on your own machine.

Squashing, fetching, pushing, and multiple Git repositories in general

There are two things to know, in general, here: how git merge --squash works, and how these two different Gits communicate and share commits. In particular, there are two parts to this, no matter how you go about it: you're going to first make one new commit, then you're going to use git push or git fetch to transfer the new commit (and maybe old commits, or not) between the two separate Gits.

Any commits you make on your machine using your Git use the settings you've made on your computer—but if you use the web interface over on GitHub, and click various buttons there, you're using their machines to make commits in their Git repository, which will use the settings they have stored for you. Hence if you go to the button that has a side dropdown arrow that lets you select between merge, rebase-and-merge, and squash-and-merge, select any of those three, and click it, their Git makes a new commit in their Git repository and all the settings you've set in your own Git on your computer are irrelevant.

You can do the git merge --squash on your own machine, and then use git push, so that you make the commits locally, and only then send them to the other Git. Whether and when to do that is up to you.

Some background about making any new commit

You start by picking some branch name and some tip commit to be on, either by explicitly running:

git checkout somebranch

or implicitly by the fact that you did that at some point earlier, so in any case, in this one particular repository, you're now on the branch somebranch, which by definition has some latest, or tip, commit:

...--o--o--o  <-- somebranch (HEAD)

Each commit has some big ugly hash ID that means that particular commit: any Git repository can immediately tell whether it has that commit, and if it does have that commit by that hash ID, it has that exact commit. All Gits everywhere in the universe agree: a commit with those contents—that snapshot, that user name and email address, those date stamps, that log message, and so on—will have that hash ID, and no other commit will ever have that hash ID.

The name somebranch holds that commit's hash ID inside it, and you can do:

git rev-parse somebranch

to see that hash ID. You'll also see that hash ID, or an abbreviated version of it, in git log and git log --oneline output. That hash ID, stored in that branch name, is how Git knows that your current branch somebranch ends with that particular commit.

Inside that commit is the hash ID of its immediate predecessor or parent commit. That's how Git can go from the tip commit, to the commit one step earlier. Inside the parent is the hash ID of the current commit's grandparent, so from the parent, Git can walk to the grandparent. The grandparent has another hash ID, so Git can walk to the great-grandparent, and so on. This chain of hash IDs allows Git to find all the commits, starting from the end and working backwards.
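To make the parent chain concrete, here is a minimal sketch that builds a throwaway repository and inspects the tip commit. The repository location, the file name, and the identity "Example Name" are all made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/somebranch   # name the not-yet-born branch
git config user.name "Example Name"           # hypothetical identity
git config user.email "example@example.com"

for n in 1 2 3; do
    echo "line $n" >> file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

tip=$(git rev-parse somebranch)        # hash ID stored in the branch name
parent=$(git rev-parse somebranch~1)   # the commit one step back
git cat-file -p "$tip"                 # tree, parent, author, committer, message

# The "parent" line stored inside the tip commit is the parent's hash ID.
recorded_parent=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
echo "tip=$tip"
echo "parent recorded in tip=$recorded_parent"
```

The author and committer lines printed by git cat-file -p are the ones the question is about: they are stored inside the commit, next to the parent hash.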

Hence if we have some chain of commits ending at, say, commit with hash H:

... <-grandparent <-parent <-H   <-- branch-name

and we go to make a new commit, the way Git actually makes the new commit is to package up the commit snapshot, complete with author name and log message and so on, set its parent hash to H, and write it out. Because the contents of this commit are unique—the parent hash H, together with a time-stamp Git adds that tells which exact second it was when you make the new commit, make sure it will be unique—the new commit gets a new unique hash ID. Let's call that I:

... <-parent <-H <-I

Now all Git has to do is write the hash ID of new commit I into the branch name, so that commit I is the tip.

In other words, Git starts at the end of each branch, and works backwards. So this kind of chain allows Git to find all the commits that are contained in this branch. Some commits may only be contained in some other branch:

       B--C   <-- master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch (HEAD)

Here, there are a bunch of commits (say, somewhere around 30) that are only on somebranch, with two commits that are only on master and many commits that are on both branches. (If you count D-E-F-G-H-I, there's only room for 6 commits; well, let's just pretend there are a bunch more letters somewhere between D and G :-) ). Commits D through G have the wrong author name while commits H and I have the right one. Our plan now is to stop using all of D through I entirely.

Using git merge --squash

Let's look at precisely how git merge --squash really works. Essentially, it does the same thing that a regular merge would, except right at the end. A regular merge starts out by finding which commit is the best shared commit, i.e., the best commit that's on both branches. This best-shared-commit is the merge base of the two branches.

We check out one branch, then name the other in our git merge --squash command:

$ git checkout master                   # note: this attaches HEAD to master
$ git merge --squash somebranch

Git examines the chains of commits that make up the history that's in this repository, to find the best shared commit. In our case, with the history of master leading back to commit A and the history of somebranch also leading back to commit A, commit A is the best shared commit. (All commits before A are shared as well: they're just not as good as A.)

Next, Git compares the snapshot in the merge base—in commit A—to that in our current branch tip commit C:

git diff --find-renames <hash-of-A> <hash-of-C>   # what we changed in master

It does the same to find out what "they" (really, we) changed in the other branch:

git diff --find-renames <hash-of-A> <hash-of-I>   # what they changed

Then Git does its best to combine these two sets of changes, applying the combined changes to the snapshot from commit A. If all goes well, this combination is the snapshot to use. If it does not go well, Git will stop with a merge conflict, and get help from the user to fix it up and complete it.
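You can reproduce the merge-base lookup and the two comparisons by hand. The following is a small sketch in a throwaway repository; the file names and the identity are invented for the demo, and --name-only is added just to keep the output short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Name"           # hypothetical identity
git config user.email "example@example.com"

echo base > base.txt; git add base.txt; git commit -q -m A
a=$(git rev-parse HEAD)

git checkout -q -b somebranch
echo feature > feature.txt; git add feature.txt; git commit -q -m D

git checkout -q master
echo main > main.txt; git add main.txt; git commit -q -m B

base=$(git merge-base master somebranch)      # the best shared commit
ours=$(git diff --find-renames --name-only "$base" master)
theirs=$(git diff --find-renames --name-only "$base" somebranch)
echo "merge base: $(git log --oneline -1 "$base")"
echo "changed on master: $ours"
echo "changed on somebranch: $theirs"
```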

(More precisely, your Git will do that. The web-based Git on GitHub will just say sorry, I can't do that and not even let you start the merge at all—they check up front before making the merge button clickable. It's a little different from an ordinary Git since it has no way to let you interact with it to resolve conflicts. The GitHub folks do have a desktop client, but I've never used it; I don't know if it's a wrapper around command-line Git, or something fancier.)

We'll assume here that either all goes well, or you have fixed all the conflicts and run git merge --continue or git commit to finish the merge. A regular (non-squash) git merge now makes a commit that has, instead of the usual one parent, two parents:

       B--C--------------J   <-- master (HEAD)
      /                 /
...--A                 /
      \               /
       D--...--G--H--I   <-- somebranch

The new commit goes onto branch master because HEAD is attached to master (due to our git checkout master above). New commit J points back to commit C, the commit that was the tip of master; but it also points back to commit I, the commit that is still the tip of somebranch. Commits C and I are now both on master, because of this extra, second parent that leads back to commit I.

When you use git merge --squash, Git stops just before committing, so you run git commit yourself to make J. The key difference is that new commit J does not have a second parent:

       B--C---------------J   <-- master (HEAD)
      /
...--A
      \
       D--...--G--H--I   <-- somebranch

Everything else is exactly the same: the snapshot for J, the author and committer (set by whichever Git actually makes commit J), the timestamp (likewise), and the parent hash ID of commit C. But there's no second parent for new commit J.
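Here is a rough local sketch of that, aimed at the original question about annotate. It makes one commit under a mistyped name and one under the corrected name, squash-merges them, and then asks git blame (git annotate reports the same information) who the lines belong to. All names and files are hypothetical, and master is left at A here, so the squash is the simplest possible case.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "example@example.com"
git config user.name "Example Name"           # whoever made commit A
echo base > file.txt; git add file.txt; git commit -q -m A

git checkout -q -b somebranch
git config user.name "Mistyped Nmae"          # the typo
echo one >> file.txt; git commit -q -am D
git config user.name "Corrected Name"         # the fix
echo two >> file.txt; git commit -q -am I

git checkout -q master
git merge --squash somebranch >/dev/null      # stages the result, makes no commit
git commit -q -m "Squash somebranch"          # this is commit J

squash_author=$(git log -1 --format=%an master)
parent_count=$(git log -1 --format=%P master | wc -w | tr -d ' ')
blame_authors=$(git blame --line-porcelain file.txt | sed -n 's/^author //p' | sort -u)
echo "author of J: $squash_author (parents: $parent_count)"
echo "authors seen by blame on master:"
echo "$blame_authors"
```

On master, blame only walks through J and A, so the mistyped name does not appear there. It is still recorded in commit D, which remains reachable from somebranch until that branch is deleted.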

If you want commits D through I to vanish, you now simply delete the branch somebranch. The name somebranch is how your Git finds these commits: it starts by fishing the hash ID of I out of the name somebranch, then uses I to find H, H to find G, and so on, all down the line until you reach A. The name somebranch will let you reach A, but so will the name master. So deleting the name somebranch makes commits D through I vanish from view. Eventually—typically some time after 30 days or so—your Git will garbage collect these commits.
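As a quick check of the "vanish from view" claim, this sketch counts the commits that git log --all can find under the mistyped name before and after deleting the branch. The names are hypothetical, and git -c user.name=... is used only to fake the typo for a single commit.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "example@example.com"
git config user.name "Corrected Name"
echo base > file.txt; git add file.txt; git commit -q -m A

git checkout -q -b somebranch
echo one >> file.txt
git -c user.name="Mistyped Nmae" commit -q -am D   # one commit with the typo

git checkout -q master
git merge --squash somebranch >/dev/null
git commit -q -m "Squash somebranch"

before=$(git log --all --author="Mistyped" --oneline | wc -l | tr -d ' ')
git branch -D somebranch >/dev/null            # -D: the branch is not "merged"
after=$(git log --all --author="Mistyped" --oneline | wc -l | tr -d ' ')
echo "commits with the typo visible: before=$before after=$after"
```

The commit object itself is still in the repository after the deletion (your reflog can still find it); it is just no longer reachable from any branch name.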

But again, you mentioned "using [the] github.com ... UI". If you are going to do that, you have to send commits D through I to GitHub first, or maybe you have already sent them.

Transferring commits with git fetch and git push

You can, at any time, have one Git connect itself to another Git and either get (fetch) or send (push) commits from or to the other Git. From your own computer's Git, then, you obtain the GitHub-Git's commits with git fetch, and you send commits to that Git with git push.

The actual transfer protocol has lots of little subtleties, but it starts out pretty simply: your Git and their Git have a conversation where one tells the other about the hash IDs of the commits it has that it could send, and the other either says: I have that one already or oh, I'll take that one. So, suppose you have:

...--A
      \
       D--...--G--H--I   <-- somebranch (HEAD)

and they have:

       B--C   <-- master (HEAD)
      /
...--A

(the different HEADs are because these are two different Gits, after all). If you connect your Git to theirs with git push origin somebranch, your Git will enumerate your commit hash IDs I, H, and so on back to A. When your Git reaches the hash ID for A, they'll say: ok, stop, I have that one! Your Git will then package up commits D through I and send them over.

At the end, your Git adds a final request: Please set your branch name somebranch to remember hash ID I. Their Git doesn't have a somebranch yet, so they are fine with that and do it. Your Git and their Git are then done, and you disconnect.

You can, either before or after this, also run git fetch origin. Here your Git calls up their Git, but this time their Git is the sender. They will send you commit hash C, saying: My master has this hash, do you have commit C? If you don't, they'll also offer B and then A, which you do have.

If you run git fetch origin master, your Git will only take commits that they have on their master that you don't have at all, but with git fetch origin, your Git will take any commits you don't have from any of their branch names: they'll list all of them, along with the commit hash IDs, and your Git can and will be greedy and ask for them all.

In any case, at the end of this process, there's no separate polite request: your Git just says: OK, I know now that their master identifies commit C, so I'll update my origin/master to make it identify commit C. So you get this in your repository:

       B--C   <-- origin/master
      /
...--A   <-- master
      \
       D--...--G--H--I   <-- somebranch (HEAD)

Your own master is now two commits behind their master, so you can now run:

$ git checkout master

which moves your HEAD to your master:

       B--C   <-- origin/master
      /
...--A   <-- master (HEAD)
      \
       D--...--G--H--I   <-- somebranch

and then run:

git merge origin/master

(or git merge --ff-only origin/master). This notices that the common merge base between your master and your origin/master is commit A—the tip of master—so it performs what Git calls a fast-forward operation, rather than a real merge: it just checks out commit C directly, and moves your master there:

       B--C   <-- master (HEAD), origin/master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch

Note that git fetch updates your origin/* names to match their Git, while git push ends with polite requests of the form: Please set your actual branch names to these specific commits. This means you can always run git fetch: it doesn't affect your branches at all. But you sometimes can't achieve a plain git push as they will sometimes refuse to change their branches. (I'm not going to get into the details here as this is already very long, but this is what git push --force is for: it changes the polite request to a command. They don't have to obey the command, but they probably will.)
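The fetch-then-fast-forward sequence can be tried without GitHub. In this sketch a local bare repository stands in for origin and a second repository named seed plays whoever else pushes to it; both are assumptions made only for the demo.

```shell
set -e
work=$(mktemp -d) && cd "$work"

# A bare repository stands in for the GitHub side.
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# "seed" plays the role of someone else pushing to origin.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Name"            # hypothetical identity
git config user.email "example@example.com"
echo a > f.txt; git add f.txt; git commit -q -m A
git push -q ../origin.git master
cd ..

git clone -q origin.git mine                   # your clone, with master at A

cd seed
echo b >> f.txt; git commit -q -am B           # origin's master moves ahead
git push -q ../origin.git master
cd ../mine

git fetch -q origin                            # updates origin/master only
behind=$(git rev-list --count master..origin/master)
git merge -q --ff-only origin/master           # fast-forward: no new commit
echo "was behind by $behind; master is now $(git rev-parse --short master)"
```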

About git pull

The git pull command is a bit of an oddball in Git. What it does is run git fetch, then run a second Git command. The second command is usually git merge. You can tell it to use git rebase instead, either on a one-time basis, or generally. I prefer to avoid git pull and run these two commands separately, because that lets me look at what git fetch fetched before I pick any command to run. If you let git pull run the second command, you must pick which second command to use before you know what git fetch actually fetches.

But lots of people use git pull, so you should know what it does. In this case, it can be a trap, as we'll see.

Let's say you squash-merge on GitHub

In order to do the squash-merge on GitHub, you first have to send them all your somebranch commits:

$ git push origin somebranch

They now have:

       B--C   <-- master (HEAD)
      /
...--A
      \
       D--...--G--H--I   <-- somebranch

in their Git. Now you make a pull request using the GitHub web interface and go to the clicky buttons and select squash-and-merge and click it:

       B--C----------J   <-- master (HEAD), origin/master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch

What you need to do now is delete somebranch, both on GitHub and in your own Git repository. If you do that, you're all good: the 30 (or however many) commits that were, and as of right now still are, only on somebranch (in both Gits) are no longer visible. (Side note: GitHub tends to remove their copies immediately, while your Git waits the 30-or-more days.)
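Deleting the branch in both places might look like the following sketch. A local bare repository again stands in for GitHub (an assumption for the demo), and the squash-and-merge itself is omitted because it happens on the hosting side.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q --bare origin.git                  # stand-in for the GitHub repository
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

git init -q mine && cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Name"            # hypothetical identity
git config user.email "example@example.com"
git remote add origin "$work/origin.git"
echo a > f.txt; git add f.txt; git commit -q -m A
git push -q origin master
git checkout -q -b somebranch
echo d >> f.txt; git commit -q -am D
git push -q origin somebranch

# ... the squash-and-merge would happen on the hosting side here ...

git checkout -q master                         # cannot delete the current branch
git push -q origin --delete somebranch         # remove their somebranch
git branch -D somebranch >/dev/null            # remove your somebranch

remote_left=$(git ls-remote --heads origin somebranch | wc -l | tr -d ' ')
local_left=$(git branch --list somebranch | wc -l | tr -d ' ')
echo "somebranch remaining: remote=$remote_left local=$local_left"
```

GitHub also offers a delete-branch button on a merged pull request, which does the remote half for you.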

But if you, or anyone for that matter, accidentally merge somebranch with master, you now get a new merge commit:

       B--C----------J
      /               \
...--A                 K
      \               /
       D--...--G--H--I

Note that anyone can make this mistake any time after J gets created and before D-through-I are gone / invisible. Since git pull runs git fetch followed by git merge, it's easy to make this mistake using git pull.

Specifically, if you are on your own somebranch and run git pull origin master, you get exactly this error. Fortunately, this one is easy to fix. You now have this:

       B--C----------J    <-- master
      /               \
...--A                 K   <-- somebranch (HEAD)
      \               /
       D--...--G--H--I

You still just need to delete branch somebranch; commits D through I, along with the accidental merge K, then become invisible and will go away eventually:

$ git checkout master
$ git branch -D somebranch

The checkout is because you can't delete the branch you're on; the -D (uppercase Delete, as it were) is the way to force the deletion, even though git branch would normally complain that you're going to lose commits D-through-I and K (which is of course the idea).

torek

The author name associated with a Git commit is part of the commit itself, so changing your username via git config will not alter any commits that already exist. It will only affect commits you make from then on. This means that someone searching for commits made under your previous username would have to use that previous username in the search.
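As an illustration of that last point, this sketch makes one empty commit under each name in a throwaway repository and searches with git log --author. Both names are invented for the demo.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email "example@example.com"

# One commit under each (hypothetical) name.
git -c user.name="Mistyped Nmae" commit -q --allow-empty -m "old commit"
git -c user.name="Corrected Name" commit -q --allow-empty -m "new commit"

old=$(git log --author="Mistyped Nmae" --format=%s)
new=$(git log --author="Corrected Name" --format=%s)
echo "found under the old name: $old"
echo "found under the new name: $new"
```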

This is not recommended, but if you really want to rewrite the history of one or more of your branches to alter the author, then read How to change the commit author for one specific commit?.

Tim Biegeleisen