TL;DR
As Tim Biegeleisen noted, the author and committer of a Git commit are part of the commit itself, and the existing wrongly-addressed commits won't change. You can, however, stop using them, and that's what git merge --squash enables. Hence the answer is both: yes, this can fix the problem and no, this won't fix the problem, depending on how you look at it and what you do during and after the point at which you make the git merge --squash commit.
Basically, after the squash-merge, you'll just delete the branch on which you made these 30 commits. You might have to delete it from two places, though: your Git on your machine, and the Git over on GitHub (which is, presumably, sort of yours, but also sort of not—after all, GitHub are the ones controlling it).
Long
You mentioned:
using github.com from their UI
When you work with Git from the command line (on Windows or Linux or MacOS or whatever), you generally start by cloning a repository from somewhere:
$ git clone https://github.com/path/to/repo.git
$ cd repo
for instance (though I prefer ssh://git@github.com/... URLs). There are now two Git repositories: one over on GitHub, and one locally on your own machine.
Squashing, fetching, pushing, and multiple Git repositories in general
There are two things to know, in general, here: how git merge --squash works, and how these two different Gits communicate and share commits. In particular, there are two parts to this, no matter how you go about it: you're going to first make one new commit, then you're going to use git push or git fetch to transfer the new commit (and maybe old commits, or not) between the two separate Gits.
Any commits you make on your machine using your Git use the settings you've made on your computer—but if you use the web interface over on GitHub, and click various buttons there, you're using their machines to make commits in their Git repository, which will use the settings they have stored for you. Hence if you go to the button that has a side dropdown arrow that lets you select between merge, rebase-and-merge, and squash-and-merge, select any of those three, and click it, their Git makes a new commit in their Git repository and all the settings you've set in your own Git on your computer are irrelevant.
You can do the git merge --squash on your own machine, and then use git push, so that you make the commits locally, and only then send them to the other Git. Whether and when to do that is up to you.
Some background about making any new commit
You start by picking some branch name and some tip commit to be on, either by explicitly running:
git checkout somebranch
or implicitly by the fact that you did that at some point earlier, so in any case, in this one particular repository, you're now on the branch somebranch, which by definition has some latest, or tip, commit:
...--o--o--o <-- somebranch (HEAD)
Each commit has some big ugly hash ID that means that particular commit: any Git repository can immediately tell whether it has that commit, and if it does have that commit by that hash ID, it has that exact commit. All Gits everywhere in the universe agree: a commit with those contents—that snapshot, that user name and email address, those date stamps, that log message, and so on—will have that hash ID, and no other commit will ever have that hash ID.
The name somebranch holds that commit's hash ID inside it, and you can do:
git rev-parse somebranch
to see that hash ID. You'll also see that hash ID, or an abbreviated version of it, in git log and git log --oneline output. That hash ID, stored in that branch name, is how Git knows that your current branch somebranch ends with that particular commit.
Inside that commit is the hash ID of its immediate predecessor or parent commit. That's how Git can go from the tip commit, to the commit one step earlier. Inside the parent is the hash ID of the current commit's grandparent, so from the parent, Git can walk to the grandparent. The grandparent has another hash ID, so Git can walk to the great-grandparent, and so on. This chain of hash IDs allows Git to find all the commits, starting from the end and working backwards.
Hence if we have some chain of commits ending at, say, the commit with hash H:
... <-grandparent <-parent <-H <-- branch-name
and we go to make a new commit, the way Git actually makes the new commit is to package up the commit snapshot, complete with author name and log message and so on, set its parent hash to H, and write it out. Because the contents of this commit are unique (the parent hash H, together with a time stamp recording the exact second at which you make the new commit, ensures that), the new commit gets a new unique hash ID. Let's call that I:
... <-parent <-H <-I
Now all Git has to do is write the hash ID of new commit I into the branch name, so that commit I is the tip.
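The name-to-hash and child-to-parent links described above can be inspected directly. The following is only a sketch in a throwaway repository: the directory, the identity, and the commit messages "commit H" and "commit I" are made up for the demonstration.

```shell
# Throwaway repository; every name below is an assumption for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/somebranch   # start out on a branch named somebranch
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "commit H"
H=$(git rev-parse somebranch)     # the hash ID currently stored in the branch name

echo two >> file.txt
git commit -q -a -m "commit I"
I=$(git rev-parse somebranch)     # the same name now holds the new commit's hash ID

# The raw commit object for I lists H on its "parent" line.
git cat-file -p "$I"
parent_of_I=$(git rev-parse "$I^")
echo "H:           $H"
echo "parent of I: $parent_of_I"
```

The last two lines printed should show the same hash, while the branch name itself has moved on from H to I.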
In other words, Git starts at the end of each branch, and works backwards. So this kind of chain allows Git to find all the commits that are contained in this branch. Some commits may only be contained in some other branch:
       B--C   <-- master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch (HEAD)
Here, there are a bunch of commits (say, somewhere around 30 commits) that are only on somebranch, with two commits that are only on master and many commits that are on both branches. (If you count D-E-F-G-H-I, there's only room for 6 commits; well, let's just pretend there are a bunch of additional letters somewhere between D and G :-) ). Commits D through I have the wrong author name. Our plan now is to stop using all of D through I entirely.
Using git merge --squash
Let's look at precisely how git merge --squash really works. Essentially, it does the same thing that a regular merge would, except right at the end. A regular merge starts out by finding which commit is the best shared commit, i.e., the best commit that's on both branches. This best-shared-commit is the merge base of the two branches.
We check out one branch, then name the other in our git merge --squash command:
$ git checkout master # note: this attaches HEAD to master
$ git merge --squash somebranch
Git examines the chains of commits that make up the history that's in this repository, to find the best shared commit. In our case, with the history of master leading back to commit A and the history of somebranch also leading back to commit A, commit A is the best shared commit. (All commits before A are shared as well: they're just not as good as A.)
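You can ask Git which commit it considers the merge base. This sketch rebuilds a small version of the graph above in a throwaway repository; the file names and the identity are invented, and the commit messages simply reuse the letters from the diagram.

```shell
# Throwaway repository mirroring the A / B--C / D--I shape; names are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt
git commit -q -m "A"
A=$(git rev-parse HEAD)
git branch somebranch             # both names point at A for now

echo b > master.txt; git add master.txt
git commit -q -m "B"
echo c >> master.txt
git commit -q -a -m "C"

git checkout -q somebranch
echo d > feature.txt; git add feature.txt
git commit -q -m "D"
echo i >> feature.txt
git commit -q -a -m "I"

# The best shared commit of the two branches:
base=$(git merge-base master somebranch)
echo "merge base: $base"

# The two comparisons a merge starts from (summarized with --stat):
git diff --find-renames --stat "$base" master
git diff --find-renames --stat "$base" somebranch
```

Here the reported merge base should be the hash of commit A.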
Next, Git compares the snapshot in the merge base, commit A, to that in our current branch tip commit C:
git diff --find-renames <hash-of-A> <hash-of-C> # what we changed in master
It does the same to find out what "they" (really, we) changed in the other branch:
git diff --find-renames <hash-of-A> <hash-of-I> # what they changed
Then Git does its best to combine these two sets of changes, applying the combined changes to the snapshot from commit A. If all goes well, this combination is the snapshot to use. If it does not go well, Git will stop with a merge conflict, and get help from the user to fix it up and complete it.
(More precisely, your Git will do that. The web-based Git on GitHub will just say sorry, I can't do that and not even let you start the merge at all—they check up front before making the merge button clickable. It's a little different from an ordinary Git since it has no way to let you interact with it to resolve conflicts. The GitHub folks do have a desktop client, but I've never used it; I don't know if it's a wrapper around command-line Git, or something fancier.)
We'll assume here that either all goes well, or you have fixed all the conflicts and run git merge --continue or git commit to finish the merge. For a regular (non-squash) merge, Git now makes a commit that has, instead of the usual one parent, two parents:
       B--C--------------J   <-- master (HEAD)
      /                 /
...--A                 /
      \               /
       D--...--G--H--I   <-- somebranch
The new commit goes onto branch master because HEAD is attached to master (due to our git checkout master above). New commit J points back to commit C, the commit that was the tip of master; but it also points back to commit I, the commit that is still the tip of somebranch. Commits C and I are now both on master, because of this extra, second parent that leads back to commit I.
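When you use git merge --squash (and then, on the command line, run git commit yourself, because --squash stops just before committing), the key difference is that new commit J does not have a second parent: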
       B--C---------------J   <-- master (HEAD)
      /
...--A
      \
       D--...--G--H--I   <-- somebranch
Everything else is exactly the same: the snapshot for J, the author and committer (set by whichever Git actually makes commit J), the timestamp (likewise), and the parent hash ID of commit C. But there's no second parent for new commit J.
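To see both effects at once (a single parent, and the author taken from whatever is configured when J is made), here is a sketch in a throwaway repository. The identities "Wrong Name" and "Right Name", the file names, and the commit messages are all invented for the demonstration.

```shell
# Throwaway repository; identities and file names are assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Right Name"
git config user.email "right@example.com"
echo base > base.txt; git add base.txt; git commit -q -m "A"
echo c > master.txt; git add master.txt; git commit -q -m "C"

# Branch off from A and commit with the wrong identity.
git checkout -q -b somebranch HEAD~1
git config user.name "Wrong Name"
git config user.email "wrong@example.com"
echo d > feature.txt; git add feature.txt; git commit -q -m "D"
echo i >> feature.txt; git commit -q -a -m "I"

# Correct the identity, then squash-merge into master.
git config user.name "Right Name"
git config user.email "right@example.com"
git checkout -q master
C=$(git rev-parse HEAD)
git merge -q --squash somebranch     # stages the combined result, does not commit
git commit -q -m "J: squash of somebranch"

parents=$(git log -1 --format=%P HEAD)
author=$(git log -1 --format='%an <%ae>' HEAD)
echo "parents of J: $parents"
echo "author of J:  $author"
```

The parent list should contain only C's hash, and the author should be the corrected identity, even though every commit on somebranch still carries the wrong one.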
If you want commits D through I to vanish, you now simply delete the branch somebranch. The name somebranch is how your Git finds these commits: it starts by fishing the hash ID of I out of the name somebranch, then uses I to find H, H to find G, and so on, all down the line until it reaches A. The name somebranch will let you reach A, but so will the name master. So deleting the name somebranch makes commits D through I vanish from view. Eventually (typically some time after 30 days or so) your Git will garbage collect these commits.
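The difference between "vanished from view" and "actually gone" can be checked by hand. This is a sketch in a throwaway repository with invented names; it only shows the state right after the branch deletion, not the later garbage collection.

```shell
# Throwaway repository; names are assumptions for the demo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo base > base.txt; git add base.txt; git commit -q -m "A"
git branch somebranch
echo c > master.txt; git add master.txt; git commit -q -m "C"

git checkout -q somebranch
echo d > feature.txt; git add feature.txt; git commit -q -m "D"
echo i >> feature.txt; git commit -q -a -m "I"
I=$(git rev-parse somebranch)

git checkout -q master
git merge -q --squash somebranch
git commit -q -m "J"

git branch -q -D somebranch          # forget the only name that led to D and I

visible=$(git log --all --format=%s | tr '\n' ' ')
still_stored=$(git cat-file -t "$I")
echo "subjects reachable from any name: $visible"
echo "object I is still stored as a:    $still_stored"
```

The log should list only J, C, and A, while the object for I is still present in the repository until Git eventually prunes it.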
But again, you mentioned "using [the] github.com ... UI". If you are going to do that, you have to send commits D through I to GitHub first, or maybe you have already sent them.
Transferring commits with git fetch and git push
You can, at any time, have one Git connect itself to another Git and either get (fetch) or send (push) commits from or to the other Git. From your own computer's Git, then, you obtain the GitHub-Git's commits with git fetch, and you send commits to that Git with git push.
The actual transfer protocol has lots of little subtleties, but it starts out pretty simply: your Git and their Git have a conversation where one tells the other about the hash IDs of the commits it has that it could send, and the other either says: I have that one already or oh, I'll take that one. So, suppose you have:
...--A
      \
       D--...--G--H--I   <-- somebranch (HEAD)
and they have:
       B--C   <-- master (HEAD)
      /
...--A
(the different HEADs are because these are two different Gits, after all). If you connect your Git to theirs with git push origin somebranch, your Git will enumerate your commit hash IDs I, H, and so on back to A. When your Git reaches the hash ID for A, they'll say: ok, stop, I have that one! Your Git will then package up commits D through I and send them over.
At the end, your Git adds a final request: Please set your branch name somebranch to remember hash ID I. Their Git doesn't have a somebranch yet, so they are fine with that and do it. Your Git and their Git are then done, and you disconnect.
You can, either before or after this, also run git fetch origin. Here your Git calls up their Git, but this time their Git is the sender. They will send you commit hash C, saying: My master has this hash, do you have commit C? If you don't, they'll also offer B and then A, which you do have.
If you run git fetch origin master, your Git will only take commits that they have on their master that you don't have at all, but with git fetch origin, your Git will take any commits you don't have from any of their branch names: they'll list all of them, along with the commit hash IDs, and your Git can and will be greedy and ask for them all.
In any case, at the end of this process, there's no separate polite request: your Git just says: OK, I know now that their master identifies commit C, so I'll update my origin/master to make it identify commit C. So you get this in your repository:
       B--C   <-- origin/master
      /
...--A   <-- master
      \
       D--...--G--H--I   <-- somebranch (HEAD)
Your own master is now two commits behind their master, so you can now run:
$ git checkout master
which moves your HEAD to your master:
       B--C   <-- origin/master
      /
...--A   <-- master (HEAD)
      \
       D--...--G--H--I   <-- somebranch
and then run:
git merge origin/master
(or git merge --ff-only origin/master). This notices that the common merge base between your master and your origin/master is commit A, the tip of master, so it performs what Git calls a fast-forward operation, rather than a real merge: it just checks out commit C directly, and moves your master there:
       B--C   <-- master (HEAD), origin/master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch
Note that git fetch updates your origin/* names to match their Git, while git push ends with polite requests of the form: Please set your actual branch names to these specific commits. This means you can always run git fetch: it doesn't affect your branches at all. But you sometimes can't achieve a plain git push as they will sometimes refuse to change their branches. (I'm not going to get into the details here as this is already very long, but this is what git push --force is for: it changes the polite request to a command. They don't have to obey the command, but they probably will.)
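The fetch-then-fast-forward sequence can be tried without GitHub at all. In this sketch a local bare repository stands in for the GitHub side, and a second clone plays the part of whoever added B and C; every path, identity, and file name is an assumption made for the demonstration.

```shell
# A local bare repository stands in for GitHub; all names are assumptions.
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git --git-dir="$work/origin.git" symbolic-ref HEAD refs/heads/master

# "Your" repository: create commit A and push it.
git init -q "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo a > a.txt; git add a.txt; git commit -q -m "A"
git remote add origin "$work/origin.git"
git push -q origin master
A=$(git rev-parse master)

# Someone else adds B and C to the shared master.
git clone -q "$work/origin.git" "$work/other"
cd "$work/other"
git config user.name "Other User"
git config user.email "other@example.com"
echo b > b.txt; git add b.txt; git commit -q -m "B"
echo c > c.txt; git add c.txt; git commit -q -m "C"
git push -q origin master
C=$(git rev-parse master)

# Back in your repository: fetch moves origin/master, not master.
cd "$work/mine"
git fetch -q origin
after_fetch_master=$(git rev-parse master)
after_fetch_remote=$(git rev-parse origin/master)

# Fast-forward your own master to match.
git merge -q --ff-only origin/master
after_merge_master=$(git rev-parse master)

echo "master after fetch:        $after_fetch_master"
echo "origin/master after fetch: $after_fetch_remote"
echo "master after merge:        $after_merge_master"
```

After the fetch, master should still name A while origin/master names C; only the merge step moves master.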
About git pull
The git pull command is a bit of an oddball in Git. What it does is run git fetch, then run a second Git command. The second command is usually git merge. You can tell it to use git rebase instead, either on a one-time basis, or generally. I prefer to avoid git pull and run these two commands separately, because that lets me look at what git fetch fetched before I pick any command to run. If you let git pull run the second command, you must pick which second command to use before you know what git fetch actually fetches.
But lots of people use git pull, so you should know what it does. In this case, it can be a trap, as we'll see.
Let's say you squash-merge on GitHub
In order to do the squash-merge on GitHub, you first have to send them all your somebranch commits:
$ git push origin somebranch
They now have:
       B--C   <-- master (HEAD)
      /
...--A
      \
       D--...--G--H--I   <-- somebranch
in their Git. Now you make a pull request using the GitHub web interface and go to the clicky buttons and select squash-and-merge and click it:
       B--C----------J   <-- master (HEAD), origin/master
      /
...--A
      \
       D--...--G--H--I   <-- somebranch
What you need to do now is delete somebranch, both on GitHub and in your own Git repository. If you do that, you're all good: the 30 (or however many) commits that were, and as of right now still are, only on somebranch (in both Gits) are no longer visible. (Side note: GitHub tends to remove their copies immediately, while your Git waits the 30-or-more days.)
But if you, or anyone for that matter, accidentally merge somebranch with master, you now get a new merge commit:
       B--C----------J
      /               \
...--A                 K
      \               /
       D--...--G--H--I
Note that anyone can make this mistake any time after J gets created and before D-through-I are gone / invisible. Since git pull runs git fetch followed by git merge, it's easy to make this mistake using git pull.
Specifically, if you are on your own somebranch and run git pull origin master, you make exactly this mistake. Fortunately, this one is easy to fix. You now have this:
       B--C----------J   <-- master
      /               \
...--A                 K   <-- somebranch (HEAD)
      \               /
       D--...--G--H--I
You still just need to delete branch somebranch and now commits D through I, along with K, are invisible and will go away eventually:
$ git checkout master
$ git branch -D somebranch
The checkout is because you can't delete the branch you're on; the -D (uppercase Delete, as it were) is the way to force the deletion, even though git branch would normally complain that you're going to lose commits D-through-I and K (which is of course the idea).