Rewriting History to Remove Duplicate Commit Messages

Question

When I first got into VC, I was using SVN and didn't understand what I was doing. I maintained different pieces of one project on separate trunks, but would make commits to all of the trunks at once, resulting in multiple commits with identical commit messages. About two years ago I woke up and smushed all of the trunks into a single trunk and then woke up some more and converted the repo to git. Now I have a fast, flexible repo with a handful of branches and I couldn't be happier...

... except all of those old, duplicate commit messages are bugging me. (They make up about 1/3-1/2 of the commits in my repo.) This is exactly what git rebase is for, right? I tried a test run on one batch of duplicates with git rebase -i <sha>, squashing all of the duplicate commits into the first. It worked, except that it seemed to separate my master branch from the rest of my branches. I would like to keep my branch structure intact.

All of the duplicate messages that I would like to squash came before I did the svn=>git conversion, and all of my branches started after that conversion. Which is to say the entire history before the svn=>git conversion is linear with no branches.

Another caveat -- and it's a big one -- is that this repo has been pushed to a remote repo. I know rewriting history for shared repos is bad news, but I'm only considering it because no one has cloned or forked from my remote yet. I would like to clean up the history before I make it available for cloning/forking.

So is there a way to rewrite history up to a point and leave the rest untouched? Any other suggestions to help me clean up this mess?

score 3 · Accepted Answer · edited May 23 '17 at 12:08

You can use rebase to squash two old commits into a single one, but the result is a completely new commit with a new hash. Every commit records the hash of its parent, so rebase then has to recreate each child (the commits directly following the squashed one) to point to the new commit, which again produces completely new commits, and so on up to the tip of the branch.

Modifying/rebasing old commits will therefore lead to a completely new commit graph. Other branches still point to the old graph. That's why you separated your branch from all of the other branches.
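To see that effect in isolation, here is a minimal sketch in a throwaway repository. Everything in it is made up for illustration (the file f, the branch names master and feature, the demo identity), and rewording the first commit stands in for a squash, since both replace an old commit with a new one.

```shell
set -e
# Throwaway identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # fix the branch name regardless of defaults

echo 1 > f; git add f; git commit -q -m "first"
echo 2 > f; git commit -q -am "second"
git branch feature                        # feature and master share the same tip
old_tip=$(git rev-parse master)

# Rewrite the OLD commit: this creates a new commit object.
git checkout -q --detach master~1
git commit -q --amend -m "first (reworded)"
new_base=$(git rev-parse HEAD)

# Replay "second" on top of the rewritten commit, moving only master.
git rebase -q --onto "$new_base" master~1 master

new_tip=$(git rev-parse master)
feature_tip=$(git rev-parse feature)
# merge-base exits non-zero when two branches share no history at all.
merge_base=$(git merge-base master feature || true)

echo "master:  $old_tip -> $new_tip"
echo "feature: $feature_tip (unchanged)"
echo "common ancestor: ${merge_base:-none}"
```

Here master ends up on a new graph while feature still points at the old one. In this sketch the two share no commits at all because the root commit was rewritten; in a real repository they would still share whatever history precedes the first rewritten commit.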

Depending on the complexity of your commit graph, the cleanup can get quite tricky. Most probably you should just leave your repository as it is.

If you really want to change history, you should create a new branch pointing at your last SVN commit, which should be an ancestor of all your branches: git tag oldsvn $SHA1; git checkout -b newsvn oldsvn

You can now clean up that branch, and afterwards rebase all your other branches on that branch. (That is the real idea of rebase, to give a branch a new base.) You can do it with git rebase --onto newsvn oldsvn $branch.
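The whole sequence can be tried out safely in a scratch repository first. The sketch below is only an illustration under stated assumptions: the file names, commit messages, and the topic branch are invented, the old SVN history is just three commits, and a soft reset plus a fresh commit stands in for the interactive squash you would do with git rebase -i.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Linear "SVN era" history with two commits sharing one message.
echo a > a.txt; git add a.txt; git commit -q -m "import"
echo b > b.txt; git add b.txt; git commit -q -m "fix stuff"
echo c > c.txt; git add c.txt; git commit -q -m "fix stuff"

# Mark the last SVN commit (the answer passes $SHA1; here it is the current tip).
git tag oldsvn master

# Branches that started after the conversion.
git checkout -q -b topic
echo t > t.txt; git add t.txt; git commit -q -m "topic work"
git checkout -q master
echo m > m.txt; git add m.txt; git commit -q -m "master work"

# Clean up on a separate branch so the old graph stays reachable via the tag.
git checkout -q -b newsvn oldsvn
git reset -q --soft HEAD~2          # stand-in for squashing in git rebase -i
git commit -q -m "fix stuff"

# Give every branch the cleaned-up history as its new base.
for branch in master topic; do
    git rebase -q --onto newsvn oldsvn "$branch"
done

git log --oneline --graph master topic
```

Because oldsvn stays tagged, the original history remains available until you delete the tag, so you can compare the result against it (for example with git diff oldsvn newsvn, which should report no differences) before force-pushing anything.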

If several of your branches share commits made after the SVN conversion, rebasing them one at a time will recreate those shared commits separately for each branch, so the branches will end up with diverging histories again. Have a look at Git: How to rebase many branches (with the same base commit) at once?

Ewww. It would seem that leaving well enough alone is the way to go here. Thanks. — Clayton, Mar 14 '13 at 20:44
