
I am going to publicly release a project, hopefully to find contributors. My project is a local clone of another active and well-resourced project (a Django project template). I haven't made any deep changes to the code, although it is now a different project.

My current git history is a mess and not very helpful. I will clean it up somehow before public release and, of course, make clear what the original forked project was. Since I don't think there's anything special or mystifying about the modifications I've made to the project (mainly customisations), I'm keen to squash all commits as per this Stack Overflow post. I wonder whether this is bad practice, since dropping the searchable commit history may make the project a bit harder to contribute to. I intend to reduce such problems with good comments, a considered README, etc.

The alternative I see and wish to avoid is a painstaking incremental rebase squash mission.

KindOfGuy
  • If you forked from another project on GitHub, your own project is visible to the world. Rewriting/deleting history of a public project is considered bad practice. Are you sure nobody cloned or forked it (your project)? – jub0bs Sep 01 '14 at 14:33
  • My modifications are local. – KindOfGuy Sep 01 '14 at 15:53
  • Do you mean that you forked from the main project, cloned your fork and never pushed to it? – jub0bs Sep 01 '14 at 15:54

1 Answer

I understand that you have

  1. forked from a project on GitHub,
  2. cloned your fork,
  3. made a number of commits in that local repo.

In case you have already pushed to your fork

Because your (untidy) history is now public, some people may have already forked/cloned it to build upon your work. By rewriting your history and then force pushing to your fork, you run the risk of pissing those people off... big time! That's bad practice.

So, before proceeding, you should at least make sure that your project was never forked or cloned. Fortunately, GitHub keeps track of that information. In the right-hand side navigation bar, click on Graphs.

The Network tab will show you how many people forked your project.

If you're the only one listed there, good. Then go to the Traffic tab, to see how many times your project was cloned.

If your project has never been cloned, there is still time to force push your tidied history to your fork.

One caveat: of course, there is always a risk that someone forks/clones your old, untidy history just in the nick of time before you force push. Proceed with care.
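
As a rough illustration of the force push described above, here is a minimal sketch. Everything in it is an assumption made for the demo: a bare repository called `fork.git` stands in for the GitHub fork, and the file names, commit messages and the `main` branch name are hypothetical. It uses `git push --force-with-lease` rather than a plain `--force`; the lease makes the push fail if the remote branch has moved since you last saw it, which narrows (but does not remove) the "nick of time" risk.

```shell
#!/bin/sh
set -eu

# Throwaway directory; fork.git stands in for the GitHub fork (hypothetical).
work=$(mktemp -d)
cd "$work"
git init -q --bare fork.git

# Local repository with the fork as its "origin" remote.
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add origin "$work/fork.git"

# One commit standing in for the upstream template, then two messy ones.
echo "one" > file.txt
git add file.txt
git commit -q -m "Upstream template"
base=$(git rev-parse HEAD)
echo "two" >> file.txt
git commit -q -am "wip"
echo "three" >> file.txt
git commit -q -am "more wip"

# The untidy history gets published.
git push -q -u origin main

# Tidy up locally: fold the two messy commits into one.
git reset -q --soft "$base"
git commit -q -m "Customise template"

# Overwrite the published branch, but only if it still points at the
# commit we last pushed (recorded in the remote-tracking ref origin/main).
git push -q --force-with-lease origin main

git log --oneline origin/main
```

If someone else had pushed to the fork in the meantime, the last push would be rejected instead of silently discarding their work. It offers no protection for people who merely cloned the old history.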

In case you have not yet pushed anything to your fork

In that case, rewriting your history is completely safe, and is considered good practice.

All you have to decide is the level of detail you wish to retain in the new, tidied history. It's really up to you, but, as you rewrite history, try to put yourself in the shoes of someone browsing that history and trying to make sense of your improvements/changes.

For instance, if the changes you brought to the original project are substantial, squashing all your commits into one massive commit may not be the best idea... Make your history more pedestrian by spreading your changes over several commits, in a logical manner.

Here is a relevant passage of the Pro Git book:

[...] try to make each commit a logically separate changeset. If you can, try to make your changes digestible — don’t code for a whole weekend on five different issues and then submit them all as one massive commit on Monday. Even if you don’t commit during the weekend, use the staging area on Monday to split your work into at least one commit per issue, with a useful message per commit. If some of the changes modify the same file, try to use git add --patch to partially stage files (covered in detail in Chapter 6). The project snapshot at the tip of the branch is identical whether you do one commit or five, as long as all the changes are added at some point, so try to make things easier on your fellow developers when they have to review your changes. This approach also makes it easier to pull out or revert one of the changesets if you need to later.
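
One way to rebuild an unpublished history into logically separate commits is sketched below, under stated assumptions: the repository is a throwaway one created by the script, and the file names, commit messages and the `backup-messy` branch are hypothetical. A mixed `git reset` moves the branch back to the commit that was originally cloned while keeping every change in the working tree, so the changes can be re-staged in logical groups. The sketch stages whole files to stay non-interactive; with changes that share a file, `git add --patch` would take the place of `git add <file>`.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the local clone (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# One commit standing in for the upstream project as it was cloned.
echo "DEBUG = True" > settings.py
echo "Template README" > README.md
git add settings.py README.md
git commit -q -m "Upstream template"
base=$(git rev-parse HEAD)

# A messy series of local commits.
echo "DEBUG = False" > settings.py
git commit -q -am "wip"
echo "My project README" > README.md
git commit -q -am "fix"
echo "ALLOWED_HOSTS = ['example.com']" >> settings.py
git commit -q -am "more wip"

# Safety net: keep a branch pointing at the old, untidy tip.
git branch backup-messy

# Mixed reset: the branch moves back to the cloned commit, the index is
# reset, and all modifications stay in the working tree.
git reset -q "$base"

# Rebuild the history, one logical change per commit.
git add settings.py
git commit -q -m "Customise settings for production"
git add README.md
git commit -q -m "Rewrite README for the new project"

git log --oneline
```

The final snapshot is the same as before the rewrite; only the path to it has changed. Once you are happy with the result, the `backup-messy` branch can be deleted.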

jub0bs
  • The second scenario is mine, and my question intends to seek advice on what constitutes 'substantial'. – KindOfGuy Sep 01 '14 at 16:41
  • @KindOfGuy See my last edit. The bottom line is: it's really up to you. But, as you rewrite history, try to put yourself in the shoes of someone browsing that history and trying to make sense of your improvements/changes; better be a tad pedestrian than dumping all the changes on them in one massive commit that makes changes all over the place. – jub0bs Sep 01 '14 at 16:46
  • For however little or much it's worth, I'll say there are simply no competitors for @Jubobs's answer here. Start from your latest commit, `git reset $clonedcommit` so you're rewriting your change history from the beginning, use `git add` and `git add --patch` and `git commit` to construct, with the benefit of hindsight, the commit history you would have constructed if you'd had the foresight to do it that way in the first place. The result is the best possible history to base further work on, not just for everyone else but for you, too. – jthill Sep 01 '14 at 17:01
  • addendum: even if others _have_ fetched the messy version, it's worth considering whether it's better to burden them with the switch now -- particularly, if the tip of your rewritten history has the same tree as the tip of your superseded history, they can trivially graft their own work, all of it, including all branches, onto the new history with filter-branch. – jthill Sep 01 '14 at 17:07
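
The comment above suggests `filter-branch` for moving work onto the rewritten history. As a simpler illustration for the single-branch case, the sketch below uses `git rebase --onto` instead, which is a different technique from the one the comment names. It is an assumption-laden demo: the repository is a throwaway one, and the `old-tip` and `feature` branch names, file names and commit messages are all hypothetical. Because the old and new tips have identical trees, the contributor's commit replays without conflicts.

```shell
#!/bin/sh
set -eu

# Throwaway repository playing both maintainer and contributor (hypothetical).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "dev@example.com"
git config user.name "Dev"

# Upstream commit followed by the messy, already-published history.
echo "base" > app.txt
git add app.txt
git commit -q -m "Upstream template"
base=$(git rev-parse HEAD)
echo "a" >> app.txt
git commit -q -am "wip"
echo "b" >> app.txt
git commit -q -am "more wip"
git branch old-tip

# A contributor built one commit on top of the messy tip.
git checkout -q -b feature
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "Add contributor notes"

# The maintainer rewrites main: same final tree, one tidy commit.
git checkout -q main
git reset -q --soft "$base"
git commit -q -m "Customise template"

# Replay everything in old-tip..feature on top of the rewritten main.
git rebase -q --onto main old-tip feature

git log --oneline feature
```

A contributor with several branches would repeat the last step per branch, which is where a scripted rewrite such as the one the comment mentions may be more convenient.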