We have had a Git repository since 8/13/2013. It has over 4000 commits and occupies almost 7 GB of disk space. (Git version: 2.9.0.windows.1)

Over these years the project has evolved considerably, so the oldest commits are no longer useful.

Like many others, we'd like to "consolidate" the history before a certain date. Let's say we want to "squash" everything older than 6 months into a single big commit.

The main obstacle is that we have a multi-branch structure, which we obviously want to preserve:

  • Master branch (perpetual)
  • Develop branch (perpetual)
  • Feature branches (one for each task, deleted after merging)

For example, this is how the history looks now:

[Image: how the history looks now]

This is what we need:

[Image: what we need]

We tried several approaches, such as rebase, cherry-pick, and clone with --depth, but none seems capable of doing what we need. These are the most meaningful things I tried:

  • Rebase and cherry-pick (using TortoiseGit 2.1.0.0): With both commands I tried to "squash" the oldest commits, but each merge commit brings up a dialog asking "which parent do you want to pick? parent1/parent2". No matter which one I pick, all files get marked as "conflict" and have to be resolved manually. I just can't handle all these conflicts by hand (nor reproduce the identical sequence for both the Master and Develop branches).

  • Clone with depth (via Git Bash): I executed the command "git clone limitedRepo --depth=1000", which correctly "squashes" all older commits, but the resulting repo has only a single branch.

So I tried these commands to get the Develop branch back from origin:

"git remote set-branches origin '*'"
"git fetch -vvv"

but the fetched branch contains the whole history, not the "squashed" one we need.

I tried the same commands with different parameters, but I'm just groping in the dark.

Any ideas?

Il Sui
  • I just ran another test using rebase, but I still have the conflict problem. This is what I tried: 1. git checkout --orphan temp sha1 2. git commit -m "Truncated history" 3. git rebase --onto temp sha1 master. This is the message I got: CONFLICT (content): Merge conflict in aFile.txt error: Failed to merge in the changes. Patch failed at 0001 Built for Release The copy of the patch that failed is found in: .git/rebase-apply/patch When you have resolved this problem, run "git rebase --continue". – Il Sui Sep 06 '16 at 06:57

1 Answer

Maybe it's not the number of commits that is taking up the disk space. It may instead be multiple versions of large files that exist in your repository history but have since been removed from the current version of your code. Pro Git has a section called Removing Objects that explains how to remove large files from your Git history.

There are a lot of great things about Git, but one feature that can cause issues is the fact that a git clone downloads the entire history of the project, including every version of every file. This is fine if the whole thing is source code, because Git is highly optimized to compress that data efficiently. However, if someone at any point in the history of your project added a single huge file, every clone for all time will be forced to download that large file, even if it was removed from the project in the very next commit. Because it’s reachable from the history, it will always be there.

(emphasis mine)

... Be warned: this technique is destructive to your commit history. It rewrites every commit object since the earliest tree you have to modify to remove a large file reference. If you do this immediately after an import, before anyone has started to base work on the commit, you’re fine – otherwise, you have to notify all contributors that they must rebase their work onto your new commits.

Now all you need to do is find large files in your repository history.
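One rough way to look for them is to list every object reachable from any ref together with its size, keep only the blobs, and sort by size. This is only a sketch: the function name largest_blobs is made up for illustration, while the Git commands it wraps are standard.

```shell
# List the largest blobs reachable from any ref, biggest first.
# Usage: largest_blobs [count]   (largest_blobs is a hypothetical helper name)
# Output columns: size-in-bytes  object-id  path
largest_blobs() {
    # rev-list prints "<object-id> <path>" for every reachable object;
    # cat-file looks up type and size and passes the path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob"' |     # drop commits, trees and tags
        cut -d' ' -f2- |         # strip the leading "blob" column
        sort -k1,1nr |           # largest size first
        head -n "${1:-20}"       # default: top 20
}
```

Running largest_blobs 10 inside the repository prints the ten largest file versions in the history, including files that were deleted long ago. The sizes shown are uncompressed object sizes, so they indicate where to look rather than the exact space used on disk.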

Related StackOverflow post: Remove old commit information from a git repository to save space

No matter how you do this, a team-wide git rebase is in your future.
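For an individual contributor, that rebase might look roughly like the sketch below. It assumes the rewritten branch has already been fetched and that the contributor still has a ref pointing at the old commit their work was based on. The function name replay_onto_rewritten and its argument names are made up for illustration.

```shell
# Replay the commits in <old-base>..<branch> on top of <new-base>.
# Usage: replay_onto_rewritten <new-base> <old-base> <branch>
# (hypothetical helper; it only wraps "git rebase --onto")
replay_onto_rewritten() {
    new_base=$1   # tip of the rewritten history, e.g. the freshly fetched branch
    old_base=$2   # pre-rewrite commit the local work was based on
    branch=$3     # local branch whose own commits should be moved
    git rebase --onto "$new_base" "$old_base" "$branch"
}
```

A call such as replay_onto_rewritten origin/develop old-develop my-feature (branch names are placeholders) moves only the commits that exist on my-feature but not on old-develop. Conflicts are still possible if the rewritten base differs in content from the old one, and by default rebase does not recreate merge commits inside the replayed range.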

Greg Burghardt
  • Hi, first of all, thank you. Yes, we do have some large files (not too large), but we need to keep them as part of the repository. I've already tried the instructions at http://stackoverflow.com/questions/12865332/remove-old-commit-information-from-a-git-repository-to-save-space but, as I said in the original question, I tried to "squash" the oldest commits, and each merge commit brings up a dialog asking "which parent do you want to pick? parent1/parent2". No matter which one I pick, all files get marked as "conflict" and have to be resolved manually. How can we solve this? – Il Sui Sep 01 '16 at 15:17
  • @IlSui: Have you looked at [Resolving a Git conflict with binary files](http://stackoverflow.com/questions/278081/resolving-a-git-conflict-with-binary-files)? Maybe you aren't resolving the large files in a way that lets Git recognize the conflict as resolved? – Greg Burghardt Sep 01 '16 at 15:34
  • Large files have nothing to do with it. The problem is the same for any file, no matter the size: when you squash 2 commits together, you simply have to choose which file version to keep and which one to discard, for each file they have in common. – Il Sui Sep 05 '16 at 09:58