How to truncate or reduce a git repo that's on GitHub

Question

I know various permutations of this question are floating around, but I haven't been able to uncover anything that addresses my specific issue. The thing is this:

I've got a repo hosted on GitHub. It's the origin for two clones: one on my dev machine and the other on the server. I made a stupid mistake and had a script commit incremental user DB backups over the course of about a year and a half. So now I've got about 200 MB of files and another 1 GB of incremental DB changes committed in my git repo (yes, I learned my lesson). Visually, it looks like this, where "C" indicates a legitimate code change and "DB" means a commit containing only an unimportant DB backup:

C1--C2--C3--C4--DB--DB--DB--DB--DB--DB--DB--DB...(1.5 years)...DB--DB...

What I want to do is this:

                  /--DB--DB--DB--DB--DB...<--(throw all this away forever)
                 /
C1--C2--C3--C4--//<--REVERT TO THIS POINT --C5--C6--C7....

I'd basically create a branch containing all of those stupid DB commits, roll my repo back to the point where the branch departs, then delete the branch. Any ideas about how to do this? Ideally, I wouldn't have to create a new GitHub repo, but I'll accept suggestions of any nature.

That isn't how branches work; deleting a branch will not delete the commits on that branch. — user229044, Jan 22 '15 at 04:28
possible duplicate of [Completely remove file from all Git repository commit history](http://stackoverflow.com/questions/307828/completely-remove-file-from-all-git-repository-commit-history) — user229044, Jan 22 '15 at 04:28
So you know how to fix your history and remove the wrong commits, and how to propagate these changes, and your question is mostly about reclaiming wasted space on GitHub, right? — Mykola Gurov, Jan 22 '15 at 07:36
@Mykola Gurov not exactly. Everything I've read about removing wrong commits pertains to the most recent commit, and says "you're out of luck if you've already propagated the changes to another repo". Maybe I'm a little bit thick, but it's been a little hard to put all the fragments of knowledge that I've found together into a workable solution. I'm definitely not an expert with Git, and would like the advice of someone who is. — kael, Jan 22 '15 at 13:48
Depends on whether or not you can rewrite history. If you don't have other people actively developing in branches based on the ones containing the `DB` commits, you can simply do a rebase, leave the unneeded commits out, and force-push the corrected branch (master) to the server. — Mykola Gurov, Jan 22 '15 at 13:55
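The rebase route from the last comment can be sketched on a throwaway repository. This hypothetical sketch assumes the DB commits form one contiguous run, as in the question's diagram, possibly followed by real code commits: `git rebase --onto <new-base> <upstream> <branch>` replays only the commits after `<upstream>` onto `<new-base>`, so the whole DB run is left out. The repo, file names and commit messages are made up; if DB and code commits were interleaved, an interactive rebase (`git rebase -i`) with the DB commits dropped would be needed instead.

```shell
#!/bin/sh
# Hypothetical demo: history C1--C2--DB1--DB2--DB3--C3 on a throwaway repo;
# the contiguous DB run is dropped with `git rebase --onto`.
set -eu

# Made-up commit identity for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Helper: append MESSAGE to FILE and commit it with that message.
commit() {  # usage: commit FILE MESSAGE
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit app.txt C1
commit app.txt C2
last_code=$(git rev-parse HEAD)   # last commit before the DB run (C2)
commit backup.sql DB1
commit backup.sql DB2
commit backup.sql DB3
last_db=$(git rev-parse HEAD)     # last commit of the DB run (DB3)
commit app.txt C3

# Replay the commits after DB3 (here only C3) onto C2; DB1..DB3 are left out.
git rebase -q --onto "$last_code" "$last_db" master

# Prints C3, C2, C1 (one per line).
git log --format=%s master
```

The rebased commits get new IDs, so the branch still has to be force-pushed, and the other clones have to be reset to the rewritten branch.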

score 2 · Accepted Answer · answered Nov 01 '17 at 16:21

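1. Find the commit you want to get back to:

    $ git log --before="2015-12-01" -n1
    commit de4406f26ce506944b2b629890bba9e091468e05
    Author: some Author <foo@bar.com>
    Date:   Mon Nov 30 10:46:21 2015 +0100

2. Reset your (local) repository pointer to it:

    git reset --hard <commit-hash>

3. Force-push this to your server (forcing is necessary because you are overwriting history):

    git push -f origin master

4. The subsequent DB commits will be garbage-collected eventually, once nothing references them anymore, or you can clean up right away. Pruning doesn't change the cleaned-up history you already have; it only truly gets rid of the unreferenced objects. Because the reflog still points at the old tip after the reset, expire it first, and use `git gc --prune=now` rather than plain `git prune`, which only removes loose (unpacked) objects:

    git reflog expire --expire=now --all
    git gc --prune=now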
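Steps 1-3 can be rehearsed before touching the real repository. In this hypothetical sketch, a local bare repository stands in for GitHub, `work` for the dev machine and `server` for the second clone; dates, file names and commit messages are made up. It uses `--format=%H` so that step 1 yields only the hash, and it also resets the second clone, since a clone still holding the DB commits could push them back later.

```shell
#!/bin/sh
# Hypothetical end-to-end run of steps 1-3. A local bare repository stands in
# for GitHub, "work" for your machine and "server" for a second clone.
set -eu

# Made-up commit identity for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$base/origin.git"

# Helper: append MESSAGE to FILE and commit it at noon (local time) on DATE.
commit_on() {  # usage: commit_on YYYY-MM-DD FILE MESSAGE
    echo "$3" >> "$2"
    git add "$2"
    GIT_AUTHOR_DATE="${1}T12:00:00" GIT_COMMITTER_DATE="${1}T12:00:00" \
        git commit -q -m "$3"
}

commit_on 2015-11-20 app.txt C1
commit_on 2015-11-30 app.txt C2
commit_on 2015-12-02 backup.sql DB1
commit_on 2016-06-01 backup.sql DB2
git push -q origin master
git clone -q --branch master "$base/origin.git" "$base/server"

# Step 1: newest commit before the cut-off date; --format=%H prints only its hash.
target=$(git log --before="2015-12-01" -n1 --format=%H)

# Step 2: move master (and the working tree) back to that commit.
git reset -q --hard "$target"

# Step 3: overwrite master on the remote.
git push -q -f origin master

# Every other clone has to follow, or a later push from it would restore the DB commits.
git -C "$base/server" fetch -q origin
git -C "$base/server" reset -q --hard origin/master

git log --format='%h %ad %s' --date=short master
```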

If you want to play it safe, I would advise you to:

- create a branch named backup before step 1 (without checking it out, just to point to your old tip!);
- after step 3, make sure everything on your master branch is to your liking, then delete that backup branch and go for the prune.
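The backup-branch safety net can be tried on a throwaway repository first. In this hypothetical sketch only the branch name `backup` comes from the answer; the repo and commits are made up, and the force push is skipped because there is no remote. Note the `-D`: `git branch -d` refuses to delete a branch that is not fully merged into the current branch, which is exactly the situation for `backup` here.

```shell
#!/bin/sh
# Hypothetical demo of the "backup branch" safety net on a throwaway repo.
set -eu

# Made-up commit identity for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

# Helper: append MESSAGE to FILE and commit it with that message.
commit() {  # usage: commit FILE MESSAGE
    echo "$2" >> "$1"
    git add "$1"
    git commit -q -m "$2"
}

commit app.txt C1
commit app.txt C2
keep=$(git rev-parse HEAD)     # stands in for the commit found in step 1
commit backup.sql DB1
commit backup.sql DB2

# Before step 1: a branch at the old tip, created but not checked out.
git branch backup

# Step 2 (step 3's force push is skipped: there is no remote here).
git reset -q --hard "$keep"

# Inspect master; the DB commits are still reachable through backup.
git log --oneline master
git log --oneline master..backup   # commits that only backup still holds

# Once master looks right, drop the safety net. `git branch -d` would refuse,
# because backup is not merged into master, hence -D.
git branch -D backup
```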

In short:

It helps to think of branches not as a whole line of commits, but rather as the tips that keep the chains leading to them alive.
Those chain members get garbage-collected once they become unreferenced.
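The "tips keep chains alive" idea can be checked on a throwaway repository. In this hypothetical sketch, deleting the only branch that points at a DB commit does not remove the commit, because the HEAD reflog still references it; only after the reflog is expired and `git gc --prune=now` runs is the object gone. Repository, file and branch names are made up.

```shell
#!/bin/sh
# Hypothetical demo: a commit survives branch deletion while the reflog still
# references it, and disappears once the reflog is expired and gc prunes.
set -eu

# Made-up commit identity for the demo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo C1 > app.txt
git add app.txt
git commit -q -m C1

# A side branch holding one DB commit.
git checkout -q -b db
echo DB1 > backup.sql
git add backup.sql
git commit -q -m DB1
db_tip=$(git rev-parse HEAD)
git checkout -q master

# Helper: report whether an object exists in the object database.
object_status() {
    if git cat-file -e "$1" 2>/dev/null; then echo present; else echo gone; fi
}

# Delete the only branch (tip) that pointed at DB1.
git branch -q -D db
before=$(object_status "$db_tip")   # HEAD's reflog still mentions DB1

# Drop every reflog entry, then repack and prune all unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now
after=$(object_status "$db_tip")

echo "after branch -D: $before; after reflog expire + gc: $after"
```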
