I was trying to remove some sensitive information from old commits in our company's Git repo, using the techniques described on this GitHub help page. Using `git filter-branch`, I was able to rewrite the repo's history to my liking.
Unfortunately, I then made the mistake of pulling from origin and doing some further work on the repo. I believe this effectively merged the original, 'tainted' history (A) into my 'fixed' history (B), since the number of commit objects has doubled from 3000 to 6000.
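One way to test the "A got merged into B" theory is to list HEAD's history with `git rev-list --parents HEAD`, where each output line has the form `<commit> <parent>...`, and count commits, root commits and merges. This assumes the filter-branch rewrite changed the root commit, so that A and B share no commits. In that case two root commits in HEAD's history would point to two separate histories having been joined. The following is a minimal Python sketch; `parse_parents`, `summarize` and `summarize_repo` are hypothetical helper names, not part of Git.

```python
import subprocess


def parse_parents(rev_list_output: str) -> dict[str, list[str]]:
    """Turn `git rev-list --parents <ref>` output into {commit: [parents]}.

    Each output line is '<commit> <parent1> <parent2> ...'; a root commit
    has no parents listed.
    """
    graph: dict[str, list[str]] = {}
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph


def summarize(graph: dict[str, list[str]]) -> tuple[int, list[str], list[str]]:
    """Return (commit count, sorted root commits, sorted merge commits)."""
    roots = sorted(c for c, parents in graph.items() if not parents)
    merges = sorted(c for c, parents in graph.items() if len(parents) > 1)
    return len(graph), roots, merges


def summarize_repo(ref: str = "HEAD") -> tuple[int, list[str], list[str]]:
    """Run git in the current directory and summarize the history of `ref`."""
    out = subprocess.run(
        ["git", "rev-list", "--parents", ref],
        capture_output=True, text=True, check=True,
    ).stdout
    return summarize(parse_parents(out))
```

Calling `summarize_repo()` from inside the repository returns the numbers. A count near 6000 with two roots would fit the theory. A single root would only mean that A and B still share their earliest commits, so the next step would be to look for the merge that joined them.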
Now, I could run the filter-branch steps again and force-push to fix up what I have, but the repo would still be bloated to twice its original size.
I know roughly where the merge occurred, but not the exact commit. I would like to identify, and prove, which commit is the culprit, and then permanently remove commit tree A. I have a few ideas about how this could be done:
- Modifying the specific commit that joins A with B, then running a prune to garbage-collect everything under it
- Deleting that commit entirely from history, running a prune, and then recreating its changes afterwards
- Rebasing onto the last commit at the head of B and cherry-picking everything above it except the merge with A (though I'm not sure whether cherry-picking would pull the whole commit tree back in!)
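To identify the culprit, here is a Python sketch. It assumes `origin/master` still points at the original, tainted history A, so every commit reachable from it counts as tainted. Under that assumption, the culprit is a merge in HEAD's history that is not itself tainted but has one tainted parent and one clean parent. The sketch also estimates what the first idea would achieve: it computes which commits stop being reachable from HEAD if that merge keeps only its clean parent. The helper names and the `origin/master` default are assumptions. The only Git commands used are `git rev-list` and `git rev-parse`.

```python
import subprocess


def parse_parents(rev_list_output: str) -> dict[str, list[str]]:
    """Turn `git rev-list --parents` output into {commit: [parents]}."""
    graph: dict[str, list[str]] = {}
    for line in rev_list_output.splitlines():
        fields = line.split()
        if fields:
            graph[fields[0]] = fields[1:]
    return graph


def joining_merges(graph: dict[str, list[str]], tainted: set[str]) -> list[str]:
    """Merges that are not tainted themselves but have at least one tainted
    parent (from A) and at least one clean parent (from B), in graph order."""
    return [
        commit
        for commit, parents in graph.items()
        if len(parents) > 1
        and commit not in tainted
        and any(p in tainted for p in parents)
        and any(p not in tainted for p in parents)
    ]


def reachable(graph: dict[str, list[str]], tip: str,
              dropped_edges: frozenset = frozenset()) -> set[str]:
    """Commits reachable from `tip`, ignoring (child, parent) edges in `dropped_edges`."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        for parent in graph.get(commit, []):
            if (commit, parent) not in dropped_edges:
                stack.append(parent)
    return seen


def orphaned_by_cutting(graph: dict[str, list[str]], tip: str,
                        merge: str, tainted: set[str]) -> set[str]:
    """Commits no longer reachable from `tip` if `merge` kept only its clean parent(s)."""
    cut = frozenset((merge, p) for p in graph[merge] if p in tainted)
    return reachable(graph, tip) - reachable(graph, tip, cut)


def _git(*args: str) -> str:
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout


def find_culprit_in_repo(tainted_ref: str = "origin/master") -> None:
    """Print the earliest joining merge in HEAD and how many commits cutting it would orphan."""
    graph = parse_parents(_git("rev-list", "--topo-order", "--parents", "HEAD"))
    tainted = set(_git("rev-list", tainted_ref).split())
    merges = joining_merges(graph, tainted)
    if not merges:
        print(f"no merge joins {tainted_ref} into HEAD")
        return
    # --topo-order lists children before parents, so the earliest joining
    # merge is typically listed last.
    culprit = merges[-1]
    head = _git("rev-parse", "HEAD").strip()
    orphaned = orphaned_by_cutting(graph, head, culprit, tainted)
    print(f"culprit merge: {culprit}")
    print(f"commits orphaned by cutting its tainted parent: {len(orphaned)}")
```

Calling `find_culprit_in_repo()` from inside the repository prints the merge's hash. Running `git show <hash>` on it then lists its parents for a manual check. An orphaned count close to 3000 would suggest that cutting that one edge detaches all of A. This only models reachability: it does not check whether the merge's own tree reintroduced the sensitive files. Unreachable commits are also only pruned once nothing else refers to them. `origin/master`, the `refs/original/*` refs left by filter-branch, and reflog entries all keep them alive. On the cherry-pick worry in the last idea: cherry-picking a regular commit creates a new commit whose only parent is the current HEAD. It therefore copies that commit's changes without bringing its ancestry along.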
I welcome all suggestions!