See Remove sensitive files and their commits from Git history, but (and this is very important) your problem is simpler, because:

> If it's of any help, I don't have any remotes, yet. Everything is kept on my local disk.
This is indeed very helpful. What you are going to do (what you must do, no matter which way you choose to do it) is "rewrite history". History, in Git, is nothing more than the set of commits in the Git repository. Each commit saves a full and complete snapshot of every file,¹ plus some metadata: who made the commit (name and email), when (a date-and-time stamp), and why (the log message). One part of the metadata specifies which commit is the previous commit: the immediate history for this one commit.
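To see this metadata for yourself, you can print a raw commit object with `git cat-file -p`. The sketch below is a minimal demo, assuming Git 2.28 or later (for `git init -b`). It builds a throwaway repository in a temporary directory; the file name `notes.txt` and the identity `Demo <demo@example.com>` are made up for illustration.

```shell
#!/bin/sh
# Sketch: inspect the raw metadata stored in a commit.
# Assumes Git 2.28+ (for `git init -b`); everything happens in a temp dir.
set -e
demo=$(mktemp -d)
git init -q -b master "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "first commit"
echo "second" >> notes.txt
git commit -q -am "second commit"

# Prints: a "tree" line (the snapshot), a "parent" line (the previous commit),
# "author" and "committer" lines (name, email, timestamp), then the log message.
git cat-file -p HEAD
```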
History just means: start at (all of) the last commit(s), and work backwards from each one to its previous (parent) commit(s). That's it; that's all there is to it, really. But every commit is frozen forever: you cannot change which files it has, nor which parent commit(s) it identifies. So to "change history" you must construct a whole new history, starting from whichever commit(s) have the files you don't want them to have. From then on, every descendant has to change too: it must not have the file(s), and/or it must list as its immediate history the new commit(s) that don't have the files.
In a big repository with a lot of commits, this tends to amount to: copy every commit to a new and improved commit. Then you simply switch from using the old commits to using the new ones. The old ones, being un-find-able, are eventually² cleaned up and really do go away. In the meantime, you carry around two copies of every commit; because of the way Git stores files, this doesn't take much extra space.
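A small way to watch "copy, don't change" happen is `git commit --amend`, which looks like it edits the last commit but really writes a new one and moves the branch to it. This is a minimal sketch under the same assumptions (Git 2.28+, a throwaway repository, made-up file names `readme.txt` and `secret.txt`). It is not a history-rewriting tool in its own right, since it only handles the most recent commit.

```shell
#!/bin/sh
# Sketch: "changing" a commit really makes a new one.
set -e
demo=$(mktemp -d)
git init -q -b master "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

echo "public" > readme.txt
echo "hunter2" > secret.txt
git add readme.txt secret.txt
git commit -q -m "add files"
old=$(git rev-parse HEAD)

# "Fix" the commit: stop tracking secret.txt and amend. Git cannot edit the
# old commit, so it writes a brand-new one and moves master to it.
git rm -q --cached secret.txt
git commit -q --amend -m "add files"
new=$(git rev-parse HEAD)

echo "old: $old"
echo "new: $new"
# The old commit is no longer on master, but it still exists:
git cat-file -t "$old"     # prints "commit"
```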
Next, although I've never actually used The BFG, I recommend considering this answer to the linked question.
Last, no matter which of the various approaches you use from Remove sensitive files and their commits from Git history, I'd recommend that you do it this way:
- Copy your repository (see below for copying methods).
- Apply your chosen "rewrite history" method to the copy.
- Inspect the result. Is it good? If so, switch to using the copy. If not, remove the copy and start again at step 1.
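The three steps above might look like this in a shell. This is only a sketch: it uses `cp -r` for the copy and `git filter-branch --index-filter` as the example rewrite method (one of several covered by the linked question), and the demo repository and the file name `secret.txt` are hypothetical. Setting `FILTER_BRANCH_SQUELCH_WARNING=1` just silences the warning and pause that recent Git versions print before running `filter-branch`; Git 2.28+ is assumed for the demo setup.

```shell
#!/bin/sh
# Sketch of copy -> rewrite -> inspect, using git filter-branch as the
# example rewrite method. "secret.txt" is a hypothetical file name.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b master original
git -C original config user.name "Demo"
git -C original config user.email "demo@example.com"
echo "hunter2" > original/secret.txt
echo "hello" > original/readme.txt
git -C original add secret.txt readme.txt
git -C original commit -q -m "initial"
echo "more" >> original/readme.txt
git -C original commit -q -am "update readme"

# Step 1: copy the repository (work-tree and .git directory).
cp -r original copy

# Step 2: rewrite history in the copy only.
cd copy
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
    -- --all >/dev/null

# Step 3: inspect. No commit on any branch or tag should touch secret.txt now.
git log --branches --tags --oneline -- secret.txt
```

The inspection uses `--branches --tags` rather than `--all` on purpose: `filter-branch` keeps backup refs under `refs/original/`, and `--all` would include them, so the old commits would still show up.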
If your chosen method is `git filter-branch`, the copy in step 1 is not strictly necessary, because `git filter-branch` saves the old branch tips under `refs/original/`. The copy just makes things a lot easier for those not super-familiar with Git: since you never modified the original, you can feel pretty safe simply removing a failed attempt. The original is still there, intact.
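`git filter-branch` keeps the pre-rewrite tip of each rewritten branch under `refs/original/`, which is what makes rewriting in place recoverable. A minimal sketch of rewriting in place and then backing out, assuming Git 2.28+, a throwaway repository, a hypothetical `secret.txt`, and that `master` is the checked-out branch:

```shell
#!/bin/sh
# Sketch: git filter-branch saves the old branch tips under refs/original/.
set -e
demo=$(mktemp -d)
git init -q -b master "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hunter2" > secret.txt
echo "hello" > readme.txt
git add secret.txt readme.txt
git commit -q -m "initial"
before=$(git rev-parse master)

# Rewrite in place (no copy).
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
    --index-filter 'git rm -q --cached --ignore-unmatch secret.txt' \
    -- --all >/dev/null
rewritten=$(git rev-parse master)

# The backup ref points at the old, unrewritten commit.
git for-each-ref refs/original/

# Changed your mind? Put master (the checked-out branch) back where it was.
git reset -q --hard refs/original/refs/heads/master
```

Once you are happy with a rewrite, those backup refs are exactly what you need to delete (for example with `git update-ref -d refs/original/refs/heads/master`) before the old commits can ever be cleaned up.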
¹ Obviously, each commit really only saves a full and complete copy of every file that you saved with that commit. But that's all of your files from the previous commit, plus any you added, minus any you explicitly removed.
The reason this doesn't make your repository grow immensely fat nearly instantly is that the frozen, compressed copy of a file in some previous commit can be—and is—reused in any later commit that uses the same data. This is entirely safe because all commits are frozen for all time. At most, the commit itself can be forgotten, and then eventually deleted: if some of its files are still in use by some other commit, the file data remains. The file data only goes away if no commit is using it.
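You can check this reuse directly: the `<commit>:<path>` syntax names the stored file data (a blob) that a commit uses for a path, so an unchanged file gives the same ID in both commits. A minimal sketch, assuming Git 2.28+ and a throwaway repository with made-up files `data.txt` and `changelog.txt`:

```shell
#!/bin/sh
# Sketch: an unchanged file is stored once and shared by later commits.
set -e
demo=$(mktemp -d)
git init -q -b master "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "big unchanged data" > data.txt
echo "v1" > changelog.txt
git add data.txt changelog.txt
git commit -q -m "first"
echo "v2" >> changelog.txt
git commit -q -am "second"

# Same blob ID twice: the second commit reuses the stored copy of data.txt.
git rev-parse HEAD~1:data.txt HEAD:data.txt
# Two different blob IDs: changelog.txt changed, so new data was stored.
git rev-parse HEAD~1:changelog.txt HEAD:changelog.txt
```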
² The "eventual" part depends on both the hidden references to commits, which are kept in each repository's reflogs, and the background cleaning process. The background cleaner only fires up when it looks, at a quick glance, profitable to do so. You can force a cleaning by running `git gc` yourself. The cleaner will find all references, including all the hidden ones, to see which commits need to be kept, and which files are used by those to-keep commits. The reflog entries themselves expire too: by default, an entry for a commit that is no longer reachable from its branch is dropped after 30 days (other entries after 90). Commits, files, and other internal objects that aren't needed any more, and are at least some particular age (14 days old by default), can then be removed for real.
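To see the hidden references at work, and to force the cleanup instead of waiting for it: `git reflog expire --expire=now --all` throws away every reflog entry, and `git gc --prune=now` then deletes unreferenced objects regardless of age. This discards your safety net, so only do it in a copy you have already inspected. A minimal sketch, assuming Git 2.28+ and a throwaway repository with made-up file names:

```shell
#!/bin/sh
# Sketch: reflogs keep an abandoned commit alive until they expire.
set -e
demo=$(mktemp -d)
git init -q -b master "$demo/repo"
cd "$demo/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
echo "hunter2" > secret.txt
git add secret.txt
git commit -q -m "oops"
old=$(git rev-parse HEAD)

# Replace the commit with one that does not have secret.txt.
git rm -q secret.txt
echo "hello" > readme.txt
git add readme.txt
git commit -q --amend -m "initial"

# A plain gc keeps $old: the reflog still refers to it.
git gc -q
git cat-file -t "$old"          # prints "commit"

# Force it away: expire all reflog entries now, then prune immediately.
git reflog expire --expire=now --all
git gc -q --prune=now
git cat-file -e "$old" 2>/dev/null || echo "old commit is gone"
```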
Copying a repository
The simplest method is to use whatever file-tree duplicator your system has to copy the entire work-tree, including the `.git` directory / folder:
```shell
cd $HOME/src
cp -r original copy
```
for instance. That works fine with Git, although it also copies any random stuff that's not technically part of the repository. Note: if you have used `git worktree add`, it doesn't copy the added work-trees that live outside the `original/` area, but neither does the other technique I'm about to show.
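One quick way to convince yourself that the copy is a complete, separate repository: compare the branch names and commit IDs on both sides, and check that the copy has no remotes. A minimal sketch, assuming Git 2.28+; the temporary directory stands in for `$HOME/src`, and the demo contents are made up.

```shell
#!/bin/sh
# Sketch: a plain recursive copy is a complete, independent repository.
set -e
src=$(mktemp -d)      # stands in for $HOME/src
cd "$src"
git init -q -b master original
git -C original config user.name "Demo"
git -C original config user.email "demo@example.com"
echo "hello" > original/readme.txt
git -C original add readme.txt
git -C original commit -q -m "initial"
git -C original branch B1

cp -r original copy

# Same branches, same commits...
git -C original for-each-ref --format='%(refname) %(objectname)' refs/heads
git -C copy for-each-ref --format='%(refname) %(objectname)' refs/heads
# ...and the copy has no remotes: it does not know about the original at all.
git -C copy remote
```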
The other method is to use the fact that every clone of a repository is a repository. The tricky part here is that clones don't copy a few things:
- By default, none of the remote-tracking names of the original repository wind up in the clone. None of the remotes do either, so there's no sense in copying such names. You have no remotes, so this is irrelevant.
- By default, the new clone has the original repository as its one and only remote. This remote is named `origin`. That's fine; you can remove this `origin` later if you want.
- By default, the new clone renames all of the branches from the original repository. If the original repository has branches `B1`, `B2`, `B3`, and `master`, the new clone has `origin/B1`, `origin/B2`, `origin/B3`, and `origin/master` as its remote-tracking names.
A remote-tracking name is just Git's way of remembering: I saw this branch on some other Git! The last time I saw it, it said to use commit _____ (fill in the blank based on what this Git saw from the `origin` Git).
So, if you do:

```shell
git clone file://$HOME/src/original copy
```
then your new copy in `./copy` has `file://$HOME/src/original` as the URL stored in its `origin`, and has renamed your branches from `original` to `origin/*` in `copy`.
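A sketch of that clone, run against a made-up `original` repository with branches `B1`, `B2`, `B3`, and `master` (Git 2.28+ assumed; the temporary directory stands in for `$HOME/src`). It shows the single `origin` remote, the `origin/*` remote-tracking names, and that `master` is the only local branch in the copy:

```shell
#!/bin/sh
# Sketch: what a clone does to the original's branch names.
set -e
src=$(mktemp -d)      # stands in for $HOME/src
cd "$src"
git init -q -b master original
git -C original config user.name "Demo"
git -C original config user.email "demo@example.com"
echo "hello" > original/readme.txt
git -C original add readme.txt
git -C original commit -q -m "initial"
git -C original checkout -q -b B1
echo "b1 work" > original/b1.txt
git -C original add b1.txt
git -C original commit -q -m "work on B1"
git -C original checkout -q master
git -C original branch B2
git -C original branch B3

git clone -q "file://$src/original" copy

git -C copy remote -v    # shows origin with the file:// URL of the original
git -C copy branch -r    # origin/B1, origin/B2, origin/B3, origin/master (and origin/HEAD)
git -C copy branch       # only master is a local branch in the copy
```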
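The last step of the clone is, in effect, `git checkout master` (more precisely, a checkout of whichever branch the original's `HEAD` names, which here is presumably `master`), so the copy now has its own `master` but doesn't have its own `B1`, `B2`, and `B3`. So before you rewrite history in the copy, you'll want to create those branches.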
You can do this pretty simply, manually, by just running:

```shell
git checkout B1
git checkout B2
git checkout B3
```
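These commands use the same mechanism that `git clone` used to make `master` in `copy` from `copy`'s `origin/master`, which `copy` got from `origin` (i.e., the original repository): when there is no local branch named `B1` but there is exactly one remote-tracking name `origin/B1`, `git checkout B1` creates a local `B1` pointing to the same commit, set up to track `origin/B1`. So now your copy has four branches, just like your original.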
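The effect of those `git checkout` commands can be checked with `git branch -vv`, which lists each local branch along with the remote-tracking name it follows. A minimal sketch against a made-up repository (Git 2.28+ assumed); it relies on `git checkout`'s default behavior of creating a missing local branch from a single matching remote-tracking name.

```shell
#!/bin/sh
# Sketch: `git checkout B1` in a clone creates a local B1 from origin/B1.
set -e
src=$(mktemp -d)
cd "$src"
git init -q -b master original
git -C original config user.name "Demo"
git -C original config user.email "demo@example.com"
echo "hello" > original/readme.txt
git -C original add readme.txt
git -C original commit -q -m "initial"
for b in B1 B2 B3; do git -C original branch "$b"; done

git clone -q "file://$src/original" copy
cd copy
for b in B1 B2 B3; do git checkout -q "$b"; done
git checkout -q master

git branch -vv    # B1..B3 are now local branches, each tracking origin/<name>
```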
(If you have a lot of branches, and need to do this often, you'll want to script it instead. But if you need to do this often, you're doing something wrong in the first place. :-) )
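One way such a script might look: loop over every `origin/*` name with `git for-each-ref` and create a matching local branch with `git branch --track`, skipping `origin/HEAD` and any branch that already exists. This is a sketch, not a tested tool. It assumes Git 2.28+ for the demo setup and Git 2.13+ for `%(refname:strip=3)`, and the demo repository (including the branch `feature/x`) is made up; only the loop and the final `git remote remove` are meant for a real clone.

```shell
#!/bin/sh
# Sketch: turn every origin/* name in a fresh clone into a local branch,
# then drop the origin remote. The demo repositories below are made up.
set -e
src=$(mktemp -d)
cd "$src"
git init -q -b master original
git -C original config user.name "Demo"
git -C original config user.email "demo@example.com"
echo "hello" > original/readme.txt
git -C original add readme.txt
git -C original commit -q -m "initial"
for b in B1 B2 B3 feature/x; do git -C original branch "$b"; done

git clone -q "file://$src/original" copy
cd copy

# The actual script: one local branch per remote-tracking name.
git for-each-ref --format='%(refname:strip=3)' refs/remotes/origin/ |
while read -r name; do
    # Skip the origin/HEAD symref and branches that already exist (master).
    [ "$name" = HEAD ] && continue
    git show-ref --verify --quiet "refs/heads/$name" && continue
    git branch -q --track "$name" "origin/$name"
done

# Optional, and only after the loop: this also deletes every origin/* name.
git remote remove origin
git branch
```

The order matters: `git remote remove origin` removes all of the remote's remote-tracking names along with the remote itself, so the local branches have to be created first.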