The answer to both questions is yes: mixing the old (pre-cleaning) repository with the new (post-cleaning) repository results in the union of the two repositories. However, for question 2, if someone does a straight git push without first doing a git fetch (or a pull of any sort, which runs fetch as its first step), the push will fail, with the receiving repository complaining that it is not a fast-forward. To get past this, they will have to force the push, using either the --force flag or a + in front of the refspec.
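The fast-forward rule the receiving repository applies can be sketched as a reachability test. This is a minimal Python model, not Git's actual implementation: the commit graph is a hypothetical dict mapping each commit ID to its parent IDs, and the short IDs (A, B, C2, ...) are made up for illustration.

```python
from typing import Dict, List

# Hypothetical model: each commit ID maps to the list of its parent IDs.
Graph = Dict[str, List[str]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` can be reached from `descendant` by following
    parent links (a commit counts as its own ancestor here)."""
    stack = [descendant]
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def is_fast_forward(graph: Graph, current_tip: str, pushed_tip: str) -> bool:
    # An ordinary (non-forced) update is accepted only if the receiver's
    # current branch tip is already contained in the history being pushed.
    return is_ancestor(graph, current_tip, pushed_tip)


# Old history A <- B <- C; filtering changed A, so every copy is new.
graph: Graph = {
    "A": [], "B": ["A"], "C": ["B"],       # pre-cleaning commits
    "A2": [], "B2": ["A2"], "C2": ["B2"],  # post-cleaning copies
    "D": ["C"],                            # ordinary new work on the old history
}
print(is_fast_forward(graph, "C", "D"))   # True: D builds on C
print(is_fast_forward(graph, "C", "C2"))  # False: C is not in C2's history
```

Pushing C2 to a server whose branch points at C is therefore rejected unless forced, which is exactly the complaint described above.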
The git pull may or may not fail with a complaint about unrelated histories, depending on which copied commits (see the description below) wound up re-using the original commits. The outcome also depends on the Git version: since Git 2.9, git merge refuses to merge unrelated histories unless you supply the --allow-unrelated-histories option, whereas older versions of Git would simply attempt the merge.
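Whether two histories count as "unrelated" comes down to whether they share any commit at all. A minimal Python sketch, using the same kind of hypothetical parent-map model with made-up IDs, shows why the answer depends on which commits the filter re-used:

```python
from typing import Dict, List, Set

# Hypothetical model: each commit ID maps to the list of its parent IDs.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """Every commit reachable from `tip`, including `tip` itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def unrelated(graph: Graph, tip_a: str, tip_b: str) -> bool:
    # Histories are unrelated when they have no commit in common.
    return not (ancestors(graph, tip_a) & ancestors(graph, tip_b))


# Case 1: filtering changed the root A, so no original commit was re-used.
rewritten_root: Graph = {"A": [], "B": ["A"], "C": ["B"],
                         "A2": [], "B2": ["A2"], "C2": ["B2"]}
# Case 2: only C changed; its copy C2 still sits on the original A and B.
rewritten_tip: Graph = {"A": [], "B": ["A"], "C": ["B"], "C2": ["B"]}

print(unrelated(rewritten_root, "C", "C2"))  # True
print(unrelated(rewritten_tip, "C", "C2"))   # False: A and B are shared
```

In the first case a modern git pull stops and asks for --allow-unrelated-histories; in the second it proceeds with an ordinary merge.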
(Would I have to just compare the commit hashes of the parent of that commit? My understanding is that cleaning the main repo whether by BFG or git filter-branch changes all of the hashes of all commits in the repo)
This is sort of correct, but wrong in some important details.
Filtering (by any means) is actually the process of copying commits. We take each original commit, with its parent hash IDs, its tree, and its stored blobs, and copy it to a new commit. The copy has the filter(s) applied: we remove any blobs we want gone, or make any other changes we desire to the tree and/or to the commit metadata (user names, time stamps, messages, and so on). The first result is a tree: if we made no changes, it re-uses the existing tree hash ID; if the new tree differs from every existing tree, it gets a new tree hash ID. We put this old-or-new tree ID into the commit metadata along with the updated parent hash ID(s). Each updated parent hash is new if any change happened to that parent or to any of its predecessors, and the same as before if not. Then we make the new commit: if it is 100% identical to the original commit, we get the original hash ID back; otherwise we get a new hash ID.
What this means is that as long as the new copies are 100% bit-for-bit identical to the originals, you are really just re-using the originals. But as soon as some commit is changed, even a tiny bit, the new copy is a different commit, and all of its children now have a different parent hash, so they too are different commits, and so on down through every descendant.
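The copying process can be sketched in Python. This is a simplified model, assuming a classic SHA-1 repository: commit_id hashes a commit object using Git's loose-object layout (a "commit <size>" header, a NUL byte, then the tree/parent/author/committer lines, a blank line, and the message), but it ignores extra headers such as signatures, and the tree IDs and author line are made up. filter_history is a hypothetical helper, not part of any real tool.

```python
import hashlib
from typing import Callable, Dict, List


def commit_id(tree: str, parents: List[str], author: str,
              committer: str, message: str) -> str:
    """SHA-1 of a simplified commit object in Git's loose-object format."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def filter_history(commits: List[dict],
                   tree_filter: Callable[[str], str]) -> Dict[str, str]:
    """Copy an oldest-first list of commits, applying `tree_filter` to each
    tree and remapping parents to their copies. Returns {original: copy}."""
    mapping: Dict[str, str] = {}
    for c in commits:
        new_parents = [mapping[p] for p in c["parents"]]
        mapping[c["id"]] = commit_id(tree_filter(c["tree"]), new_parents,
                                     c["author"], c["committer"], c["message"])
    return mapping


WHO = "A U Thor <author@example.com> 1700000000 +0000"  # made-up identity
T1, T2, T3, T2_CLEAN = "1" * 40, "2" * 40, "3" * 40, "4" * 40  # made-up trees

# Build a three-commit linear history.
history: List[dict] = []
parent: List[str] = []
for tree, msg in [(T1, "first\n"), (T2, "add secret\n"), (T3, "third\n")]:
    cid = commit_id(tree, parent, WHO, WHO, msg)
    history.append({"id": cid, "tree": tree, "parents": parent,
                    "author": WHO, "committer": WHO, "message": msg})
    parent = [cid]

# Only the second commit's tree changes (say, a blob was removed).
mapping = filter_history(history, lambda t: T2_CLEAN if t == T2 else t)
for c in history:
    status = "re-used" if mapping[c["id"]] == c["id"] else "new copy"
    print(c["message"].strip(), status)
```

The first commit comes out identical and keeps its hash ID. The second changes because its tree changed, and the third changes even though its own tree did not, because its parent hash is now different.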
The end effect is that after filtering a repository with git filter-branch, you generally have a doubled repository, minus whatever commits Git was able to re-use. The original branch heads are now findable only through the refs/original/ namespace. If you used --tag-name-filter cat, all tags are updated to use the new commits as well, so removing all the refs/original/ references makes the original commits unreachable; they are actually discarded once their reflog entries expire and git gc prunes them.
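The "doubled repository" effect, and what deleting the backup refs does, can be modeled as reachability from the set of refs. This Python sketch ignores reflogs and garbage collection entirely; the commit IDs and the tag name v1.0 are made up, while the refs/original/refs/heads/... naming follows the backup layout git filter-branch uses.

```python
from typing import Dict, List, Set

# Hypothetical model: each commit ID maps to the list of its parent IDs.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, refs: Dict[str, str]) -> Set[str]:
    """Commits reachable from any ref (reflogs are ignored in this model)."""
    seen: Set[str] = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


graph: Graph = {"A": [], "B": ["A"], "A2": [], "B2": ["A2"]}
refs = {
    "refs/heads/main": "B2",               # rewritten branch
    "refs/tags/v1.0": "A2",                # rewritten via --tag-name-filter cat
    "refs/original/refs/heads/main": "B",  # backup left by filter-branch
}
print(sorted(reachable(graph, refs)))  # ['A', 'A2', 'B', 'B2']: doubled

# Drop the backup refs (in a real repository, e.g. with git update-ref -d).
refs = {name: c for name, c in refs.items()
        if not name.startswith("refs/original/")}
print(sorted(reachable(graph, refs)))  # ['A2', 'B2']
```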
The BFG avoids all this by rewriting the original references in place, without keeping backups in refs/original/ (and it is, of course, much faster and more convenient than git filter-branch). Nonetheless, it still effectively copies all the original commits to new ones. Your cleaned repository is, in effect, a mostly-new repository, which should never be mixed with the old one.
Of course, if someone has some commits they want to bring from their own repository that is based on the old one, that person will have to mix the old and new repositories in some way. It's up to whoever does this mixing to be careful and to be certain not to reintroduce all the old commits.
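One way to be careful is to check, before pushing, whether the branch still reaches any commit that exists only in the old repository. The Python sketch below is a hypothetical model of that check, with made-up IDs: F is the user's work built on the old history, and F2 is the same change transplanted onto the filtered copy (for example with git cherry-pick or git rebase --onto).

```python
from typing import Dict, List, Set

# Hypothetical model: each commit ID maps to the list of its parent IDs.
Graph = Dict[str, List[str]]


def ancestors(graph: Graph, tip: str) -> Set[str]:
    """Every commit reachable from `tip`, including `tip` itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(graph.get(commit, []))
    return seen


def reintroduced(graph: Graph, tip: str, old_only: Set[str]) -> Set[str]:
    """Pre-cleaning commits (absent from the filtered repository) that
    pushing `tip` would bring back."""
    return ancestors(graph, tip) & old_only


graph: Graph = {
    "A": [], "B": ["A"],     # old, pre-cleaning history
    "A2": [], "B2": ["A2"],  # filtered copies
    "F": ["B"],              # the user's own work, built on old B
    "F2": ["B2"],            # the same change transplanted onto B2
}
old_only = {"A", "B"}  # in the old repository but not in the filtered one

print(sorted(reintroduced(graph, "F", old_only)))   # ['A', 'B']: unsafe
print(sorted(reintroduced(graph, "F2", old_only)))  # []: safe to push
```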
For many users, under most circumstances, it suffices for them to treat the filtered repository as an entirely new project, cloning it anew and discarding their previous repository. Only those with commits to transplant need to understand all of the above.