Any solution you can apply is going to be a history rewrite. That means that it will adversely affect anyone else with a copy of your repo, and if they do the wrong thing when trying to recover, it could undo your fix.
Having this in a publicly available repo is therefore pretty unfortunate, though if you happen to know that few people (or maybe nobody) have cloned it, it may not be too bad in practice. The main point is to communicate what you're doing in a way that keeps all users of the repo aware.
(Usually I would say that you need agreement/coordination of anyone who has a copy of the repo; here, if you see it as your repo that you're letting others clone, I suppose you could say just a measure of coordination is fine; but unless you're restricting pushes to the origin, the possibility of someone doing the "wrong fix" and re-introducing a bad commit exists whatever we might say is "right".)
Anyway, be aware of the above, but it can't really be helped. You have to rewrite history, and the question is how.
You could just remove all the commits that have been made since you added the node_modules folder, but of course then you'll lose all the other changes from those commits. The easiest way to get rid of node_modules without losing other history (and without 3rd party tools) would be git filter-branch.
Of course you want to make sure you have all refs locally. Since your repo is presumably the true original, which you've replicated to GitHub, it should be OK. But if need be, you could fetch or even do a --mirror clone of the origin to start things off. Then run:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r node_modules' -- --all
If you have commits that change nothing outside of node_modules and want to discard those commits, you can add the --prune-empty option before the -- delimiter.
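Before running this on the real repo, it may help to rehearse it on a throwaway one. The sketch below is only an illustration under assumptions: the scratch directory, the file names (app.js, node_modules/dep.js) and the placeholder identity are all made up, and FILTER_BRANCH_SQUELCH_WARNING merely silences the advisory that newer git versions print before filter-branch starts. Note that the final check uses --branches rather than --all, because --all would also walk the backup refs under refs/original, which still point at the old history.

```shell
#!/bin/sh
# Rehearsal on a scratch repo; every path and name here is hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the advisory on newer git

scratch=$(mktemp -d)
git init -q "$scratch/demo"
cd "$scratch/demo"
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"

# Commit 1: real code
echo "console.log('hi')" > app.js
git add app.js
git commit -q -m "add app"

# Commit 2: the mistake, touching nothing but node_modules
mkdir node_modules
echo "junk" > node_modules/dep.js
git add node_modules
git commit -q -m "add node_modules by mistake"

# Commit 3: more real code
echo "// more" >> app.js
git commit -q -am "real change"

# Rewrite every ref; --prune-empty drops commit 2, which becomes empty
git filter-branch --prune-empty --index-filter \
    'git rm --cached --ignore-unmatch -r node_modules' -- --all

# The rewritten branches should no longer mention node_modules,
# so this log is expected to be empty
git log --branches --oneline -- node_modules
```

One side effect worth knowing: filter-branch refreshes the working tree at the end, so files that were tracked under node_modules disappear from disk as well and need to be reinstalled.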
(On a repo with a large history, i.e. many commits, this could be slow. In that case, you might consider a third-party tool like the BFG Repo Cleaner, which is specialized for removing large or unwanted files from history, whereas filter-branch is a much more general-purpose tool.)
After you've run this, and checked that your history looks ok, you will want to do some clean-up of the local repo. Arguably the easiest thing is to use it to create a new clone.
cd ..
git clone file://localhost/path/to/old/repo newrepo
If you'd rather clean up the original local repo, you'll need to remove a set of "backup refs" that filter-branch created (under refs/original), and probably wipe out the reflogs, and then use gc to actually throw out the unwanted objects.
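The in-place clean-up described above could look roughly like this. It is a sketch, wrapped in a hypothetical helper function (cleanup_rewritten_repo is my name for it, not a git command) that you would call from inside the rewritten repo. It is destructive by design: once it has run, the old commits cannot be recovered from this repo, so only use it after checking the rewritten history.

```shell
#!/bin/sh
# cleanup_rewritten_repo is a hypothetical helper name, not part of git.
# Call it from inside the repo that filter-branch has already rewritten.
cleanup_rewritten_repo() {
    # Delete the backup refs that filter-branch left under refs/original
    git for-each-ref --format='delete %(refname)' refs/original |
        git update-ref --stdin

    # Expire all reflog entries so they stop referencing the old commits
    git reflog expire --expire=now --all

    # Repack and drop the now-unreachable objects right away
    git gc --prune=now
}
```

Usage would simply be `cd` into the repo followed by `cleanup_rewritten_repo`.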
As for the repo on GitHub, again it may be that deleting it and recreating it would be the easiest thing, especially if you have many rewritten branches. Alternatively, you could force-push (git push -f) each rewritten branch, and consult the GitHub docs for info about server-side gc.
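If you go the force-push route and have several branches, pushing them one by one gets tedious. A possible shortcut, sketched here as a hypothetical helper (push_rewritten_history is an invented name, and "origin" as the default remote is an assumption), is to push all branches and then all tags. The two pushes are separate because git does not accept --all and --tags in the same command.

```shell
#!/bin/sh
# push_rewritten_history is a hypothetical helper name, not part of git.
# Optional first argument: the remote to push to (assumed default: origin).
push_rewritten_history() {
    remote=${1:-origin}

    # Overwrite every branch on the remote with the rewritten local branch
    git push --force --all "$remote"

    # Overwrite tags too, in case any of them pointed into the old history
    git push --force --tags "$remote"
}
```

This only updates the refs on the remote; whether and when the old objects are actually discarded there is up to the server.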