
I made a dumb error and accidentally committed a node_modules folder to my local git repo AND then pushed it to GitHub. This is a huge folder, and anyone else who downloads my repo will also have to download it, because it is in the old commits. I've been trying to remove the commits with rebase --onto and rebase -i without luck. This is what my git log looks like:

$ git log --oneline
44549c5f (HEAD -> alex/matUI, origin/alex/matUI) fighting with gitignore
a5a5a79c changed ui to material   ##<---- remove me!
dbec4ab3 converting to material ui      ##<---- remove me!
cd4352f6 (origin/master, origin/HEAD, master) Merge pull request #1 from notsmart/addFullstack
a058bf1e moved files to new repo
80c82607 Added README.md

How would you remove these commits?

honkskillet
  • Possible duplicate of [Rolling back local and remote git repository by 1 commit](https://stackoverflow.com/questions/4647301/rolling-back-local-and-remote-git-repository-by-1-commit) – phd May 08 '18 at 13:13
  • Possible duplicate of [Delete and completely remove the commit from git history](https://stackoverflow.com/questions/34746245/delete-and-completely-remove-the-commit-from-git-history) – k0pernikus May 08 '18 at 14:10

2 Answers


You have to do two things:

  1. remove those commits locally
  2. push them with force to overwrite the branch on origin

Edit: actually back up those files that will be removed first, because this method will remove them from your filesystem.

First:

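git rebase -i HEAD~3

Now you have an open editor with lines similar to what you wrote, listed oldest first (HEAD~3 is cd4352f6, the merge commit on master, so only the three commits on top of it are listed). Remove the lines with the commits you don't want, then save and exit the editor.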

Check with git log that the result is correct.

Then:

git push -f

Explanation:

First you started an interactive history-editing session. The available commands are listed below the commit lines in the editor, commented out. You can do many things: drop commits by deleting their lines, squash commits together, reorder commits by reordering the lines, etc.

Then you removed the commit lines and saved. Git then built a new chain of commits containing only the changes you kept. These really are new commits: each commit records a link to its parent, so every commit after the first dropped one had to be recreated and gets a new hash. In git log you will see that origin/alex/matUI no longer points at the same commit as your HEAD.
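Since the question mentions rebase --onto: the same drop can also be done without the editor. In the question's terms that would be `git rebase --onto cd4352f6 a5a5a79c`, which replays only the commits after a5a5a79c (here just 44549c5f) onto cd4352f6. Below is a minimal sketch in a throwaway repo; the file names, contents and commit messages are hypothetical stand-ins for the real ones, and it assumes only that git and mktemp are available.

```shell
#!/bin/sh
# Sketch: drop two commits with `git rebase --onto` in a throwaway repo.
# All files, contents and commit messages are hypothetical stand-ins.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/alex/matUI

echo "readme" > README.md
git add README.md
git commit -q -m "Added README.md"
base=$(git rev-parse HEAD)                       # stands in for cd4352f6

mkdir node_modules
echo "huge dependency" > node_modules/dep.js
git add -A
git commit -q -m "converting to material ui"     # to be dropped
echo "ui" > ui.js
git add -A
git commit -q -m "changed ui to material"        # to be dropped
last_bad=$(git rev-parse HEAD)                   # stands in for a5a5a79c

echo "node_modules/" > .gitignore
git add .gitignore
git commit -q -m "fighting with gitignore"
old_tip=$(git rev-parse HEAD)                    # stands in for 44549c5f

# Replay everything after $last_bad onto $base; the two commits in
# between (including the one that added node_modules) are left out.
git rebase -q --onto "$base" "$last_bad"

git log --oneline
```

Note that ui.js is gone as well: dropping a commit drops all of its changes, not only node_modules, which is why backing up first matters.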

Finally you pushed with force. This overwrote origin/alex/matUI with your current alex/matUI. With the default push settings, git push -f pushes the branch HEAD is on to the remote branch it is tied to (your alex/matUI tracks origin/alex/matUI; that's no magic, it's an explicit link that you either create manually or get automatically, e.g. when cloning). Normally push is conservative and only allows adding commits on top of the remote branch's tip; -f forces its way through that. Use the force, Luke :)
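A minimal sketch of why -f is needed, using a local bare repository as a stand-in for GitHub (all paths are hypothetical temp dirs). After a rewrite the local branch no longer contains the remote tip, so a plain push is rejected as a non-fast-forward, while the forced push replaces the remote branch. Here git reset --hard HEAD~1 stands in for the interactive rebase, since any history rewrite has the same effect on push.

```shell
#!/bin/sh
# Sketch: a rewritten branch is rejected by a plain push and accepted with -f.
# A local bare repo stands in for GitHub; all paths are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/alex/matUI
git remote add origin "$demo/origin.git"

echo "readme" > README.md
git add README.md
git commit -q -m "base"
mkdir node_modules
echo "huge dependency" > node_modules/dep.js
git add -A
git commit -q -m "oops: node_modules"
git push -q -u origin alex/matUI

# Rewrite local history (stand-in for the interactive rebase).
git reset -q --hard HEAD~1

# A plain push refuses to discard the remote's "oops" commit...
if git push -q origin alex/matUI 2>/dev/null; then
    echo "unexpected: non-fast-forward push was accepted" >&2
    exit 1
fi
# ...while a forced push overwrites the remote branch.
git push -q -f origin alex/matUI
```

If someone else might have pushed to the same branch in the meantime, git push --force-with-lease is a more cautious variant: it refuses to overwrite the remote branch unless it still points where your remote-tracking branch says it does.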

Ctrl-C
  • You may also want to run `git gc` (garbage collection) which should remove unreachable objects from your git repository, so the .git folder will not contain those commits anymore. I'm not 100% sure that it will remove those commits that you've just removed. I think so. Anyway github should run such garbage collection itself on the server, so no need to worry about it being on their server. – Ctrl-C May 08 '18 at 13:14
  • No, `gc` will not generally remove the commits you just took out of history, because they're still reachable by the reflog. As for the server, can you refer to docs that confirm how `gc` is used in github, or is that just how you assume it "should" work? Because it's been a long time since I looked at it on that service specifically, but I don't recall it being as nice as that. – Mark Adelsberger May 08 '18 at 13:24
  • Thanks for the clarification! I don't know it for sure (this is why I didn't add it to the answer), but looks to me like a perfectly reasonable thing to do to save space with little computation cost (possibly only after x operations). For example GitLab does nightly cleanups, at least selfhosted instances. Once a submodule revision disappeared in the night because it was an unattached commit (was amended). – Ctrl-C May 08 '18 at 14:52

Any solution you can apply is going to be a history rewrite. That means that it will adversely affect anyone else with a copy of your repo, and if they do the wrong thing when trying to recover, it could undo your fix.

Having this happen in a publicly available repo is therefore pretty unfortunate, though if you happen to know that few people (or maybe nobody) have cloned it, it may not be too bad in practice. The main point is to communicate what you're doing in a way that keeps all users of the repo aware.

(Usually I would say that you need the agreement and coordination of everyone who has a copy of the repo. Here, if you see it as your repo that you're letting others clone, I suppose a measure of coordination is enough; but unless you restrict pushes to the origin, the possibility that someone applies the "wrong fix" and re-introduces a bad commit exists, whatever we might say is "right".)

Anyway, be aware of the above, but it can't really be helped. You have to rewrite history, and the question is how.

You could just remove all the commits that have been made since you added the node_modules folder, but of course then you'll lose all the other changes from those commits. The easiest way to get rid of node_modules without losing other history (and without third-party tools) would be git filter-branch.

Of course you want to make sure you have all refs locally. Since your repo is presumably the true original, which you've replicated to GitHub, it should be OK. But if need be, you could fetch, or even do a --mirror clone of the origin, to start things off. Then

git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r node_modules' -- --all

If you have commits that change nothing outside of node_modules and want to discard those commits, you can add the --prune-empty option before the -- delimiter.
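A minimal sketch of that filter-branch run (with --prune-empty) in a throwaway repo, so you can see what it does before touching the real one. File names and commit messages are hypothetical. It assumes a Git version that still ships filter-branch; FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning and pause that Git 2.24 and later print before running it.

```shell
#!/bin/sh
# Sketch: strip node_modules from every commit on every branch.
# Files, contents and commit messages are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the Git >= 2.24 warning/pause

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master

echo "readme" > README.md
git add README.md
git commit -q -m "Added README.md"

# A commit that mixes wanted changes (ui.js) with node_modules.
mkdir node_modules
echo "dep" > node_modules/dep.js
echo "ui" > ui.js
git add -A
git commit -q -m "converting to material ui"

# A commit that touches nothing but node_modules.
echo "other dep" > node_modules/other.js
git add -A
git commit -q -m "more node_modules"

# A second branch, so that `-- --all` has more than one ref to rewrite.
git branch alex/matUI

git filter-branch --prune-empty \
    --index-filter 'git rm --cached --ignore-unmatch -r node_modules' \
    -- --all
```

Afterwards no commit reachable from the branches contains node_modules, ui.js survives, the node_modules-only commit has been pruned, and the old history is still reachable through the backup refs under refs/original/.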

(On a repo with a large history, i.e. many commits, this could be slow. In that case you might consider a third-party tool like the BFG Repo-Cleaner, which is specialized for removing large or unwanted files from history, whereas filter-branch is a much more general-purpose tool.)

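After you've run this and checked that your history looks OK, you will want to clean up the local repo. Arguably the easiest way is to use it as the source of a new clone. Using a file:// URL rather than a plain path makes git transfer only the objects reachable from the branches and tags instead of hardlinking the whole old object store, and the filter-branch backup refs are not copied.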

cd ..
git clone file://localhost/path/to/old/repo newrepo

If you'd rather clean up the original local repo, you'll need to remove a set of "backup refs" that filter-branch created (under refs/original), and probably wipe out the reflogs, and then use gc to actually throw out the unwanted objects.
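Those clean-up steps can be sketched as follows; the three commands follow the "checklist for shrinking a repository" in the git-filter-branch documentation. The throwaway repo and its contents are hypothetical, and the check at the end simply confirms that the node_modules blob is gone from the object database.

```shell
#!/bin/sh
# Sketch: make the objects dropped by filter-branch actually go away.
# The repo and its contents are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo "readme" > README.md
mkdir node_modules
echo "payload only found in node_modules" > node_modules/dep.js
git add -A
git commit -q -m "oops: committed node_modules"
blob=$(git rev-parse HEAD:node_modules/dep.js)   # remember it for the check

git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r node_modules' -- --all

# 1. Delete the backup refs that filter-branch left under refs/original.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
# 2. Expire every reflog entry, so the old commits are not reachable from reflogs.
git reflog expire --expire=now --all
# 3. Garbage-collect everything that is now unreachable.
git gc -q --prune=now
```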

As for the repo on GitHub, again it may be that deleting it and recreating it would be the easiest thing, especially if you have many rewritten branches. Alternatively, you could force-push (git push -f) each rewritten branch, and consult the GitHub docs for info about server-side gc.
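If you go the force-push route, a minimal sketch with a local bare repository standing in for GitHub (paths and branch names are hypothetical) might look like this. git push --force --all pushes every local branch in one go; if the repo has tags, they are only rewritten when filter-branch is given --tag-name-filter cat, and would then need a separate git push --force --tags, since --all and --tags cannot be combined. The old objects stay in the bare repo until it is garbage-collected.

```shell
#!/bin/sh
# Sketch: force-push every branch rewritten by filter-branch.
# A local bare repo stands in for GitHub; names are hypothetical.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

demo=$(mktemp -d)
git init -q --bare "$demo/origin.git"
git init -q "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master
git remote add origin "$demo/origin.git"

echo "readme" > README.md
mkdir node_modules
echo "dep" > node_modules/dep.js
git add -A
git commit -q -m "initial, with node_modules"
git branch alex/matUI
git push -q origin master alex/matUI

git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r node_modules' -- --all

# Overwrite every branch on origin with its rewritten local version.
git push -q --force --all origin
```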

Mark Adelsberger