
I needed to remove a file from my commit history. I followed GitHub's instructions for removing sensitive data:

$ git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch <myfile>' \
--prune-empty --tag-name-filter cat -- --all

...but I must have done something wrong, because now I have a bunch of duplicate commits. One set of commits still has my file; the other doesn't. Other than that, they're identical. How can I delete all of the commits that still contain my file?

Katrina
  • What makes you say that you have "duplicate" commits? It would be helpful if you showed us exactly how you arrived at that conclusion (i.e. show us some commands and their output). –  May 23 '14 at 02:56
  • If I look at `git log`, I now have twice as many commits as I did before. Each original commit has a corresponding new commit with the same commit message, time, etc. `git diff` shows that the only difference between the two is the presence of the file I tried to delete. – Katrina May 23 '14 at 03:12
  • I don't understand how you could have twice the number of commits in your log. I just tested out the same `git filter-branch` command from the GitHub instructions on my own test repo, and it worked just fine. How many commits were you rewriting originally? Do you have a backup copy of your repo before you did the filter-branch? Were you running `git log --oneline --graph master`? –  May 23 '14 at 05:15
  • I'm sorry for the lack of information; I didn't know what I had done. I believe Helmut has accurately described the root cause - indeed, that's what it looks like when I run `git log --oneline --graph master` - but I'm still not sure how to fix it. I originally had ~100 commits, and sadly I did not have a backup. – Katrina May 23 '14 at 13:37
  • As long as you haven't run `git gc` yet, I think your old commits are still recoverable, if you need them to start over (I'll add an answer for it). In the future, however, I highly recommend making a backup clone of your repo before doing anything with `git filter-branch`, so that when something appears to have gone very wrong, as it has here, you can go back to your original state. –  May 23 '14 at 17:29

2 Answers


Presumably you applied filter-branch and then pulled from a remote. When you run git filter-branch, you do duplicate your history: it creates a new set of commits that are identical to the old ones except for the changes applied (a removed file in your case). All (or most) commit IDs will have changed.

That change only happened locally, though. The remote still has the original commits. You probably tried to push your rewritten commits, and Git rejected the push with a message about a non-fast-forward update or about your history having "diverged". The usual response to that message is to pull. By doing so, you fetched the original commits and merged them into your rewritten history, which is why every commit now appears twice.

Rather than pulling, you should have done a forced push to destructively overwrite the history on your remote. Git refuses to do that without -f (--force) for a good reason.
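To make the sequence above concrete, here is a minimal sketch you can run in a scratch directory. It is only an illustration, not the original poster's repository: the temporary paths, the file name secret.txt, and the local bare repository standing in for the remote are all assumptions. It rewrites history, shows that a plain push is rejected, and then overwrites the remote with a forced push.

```shell
#!/bin/sh
# Throwaway demo: every path and file name below is hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice newer Git versions print

demo=$(mktemp -d)
git init --quiet --bare "$demo/remote.git"      # stands in for the real remote
git init --quiet "$demo/work"
cd "$demo/work"
git symbolic-ref HEAD refs/heads/master         # make sure the branch is called master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README
echo "hunter2" > secret.txt
git add README secret.txt
git commit --quiet -m "Initial commit"
echo "more" >> README
git commit --quiet -am "Update README"
git remote add origin "$demo/remote.git"
git push --quiet origin master
old_tip=$(git rev-parse master)

# Rewrite history locally; the remote is untouched by this step.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1
new_tip=$(git rev-parse master)

# A plain push is refused because the rewritten branch no longer
# descends from the commits on the remote.
if git push --quiet origin master 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# Do NOT pull here. Overwrite the remote history instead.
git push --quiet --force origin master
remote_tip_after=$(git --git-dir="$demo/remote.git" rev-parse master)

# Commits on master that still touch the file (should be none).
leftover=$(git log --format=%H master -- secret.txt)

echo "old tip:     $old_tip"
echo "new tip:     $new_tip"
echo "plain push:  $plain_push"
echo "remote tip:  $remote_tip_after"
```

On a shared repository, everyone else would still need to re-clone or reset onto the rewritten branch afterwards, otherwise their next push or pull can reintroduce the old commits in the same way.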

Helmut Grohne
  • I think you're assuming too much, given how little information the original poster has given us about what she has actually done. –  May 23 '14 at 09:17
  • Thanks, @Helmut - this sounds like what probably happened. I assumed that by pulling and then merging my remote, I'd merge the remote's old commits with my re-written ones. Is there still a way to quickly rewrite things, or do I need to remove my commits manually and then force push to fix the remote? – Katrina May 23 '14 at 12:32
  • @Katrina [the GitHub instructions](https://help.github.com/articles/remove-sensitive-data) say to force push immediately after confirming that the filter-branch removed your file; they don't say to pull right after. It doesn't make sense to merge the old remote commits, because those still contain the file that you were trying to remove in the first place. What you really want to do is overwrite those old remote commits with the new ones, which is exactly what the GitHub instructions tell you to do. –  May 23 '14 at 17:31

Given the information from the question, the existing answers, and their comments, it appears that the original poster made a few mistakes after doing the git filter-branch, and didn't make a backup clone of the repo.

So here are instructions for returning the repo to the state it was in before the filter-branch, if that is something that the original poster wants to do.

Original references

git filter-branch will automatically save references to your old commits, in case you need to recover them for any reason. You'll find them under your repo's .git/refs/original/refs/ directory:

ls -l .git/refs/original/refs/heads/
total 1
-rw-r--r--    1 Keoki    Administ       41 May 23 01:13 master

ls -l .git/refs/original/refs/tags/
total 1
-rw-r--r--    1 Keoki    Administ       41 May 23 01:13 v1.0

Each of the above reference files contains the SHA ID of the old commit that the branch or tag pointed to before the rewrite:

cat .git/refs/original/refs/heads/master
276fc24dc4b12edf75aea40f4fd50e25a5840005

cat .git/refs/original/refs/tags/v1.0
475593a612141506f59a141e38b8c6a3a2917f85
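The listing above reads the loose ref files directly. Git can also store refs in a packed form, in which case those files may not exist even though the backup refs are still there. As a hedged alternative, the sketch below asks Git itself with git for-each-ref and git rev-parse. The scratch repository, its paths, and the file name secret.txt are assumptions made for the demo.

```shell
#!/bin/sh
# Throwaway demo repo; all names below are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice newer Git versions print

demo=$(mktemp -d)
git init --quiet "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README
echo "hunter2" > secret.txt
git add README secret.txt
git commit --quiet -m "Initial commit"
old_tip=$(git rev-parse master)

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty -- --all >/dev/null 2>&1

# Pack the refs to imitate a repo where the loose files are gone.
git pack-refs --all

# List every backup ref that filter-branch saved, whatever its storage.
backup_refs=$(git for-each-ref --format='%(refname)' refs/original/)

# Resolve one backup ref to the commit ID it holds.
backup_tip=$(git rev-parse refs/original/refs/heads/master)

echo "$backup_refs"
echo "$backup_tip"
```

The ID printed last is the same value that cat would show for the loose file, so either approach can feed the reset described next.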

Use hard resets to recover

To get back your original master branch (from before you did the filter-branch), just do a hard reset using the references above, or use the commit sha ID contained in them:

git checkout master

# Use reference
git reset --hard refs/original/refs/heads/master

# Or use sha ID
git reset --hard 276fc24dc4b12edf75aea40f4fd50e25a5840005
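The reset above only restores master. If tags were rewritten too (the command in the question uses --tag-name-filter cat), they can be pointed back in a similar way. The following is a sketch under assumptions: it builds a scratch repository with a hypothetical secret.txt and a lightweight tag v1.0, runs the rewrite, and then restores both the branch and the tag from refs/original. Annotated tags may need extra care, since the backup ref may hold the tagged commit rather than the tag object.

```shell
#!/bin/sh
# Throwaway demo repo; all names below are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the notice newer Git versions print

demo=$(mktemp -d)
git init --quiet "$demo/repo"
cd "$demo/repo"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"

echo "hello" > README
echo "hunter2" > secret.txt
git add README secret.txt
git commit --quiet -m "Initial commit"
git tag v1.0                              # lightweight tag on the original commit
old_tip=$(git rev-parse master)

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Restore the branch from its backup ref.
git checkout --quiet master
git reset --quiet --hard refs/original/refs/heads/master

# Restore the lightweight tag from its backup ref.
git update-ref refs/tags/v1.0 "$(git rev-parse refs/original/refs/tags/v1.0)"

restored_tip=$(git rev-parse master)
restored_tag=$(git rev-parse v1.0)
echo "master: $restored_tip"
echo "v1.0:   $restored_tag"
```

After this, master and v1.0 point at the pre-rewrite commit again, and the removed file is back in the working tree, so the filter-branch can be repeated from a clean starting point.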
  • Thank you - this was really helpful, and enabled me to undo the mess that I made. I appreciate your patience, I know that getting the right information out of me was like pulling teeth. – Katrina May 23 '14 at 21:01
  • @Katrina meh, don't sweat it, not that big a deal. Now that you've been able to recover your original commits, do you understand how to do the filter-branch and force push now? Also, before you do another filter-branch, I recommend making a quick backup clone with `git clone --bare <repo> "backup"`, which is generally a good idea when you're doing destructive operations like filter-branch. –  May 23 '14 at 21:04