I need to delete commits made 1 year ago because they contain sensitive data that must be removed.

I have used BFG Repo-Cleaner, and I have been able to almost delete everything, but there are some very old commits that are not being removed.

Here is an example. The Git history looks like this:

  • C -> secret files do not exist
  • B -> secret files are removed
  • A -> secret files were added

(A being the oldest and C the newest commit)

And this is what I would need (B does not exist anymore, but later commits are not affected):

  • C -> secret files do not exist
  • A -> secret files were added

I'm working in a big team so, unless there is no other option, I would like to avoid using git push -f.

What is the best way to achieve this?

Thank you very much.

(edit)

The reason for this is that we have a regular scan on our repo that detected commit A as a vulnerability.

We made commit B, where we deleted all credential and secret files, and the problem is that the scan also flags commit B as a 'security issue'.

We are asked to remove commit B to pass the scan.

mkrieger1
rogervila
  • If secret files are added in `A`, how are they supposed to not exist in `C` if there is no commit in between where they are removed? Don't you mean: `A` - secret files do not exist, `C` - secret files do not exist? – mkrieger1 Jul 03 '19 at 08:18
  • In any case, it is not possible to remove or change commits without rewriting all other commits that are successors. This implies that you can't publicly change history without using `push -f`. – mkrieger1 Jul 03 '19 at 08:20
  • Hi @mkrieger. Yes, A is the oldest one and C is the most recent. The problem is that we have a regular scan on our repo that detects the commit B as a 'security issue'. I'm updating the description to make this more clear. – rogervila Jul 03 '19 at 08:24
  • If you remove the commit `B` where the secret files are removed, then they will still exist in commit `C`. Is your goal to eradicate secret files from the repository, or to silence "security" warnings? – mkrieger1 Jul 03 '19 at 08:30
  • I think you will definitely have to do an interactive rebase and force push. – Siri Jul 03 '19 at 08:35
  • Well, removing the files completely would be the best solution. The problem is that those files have been there since the first commit of the repository. But if there is a way to obfuscate/encrypt/whatever commit B to avoid scan alerts, I think it would be enough. – rogervila Jul 03 '19 at 08:36
  • No it wouldn't; if you only change commit B, but not A, the files are still in the repository and anyone who has the repository can access them. – mkrieger1 Jul 03 '19 at 08:37
  • Commits *are* the history in a repository, so you've phrased your question as: *How do I alter the history without altering the history?* That is of course impossible. But BFG (or any other process) can do a complete rewrite of all history so that the files are not present in any commit, and that is what you must do. – torek Jul 03 '19 at 16:28
  • Hello. As you all said, it was impossible. I ended up using BFG multiple times until I found all files containing sensitive data. Thank you all for your responses – rogervila Jul 05 '19 at 11:57

1 Answer

TL;DR

  • you must rewrite commit A to not contain the sensitive file in the first place
  • you must use git push -f
  • you're not done yet: you must still clean the history on the server

Rewrite commit A and the whole history

This should be what bfg did for you. I assume you ran something like bfg --delete-files <sensitive-file>. This should have created a whole new history in which <sensitive-file> never existed: commits that added or modified it along with other files should be rewritten without that file. Commits that only touched it should disappear, since they would now be empty commits.

So now you have commit A', a copy of A without <sensitive-file>. The rest of the history is rewritten as its successors: C', etc.

To confirm that this happened correctly, run this command in both an old sandbox and the new one updated by bfg:

git log --all -- <sensitive-file>

You should see the commits touching the sensitive file in the original repo but no output in the new one. This is how you can be confident the file is really removed from the history.
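As a rough illustration of that check, here is a small sketch that wraps it in a counting helper and tries it on a throwaway repository. The helper name `count_commits_touching`, the file names, and the demo identity are all made up for the example; the throwaway repo only stands in for your old sandbox.

```shell
set -eu

# Hypothetical helper: count the commits, on any ref, that touch a path.
# The `--` keeps Git from rejecting a path that is absent from the working tree.
count_commits_touching() {
  git -C "$1" log --all --format=%H -- "$2" | wc -l | tr -d ' '
}

# Throwaway repository standing in for the "old sandbox".
repo=$(mktemp -d)
git -C "$repo" init -q

# Placeholder identity, used only for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit A adds the secret, commit B removes it.
echo "password" > "$repo/secret.txt"
git -C "$repo" add secret.txt
git -C "$repo" commit -q -m "A: add secret"
git -C "$repo" rm -q secret.txt
git -C "$repo" commit -q -m "B: remove secret"

touched=$(count_commits_touching "$repo" secret.txt)
untouched=$(count_commits_touching "$repo" never-committed.txt)
echo "commits touching secret.txt: $touched"
echo "commits touching never-committed.txt: $untouched"
```

Here both A and B show up for secret.txt, so the count is 2, while a path that was never committed gives 0. A count of 0 for the sensitive file is what you want to see in the repo rewritten by bfg.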

You must use git push -f

The SHA-1 of a Git commit is a cryptographic hash of the commit object: its metadata (author, committer, dates, message, etc.), a reference to its contents, and the hashes of its parent commits, and so, transitively, its whole history.

If you change any one aspect of the commit (the date, the message, the contents, or any aspect of any of its ancestors), the hash changes, by definition.
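To see this for yourself, the sketch below builds two versions of C with Git's plumbing commands in a throwaway repo. Both have the same contents, message, identity and timestamp; only their parent differs (A with the secret versus a rewritten A without it). The file names, identity and fixed timestamp are placeholders chosen for the example.

```shell
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Placeholder identity and a fixed timestamp, so that the parent is the
# only thing that differs between the two versions of C.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE="1562140800 +0000" GIT_COMMITTER_DATE="1562140800 +0000"

# Tree without the secret, then tree with the secret.
echo "hello" > "$repo/readme.txt"
git -C "$repo" add readme.txt
clean_tree=$(git -C "$repo" write-tree)
echo "password" > "$repo/secret.txt"
git -C "$repo" add secret.txt
dirty_tree=$(git -C "$repo" write-tree)

# A contains the secret; A' is the rewritten copy without it.
a_old=$(git -C "$repo" commit-tree "$dirty_tree" -m "A")
a_new=$(git -C "$repo" commit-tree "$clean_tree" -m "A")

# C and C' are identical except for their parent.
c_old=$(git -C "$repo" commit-tree "$clean_tree" -p "$a_old" -m "C")
c_new=$(git -C "$repo" commit-tree "$clean_tree" -p "$a_new" -m "C")

echo "C  = $c_old"
echo "C' = $c_new"
```

The two printed hashes differ even though C itself was not edited, which is why every commit after a rewritten one gets a new identity.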

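Every rewritten commit therefore has a new hash, the server cannot fast-forward its branch to the new history, and the only way forward is a git push -f.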
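Here is a minimal sketch of what that looks like in practice, using a local bare repository as a stand-in for the server. The paths, the branch name `main`, the demo identity, and the use of `commit --amend` as a substitute for the bfg rewrite are all assumptions made for the example.

```shell
set -eu

work=$(mktemp -d)
server="$work/server.git"   # stand-in for the real remote
clone="$work/clone"
git init -q --bare "$server"
git init -q "$clone"

# Placeholder identity, used only for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Publish commit A, which contains the secret.
echo "password" > "$clone/secret.txt"
git -C "$clone" add secret.txt
git -C "$clone" commit -q -m "A: add secret"
git -C "$clone" push -q "$server" HEAD:refs/heads/main

# Stand-in for the rewrite: replace A with A', which has no secret.
git -C "$clone" rm -q secret.txt
echo "hello" > "$clone/readme.txt"
git -C "$clone" add readme.txt
git -C "$clone" commit -q --amend -m "A': no secret"

# A plain push is refused because A' does not descend from A.
if git -C "$clone" push -q "$server" HEAD:refs/heads/main 2>/dev/null
then plain=accepted; else plain=rejected; fi

# A forced push replaces the server's branch with the rewritten history.
if git -C "$clone" push -q -f "$server" HEAD:refs/heads/main 2>/dev/null
then forced=accepted; else forced=rejected; fi

echo "plain push:  $plain"
echo "forced push: $forced"
```

In a big team, `git push --force-with-lease` may be a gentler variant to consider: it refuses to overwrite the remote branch if someone else has pushed to it since you last fetched. Either way, everyone will need to re-clone or reset onto the new history.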

You're probably not done

But wait, after doing git push -f, the server will still have copies of the old history. See here for GitHub: If you pushed to GitHub, it is too late even if you force push it away one second later. Apparently, the only truly safe way to eradicate the sensitive file from a GitHub repo is to delete it and recreate a new one with only the clean history you want to keep. There are other solutions, but your mileage may vary - details in the linked post.

If you're using a different or private Git server, make sure to force garbage collection and follow the further recommendations at Remove sensitive files and their commits from Git history.
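For the forced garbage collection, a sketch of the usual recipe is shown below, assuming you have shell access to the repository on the server. It is demonstrated on a throwaway local repo, where `commit --amend` stands in for the history rewrite; the file names and identity are placeholders.

```shell
set -eu

repo=$(mktemp -d)
git -C "$repo" init -q

# Placeholder identity, used only for the demo commits.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit A with the secret, and remember the blob that stores it.
echo "password" > "$repo/secret.txt"
git -C "$repo" add secret.txt
git -C "$repo" commit -q -m "A: add secret"
blob=$(git -C "$repo" rev-parse HEAD:secret.txt)

# Stand-in for the rewrite: replace A with A', which has no secret.
git -C "$repo" rm -q secret.txt
echo "hello" > "$repo/readme.txt"
git -C "$repo" add readme.txt
git -C "$repo" commit -q --amend -m "A': no secret"

# No branch contains the secret any more, but the object is still stored.
if git -C "$repo" cat-file -e "$blob" 2>/dev/null
then before=present; else before=gone; fi

# Drop the reflog entries that keep old commits alive, then prune.
git -C "$repo" reflog expire --expire=now --all
git -C "$repo" gc -q --prune=now

if git -C "$repo" cat-file -e "$blob" 2>/dev/null
then after=present; else after=gone; fi

echo "secret blob before gc: $before"
echo "secret blob after gc:  $after"
```

The point of the before/after check is that rewriting history alone does not delete anything: the old objects stay on disk until the reflogs are expired and the unreachable objects are pruned. On a hosted service you typically cannot run these commands yourself, which is why the GitHub case above needs a different approach.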

joanis
  • Hello. As you all said, it was impossible. I ended up using BFG multiple times until I found all files containing sensitive data. Thank you for your answer! – rogervila Jul 05 '19 at 11:57