I was trying to remove a couple of lines from a file in my history. They contain a secret API token and an endpoint, both of which are much better passed as environment variables than hardcoded, so that if I make the repo public someday they won't be there for prying eyes.

For this I used the BFG Repo-Cleaner, an awesome tool that I had also used in the past to delete whole leftover and sensitive files from my Git history. This time I followed the instructions for replacing text:

$ java -jar ~/bfg.jar --replace-text tokens.txt myRepo.git

But in the output, this appeared:

...
* commit 10134503 (protected by 'HEAD') - contains 1 dirty file :
- app.py (640 B)
...
If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

So I did exactly that: cloned the whole thing, made a commit replacing the two lines I wanted gone with two calls to os.environ[] in Python, and pushed it. Then I ran the BFG again, followed by git reflog, and everything seemed to work like a charm.

I checked in GitLab's commit browser and the text was ***REMOVED*** everywhere except in the penultimate commit, where this happened:

[screenshot: the penultimate commit as shown in GitLab's commit browser]

I'm guessing it happened because the file is edited in the next commit (now the one protected by 'HEAD') and Git needs those tokens to recreate the changes I made to get rid of those 2 lines. But then, how do I achieve this?

Roberto Tyley
Alfageme
  • This is not a Git issue. It sounds like a bug in BFG. If you make a commit consisting *only* of replacing the unwanted tokens, and BFG replaces the unwanted tokens in the commit just prior to this new commit, you get two identical commits. Git handles this just fine (calling the second an "empty" commit, but it's not empty, it's just identical, source-tree-wise, to the prior commit). – torek Nov 07 '16 at 04:57

1 Answer

Rewriting Git history is a tricky business, where it's very easy for users to make obscure mistakes that negate what they're trying to do. As the author of the BFG, I've spent a lot of time assessing user reports, trying to work out why strange things have happened. The BFG's just the middle step in the process of cleaning a repo, and there's plenty of room for user error in the steps before and after - it's fairly rare that there's actually a bug in the BFG itself *.

So, if I understand your description correctly, your commit history now looks like this (oldest first, newest last):

  • ...cleaned commits containing ***REMOVED*** where the credentials used to be
  • a penultimate commit where the unwanted credentials *are* present
  • the 'manual' cleaning commit where you instead read credentials from os.environ[]

Let's look at your actions:

So I did exactly that: cloned the whole thing, made a commit replacing the two lines I wanted gone with two calls to os.environ[] in Python, and pushed it. [X] Then I ran the BFG again, followed by git reflog, and everything seemed to work like a charm. [Y]

The thing is, your commit history in GitLab at X would be exactly what you report as your final commit history in GitLab. So nothing changed in GitLab between X and Y.

So, two possible explanations:

  • There's a bug in the BFG (as @torek suggested) - I would love to see a simple test case demonstrating this
  • The final git push at Y just didn't happen or failed, e.g. because --force wasn't used.

If you could try running the BFG again on a fresh mirror clone of your repo, we could maybe eliminate one of those options.
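To make the sequence concrete, here is a minimal sketch of such a run written as a list of commands. It only builds and prints the commands; nothing is executed. The helper name `cleanup_plan`, the example URL and the jar path are hypothetical placeholders, and the reflog/gc step is an assumption about the usual clean-up after a BFG run rather than something stated above.

```python
import os
import shlex
from typing import List


def cleanup_plan(repo_url: str, mirror_dir: str,
                 bfg_jar: str, tokens_file: str) -> List[List[str]]:
    """Return the commands for one clean-up pass on a fresh mirror clone.

    Hypothetical helper: it only describes the steps, it does not run them.
    """
    jar = os.path.expanduser(bfg_jar)  # "~" is not expanded outside a shell
    return [
        # 1. Fresh mirror clone, so no stale local refs are involved.
        ["git", "clone", "--mirror", repo_url, mirror_dir],
        # 2. The BFG run from the question, pointed at the mirror.
        ["java", "-jar", jar, "--replace-text", tokens_file, mirror_dir],
        # 3. Drop the old, now unreferenced objects locally (assumed step).
        ["git", "-C", mirror_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", mirror_dir, "gc", "--prune=now", "--aggressive"],
        # 4. The push at Y: rewritten history has to overwrite the remote refs.
        ["git", "-C", mirror_dir, "push", "--force"],
    ]


# Placeholder values, for illustration only.
for command in cleanup_plan("git@example.com:me/myRepo.git", "myRepo.git",
                            "~/bfg.jar", "tokens.txt"):
    print(shlex.join(command))
```

If the last command is skipped or rejected, the remote stays exactly as it was at X, which would match the history described in the question.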

Finally:

I'm guessing it happened because the file is edited in next commit (now the one protected by 'HEAD') and GIT needs those tokens to recreate the changes I made to get rid of those 2 lines.

Git stores each commit as a full snapshot of your file tree; it doesn't store commits as the diffs you might expect. So don't worry, Git doesn't need those old credentials to create your final hand-crafted commit (where credentials are read from os.environ[]).
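One way to see this is that a file's object ID depends only on that file's own bytes. The sketch below recomputes a blob ID the way Git does for the default SHA-1 object format (an assumption; repositories using SHA-256 differ). The two file contents and the function name `git_blob_id` are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """ID Git assigns to a file with this content (SHA-1 object format)."""
    header = b"blob %d\x00" % len(content)  # object type, size, NUL byte
    return hashlib.sha1(header + content).hexdigest()


# Made-up stand-ins for app.py before and after the manual clean-up commit.
old_app = b'TOKEN = "not-a-real-token"\n'
new_app = b'import os\nTOKEN = os.environ["API_TOKEN"]\n'

# Each snapshot is hashed from its own bytes alone; computing the new ID
# never touches the old content.
print(git_blob_id(old_app))
print(git_blob_id(new_app))
```

Because the cleaned file is stored as a complete object of its own, rewriting or dropping the older version does not affect it.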

Incidentally, the BFG was designed with the 'reformed-alcoholic' model of user behaviour - you're only supposed to run the BFG once you've realised you have a problem and cleaned yourself up - so make sure the latest commit in your repository is clean before you run the BFG.
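As an illustration of what a clean latest commit might contain, here is a minimal sketch of reading the two values from the environment. The variable names `API_TOKEN` and `API_ENDPOINT` and the helper `load_credentials` are hypothetical; the real names in app.py are not shown in the question.

```python
import os
from typing import Dict, Mapping


def load_credentials(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the token and endpoint from the environment.

    Indexing with [] raises KeyError when a variable is missing, so a
    misconfigured deployment fails loudly instead of running without them.
    """
    return {
        "token": environ["API_TOKEN"],        # hypothetical variable name
        "endpoint": environ["API_ENDPOINT"],  # hypothetical variable name
    }
```

With nothing secret left in the working tree of the latest commit, the commit protected by 'HEAD' no longer holds anything the BFG would need to touch.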

* There are definitely conditions that make the BFG die with a fatal exception, but there are no bugs I'm aware of where the BFG's actually completed a run and behaved outside-of-spec.

Roberto Tyley
  • Sorry to reply so late, but I've been AFK for a while. First of all, thank you so much for your thorough answer; it's awesome that the author of the tool takes the time to help people use it. I've tried a couple of times to reproduce this from memory, without success. I'll try again this week by looking through my bash `history`, where the steps I originally followed will be. However, running the BFG again showed that dirty penultimate commit in my history, and this time it removed the content flawlessly. – Alfageme Nov 22 '16 at 00:53