
A few months ago I accidentally published sensitive information to our company's main website repo. In two separate commits I pushed two pictures, images/Example.jpg and examples/Example.jpg, that were supposed to be sample images but actually contain sensitive info.

How can I substitute the correct Example.jpg and rewrite the affected commits so that nobody can ever "Browse the repository at this point in time" and see that picture?

The information is in two files in two separate commits. If they weren't so far up the commit history, I'd probably just reset HEAD and repush, but that seems impossible now. Visually you can think of it like:

[AuthorX] commitLatest       Latest commit
[AuthorY] commitTwoDaysAgo   updated something
.....
.....
.....
[  ME   ] commit2MonthsAgo   sensitiveInfo1
.....
[  ME   ] commit2MonthsAgo   sensitiveInfo0
[AuthorZ] olderCommits       --FINE FROM HERE--
.....

I want to keep the commit history exactly the same, except that commits sensitiveInfo1 and sensitiveInfo0 should contain a different Example.jpg. Is that possible?


I tried filter-branch from here but it gave this warning:

WARNING: git-filter-branch has a glut of gotchas generating mangled history
 rewrites.  Hit Ctrl-C before proceeding to abort, then use an
 alternative filtering tool such as 'git filter-repo'
 (https://github.com/newren/git-filter-repo/) instead.  See the
 filter-branch manual page for more details; to squelch this warning,
 set FILTER_BRANCH_SQUELCH_WARNING=1.

I'm worried I'll mess up the company's main repo if I use that. The documentation for git-filter-repo is confusing and convoluted, and I'm worried it will completely rewrite the authors or the data. We also have lots of open branches, and I'm worried it will break something there. Any suggestions on how to do this safely? I'd really appreciate it, as I'm kinda panicking.

Thanks so much.

JoeVictor
  • Does this answer your question? [How to remove/delete a large file from commit history in Git repository?](https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-git-repository) – ErikMD Jul 31 '20 at 19:17
  • I was about to give [git-filter-repo](https://github.com/newren/git-filter-repo#why-filter-repo-instead-of-other-alternatives) as a pointer, but I've just realized that you already mentioned it in your question. – ErikMD Jul 31 '20 at 19:19
  • I'm mostly concerned about the repercussions of this move. I'm worried it might leave some people unable to push their branches, or erase someone else's work – JoeVictor Jul 31 '20 at 19:24
  • I'd use `git-filter-repo`, but I'm not sure what that'll do – JoeVictor Jul 31 '20 at 19:24
  • So yes, using `git-filter-branch`, `BFG`, or `git-filter-repo` (the recommended one) is a destructive operation, just like e.g. `git rebase -i` + `git push --force`. This means that each collaborator who had an old version of `master` (or any other rewritten branch) in their local repo will need to run something like `git checkout master && git reset --hard origin/master` after their first `git fetch origin` following the rewrite (at least the Git CLI will tell them that the `git fetch` was not a fast-forward but a forced update). – ErikMD Jul 31 '20 at 19:27
  • so there's no way to merge existing open branches without the hard reset, huh? – JoeVictor Jul 31 '20 at 19:30
  • I guess a force-push plus `git gc` with the proper options is needed. Git's architecture ensures that once a given blob (e.g. the one containing the file you want to remove) is stored in a local repo, it is never replaced or overwritten by another version of that blob, say, one with the same SHA1 and different contents. This contributes to the security/safety of the history in particular. – ErikMD Jul 31 '20 at 19:34
  • BTW note that `git-filter-repo` comes with some official contribs, including [bfg-ish](https://github.com/newren/git-filter-repo/blob/main/contrib/filter-repo-demos/bfg-ish) (inspired by BFG): maybe this could be a simpler front-end to use (I did not test it though) – ErikMD Jul 31 '20 at 19:35
  • To be a bit more precise about your question "so there's no way to merge existing open branches without the hard reset?" → people should (1) **reset** their local `master` branch to match the remote (e.g. `git fetch origin && git checkout master && git reset --hard origin/master`), then (2) interactively **rebase** their open feature branches onto the new `master`, taking care not to re-add, along the way, the old unwanted commit that was part of the old `master`. – ErikMD Jul 31 '20 at 19:42
  • This seems complicated to handle on their side. The people contributing to this repo are mostly nondevs who barely know what Git is. If I do this, they'll probably suffer a lot, right? – JoeVictor Jul 31 '20 at 19:44
  • Or maybe you could try to devise some interactive shell script they could use to ease the migration? (e.g. with some command-line questions such as, "Which of the following local branches would you like to keep?" "Press any key to continue", etc.) – ErikMD Jul 31 '20 at 19:48
  • I was hoping to do this without raising any eyebrows. I was trying to avoid anyone in the company noticing. I guess I'm somewhat screwed, huh? – JoeVictor Jul 31 '20 at 19:50
  • I'm afraid there's no good solution… once a file has been published in a commit, if rewriting the branch history is not a solution, the only way to remove the file is to push yet-another-commit that removes it (a bit like `git revert`) – ErikMD Jul 31 '20 at 20:00
  • Ok, sounds good. Thanks for your help anyway – JoeVictor Jul 31 '20 at 20:13
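
The reset-then-rebase recovery described in the comments can be tried out safely in a throwaway repository. The following is only a sketch under stated assumptions: the `feature` branch, the file contents, and the `old_master` variable are hypothetical, and `git commit --amend` merely stands in for the real history rewrite, which the thread suggests doing with a tool such as `git-filter-repo`.

```shell
#!/bin/sh
# Sketch: move an open branch from the old (sensitive) master onto a
# rewritten master without carrying the old commit along.
set -eu

demo=$(mktemp -d)            # throwaway directory, not the real repo
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo
git config commit.gpgsign false

# olderCommits: history that is fine.
echo base > README
git add README
git commit -q -m olderCommits

# sensitiveInfo0: the commit that carries the wrong image.
mkdir images
printf 'SENSITIVE' > images/Example.jpg
git add images/Example.jpg
git commit -q -m sensitiveInfo0
old_master=$(git rev-parse HEAD)   # hypothetical name for the pre-rewrite tip

# A collaborator's open branch, cut from the old master.
git checkout -q -b feature
echo notes > notes.txt
git add notes.txt
git commit -q -m "feature work"

# Stand-in for the real rewrite (assumption): amend the bad commit so
# that master now points at a new commit with the harmless image.
git checkout -q master
printf 'harmless sample' > images/Example.jpg
git commit -q -a --amend -m sensitiveInfo0

# Collaborator side: replay only the commits made after the old master
# onto the rewritten master.
git rebase -q --onto master "$old_master" feature

feature_image=$(git show feature:images/Example.jpg)
if git merge-base --is-ancestor "$old_master" feature; then
  old_still_in_feature=yes
else
  old_still_in_feature=no
fi
echo "image on feature: $feature_image"
echo "old commit still in feature history: $old_still_in_feature"
```

In a real clone the collaborator would first bring their local `master` in line with the remote, as in the `git fetch origin && git checkout master && git reset --hard origin/master` sequence quoted above, and then run the `git rebase --onto` step once per open branch.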

0 Answers
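
The last suggestion in the comments, pushing another commit instead of rewriting history, fixes the current tree but leaves the old image reachable through the earlier commit. A minimal sketch of that behaviour in a throwaway repository follows; the file contents and commit messages are made up for illustration.

```shell
#!/bin/sh
# Sketch: a follow-up commit replaces the file at the tip, but the
# earlier commit still serves the original contents.
set -eu

demo=$(mktemp -d)            # throwaway directory, not the real repo
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo
git config commit.gpgsign false

# The commit that accidentally contains the sensitive picture.
mkdir images
printf 'SENSITIVE' > images/Example.jpg
git add images/Example.jpg
git commit -q -m sensitiveInfo0
bad_commit=$(git rev-parse HEAD)

# The "fix": a new commit that swaps in the intended sample image.
printf 'harmless sample' > images/Example.jpg
git commit -q -a -m "Replace Example.jpg with the real sample"

# The tip is clean, but browsing the repository at the old commit
# still returns the sensitive file.
tip_content=$(git show HEAD:images/Example.jpg)
old_content=$(git show "$bad_commit":images/Example.jpg)
echo "tip:        $tip_content"
echo "old commit: $old_content"
```

This is why the thread treats a history rewrite plus force-push as the only way to stop the old commit from being browsable, with the replacement commit as a fallback that only fixes what visitors see at the tip.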