
Background: I'm migrating a commercial piece of software to open source. This is currently a hypothetical case, but it might become real if I make a mistake.

Suppose I have inadvertently incorporated some commercial copyrighted material into my open source repository (e.g. some graphics). The copyright owner has issued me a cease-and-desist notice. Because this is a commercial matter I must remove the material completely; nothing less will do. I could of course remove the infringement in HEAD, but that leaves the infringing material in the Git history on GitHub.

Short of deleting the entire repository and creating a new one from scratch with the current HEAD, is there any way I can expunge the offending file from GitHub?

There is also the side effect that doing so will break historical builds that would otherwise have worked, but them's the breaks.

I've read "Handling copyright infringement in your own open source project" but the answers don't address the git history.

Update

I've now found git filter-repo, which is preferred to git filter-branch and is a large part of the answer. But how do I push the resulting change to my GitHub central repository? Or should I just delete the entire GitHub repository and create a new one from my munged local copy?

ecm
Paul Johnson
  • Does this answer your question? [License violation within git history](https://opensource.stackexchange.com/questions/7837/license-violation-within-git-history) – Martin_in_AUT Mar 04 '22 at 09:23
  • It is possible to remove a file from the entire Git history, but this will change all affected commit IDs. Such removal can be automated with `git filter-branch`. – amon Mar 04 '22 at 11:12
  • I agree, so I'm migrating this to SO, where it can be handled, possibly by closing as a duplicate. – MadHatter Mar 04 '22 at 11:53
  • @Martin_in_AUT No, it doesn't. That only deals with an open source license violation where an update to the HEAD is sufficient. – Paul Johnson Mar 04 '22 at 17:02
  • Worst case you'll have to force push a cleaned-up mirror of the repo. You may need to work with GitHub support to remove the files from caches and replicas after that. This will break all forks. And there is no way for you to enforce the removal of the copyrighted materials from forks and mirrors owned by other people. – jessehouwing Mar 04 '22 at 20:48
  • And `git filter-repo` has indeed superseded `filter-branch`. I've personally used BFG Repo Cleaner as well for this purpose. – jessehouwing Mar 04 '22 at 20:51
  • @jessehouwing you should write it as an answer – planetmaker Mar 07 '22 at 23:18
  • Does this answer your question? [Remove sensitive files and their commits from Git history](https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history) – matt Mar 09 '22 at 13:20

1 Answer


Disclaimer: this is a summary of comments and later research. I haven't actually tried this, as repo-surgery of this kind is a last resort.

Removing files from the history of your local Git repository is best done with git-filter-repo, which has many options for this kind of rewrite. Create a fresh clone of your repo, study the git-filter-repo manual, and experiment.
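As a rough illustration of that kind of experiment, the sketch below builds a throwaway repository and strips one file from every commit. The path `assets/logo.png` and the repository name `demo` are hypothetical, and the script assumes `git-filter-repo` has been installed separately (it is not part of core Git). The `--invert-paths` and `--path` options appear to be the relevant ones, but check them against the manual before relying on this.

```shell
#!/bin/sh
# Sketch only: remove one file from all history with git-filter-repo.
# Hypothetical names: demo (repo), assets/logo.png (offending file).
set -eu

work=$(mktemp -d)
cd "$work"

# Build a throwaway repository so nothing real is touched.
git init -q demo
cd demo
git config user.email "demo@example.invalid"
git config user.name "Demo"

mkdir assets
echo "infringing" > assets/logo.png
echo "code" > main.txt
git add .
git commit -q -m "Add code and logo"

# Deleting the file in HEAD leaves it reachable in older commits.
git rm -q assets/logo.png
git commit -q -m "Remove logo from HEAD only"
git log --oneline --all -- assets/logo.png

if command -v git-filter-repo >/dev/null 2>&1; then
  # --invert-paths keeps everything EXCEPT the listed path.
  # --force is used only because this demo repo is not a fresh clone.
  git filter-repo --force --invert-paths --path assets/logo.png
  remaining=$(git log --oneline --all -- assets/logo.png | wc -l)
  echo "commits still touching the file: $remaining"
else
  echo "git-filter-repo not found; install it to run the rewrite step"
fi
```

On a real project, running the same `git filter-repo` command in a fresh clone should make `--force` unnecessary. Every commit after the first one that contained the file gets a new ID.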

However, this only modifies your local copy. You can't push these changes upstream to GitHub (or wherever). You also have to notify any collaborators that the history has been rewritten, and stop them from merging their old clones back into your edited one. The best solution is simply to delete the old GitHub repo and create a new one with a different URL. See the manual for details.

ecm
Paul Johnson
  • What? Of course you can push the changes. If you weren't the owner of the repo, then you might be forbidden from non-FF pushes by a policy setting, but that doesn't seem to be the case here. – hobbs Mar 08 '22 at 17:31
  • My understanding is that the edited repo has an entirely new history, so if you push it to GitHub you get something with both the old and the new, which isn't what is needed. – Paul Johnson Mar 08 '22 at 17:33
  • Your understanding is wrong. You will have to use force to do the push, as you've already been told (days ago), and this will cleanly and completely replace the branch you've pushed with the replacement commits. And if multiple branches are affected, you will have to force a mirror push. But you can certainly push to the existing repo. – matt Mar 08 '22 at 17:49
  • @matt Can you write that up as an answer? Feel free to paste from the accurate bits of this one. – Paul Johnson Mar 09 '22 at 08:49
  • Feel free to delete the question. The question of how to remove a file from your entire Git history has been answered fully here already. This is, as you've already been told, a duplicate. – matt Mar 09 '22 at 13:18
  • For example https://stackoverflow.com/questions/872565/remove-sensitive-files-and-their-commits-from-git-history – matt Mar 09 '22 at 13:20
  • And https://stackoverflow.com/questions/2100907/how-to-remove-delete-a-large-file-from-commit-history-in-the-git-repository – matt Mar 09 '22 at 13:20
  • And many others with high vote counts and lots of answers. – matt Mar 09 '22 at 13:21
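
The force push described in the comments above can be sketched with core Git alone. This is a minimal illustration under stated assumptions, not a tested GitHub procedure: a local bare repository (`remote.git`, hypothetical) stands in for GitHub, and `git commit --amend` stands in for the history rewrite so the example does not depend on `git-filter-repo`.

```shell
#!/bin/sh
# Sketch only: replace a branch on a remote with rewritten history.
# Hypothetical names: remote.git (stand-in for GitHub), local, logo.png.
set -eu

work=$(mktemp -d)
cd "$work"

git init -q --bare remote.git
git clone -q remote.git local 2>/dev/null
cd local
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Publish a commit that contains the offending file.
echo "infringing" > logo.png
echo "code" > main.txt
git add .
git commit -q -m "Add code and logo"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# Stand-in for the rewrite: the amended commit gets a new ID.
git rm -q logo.png
git commit -q --amend -m "Add code"

# A plain push is refused because the new history does not
# descend from what the remote already has.
if git push -q origin "$branch" 2>/dev/null; then
  echo "unexpected: plain push succeeded"
else
  echo "plain push rejected (non-fast-forward)"
fi

# Forcing the push makes the remote branch point at the new commit.
git push -q --force origin "$branch"

# List the files now reachable from the remote branch.
remote_files=$(git --git-dir="$work/remote.git" ls-tree -r --name-only "$branch")
echo "$remote_files"
```

Two caveats apply when adapting this. `git filter-repo` removes the `origin` remote after a rewrite, so it has to be re-added with `git remote add` before pushing. And when several branches or tags are affected, `git push --mirror` force-updates all refs at once. The old commit is no longer reachable from the branch, but it may persist as an unreferenced object on the server, which is why an earlier comment suggests contacting GitHub support about caches and replicas.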