I know there are thousands of threads for this question.

But I found out something really weird.

Say you create a project on GitHub and make some commits: 1, 2, 3, 4, 5.
Later, you realize you want to change something in commit 3.

Since you were working in your own branch, rewriting history is no problem.

So let's do this (based on this Stack Overflow answer):

git rebase --interactive 'bbc643cd^'

# In the interactive prompt, change 'pick' to 'edit' for that commit, then:
git commit --all --amend --no-edit
git rebase --continue
git push -f

Great! The mistake is corrected. The history has been rewritten, so the commit bbc643cd is now lkqjfhchc.
You can check the source on your GitHub and everything will have been updated.

But someone can still find it on GitHub!

Access the URL: https://github.com/your-nickname/your-project/commit/bbc643cd... (full commit hash) and you will find it!
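The same effect can be reproduced locally, without GitHub. Below is a minimal sketch, assuming a throwaway repository created with `mktemp`; the file names, commit messages and identity are made up for illustration, and `GIT_SEQUENCE_EDITOR` with `sed` is only a way to do the 'pick' to 'edit' change without opening an editor. It rewrites commit 3 and then checks that the old hash still resolves to a commit even though no branch or tag contains it.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.com"

# Commits 1..5, one file each.
for i in 1 2 3 4 5; do
  echo "change $i" > "file$i.txt"
  git add "file$i.txt"
  git commit -q -m "commit $i"
done

old=$(git rev-parse HEAD~2)   # hash of commit 3 before the rewrite

# Mark the first todo line (commit 3) as 'edit' non-interactively.
GIT_SEQUENCE_EDITOR="sed -i -e '1s/^pick/edit/'" \
  git rebase -q --interactive "$old^"

# Fix the mistake, amend commit 3, then replay commits 4 and 5.
echo "fixed" > file3.txt
git commit -q --all --amend --no-edit
git rebase --continue

new=$(git rev-parse HEAD~2)   # hash of commit 3 after the rewrite

# The old commit is no longer on any branch or tag...
refs=$(git branch --all --contains "$old"; git tag --contains "$old")

# ...but the object itself is still in the repository.
echo "old commit: $old"
echo "new commit: $new"
echo "old object type: $(git cat-file -t "$old")"
```

If `refs` comes back empty while `git cat-file` still finds the old hash, you are in the situation described here: the commit is out of the history but has not been deleted.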

How could we remove this commit for good?

Thanks for any help!

maxime1992
  • Are you sure the original commit is not a part of any other branch as well? – CodeWizard Jan 03 '16 at 22:49
  • No. I just made one commit into master to start the repo. Then I created a "dev" branch. It's only in dev. – maxime1992 Jan 03 '16 at 22:51
  • If that helps: I noticed this not because I went to the URL of the commit, but because I referenced an issue in the commit message (with the keyword "closes" followed by the issue id). Now the issue shows a note saying that a commit references it, and another similar note with the new commit hash. If I click on the old hash I can see the whole commit. – maxime1992 Jan 03 '16 at 22:55
  • It will probably disappear in a little while as long as it's not referenced by the history of any branches or tags, probably the next time github does a `git gc` on your repo and/or clears some cache of theirs. – hobbs Jan 03 '16 at 22:55
  • Oh, well, mentioning the SHA in an issue comment may very well count as a reference. – hobbs Jan 03 '16 at 22:57
  • @hobbs I read some posts about commits not in history. They should be cleaned 90 days later. As you said, I already did a git gc but nothing changed. – maxime1992 Jan 03 '16 at 22:58
  • Seriously, why do people put -1 on this post without explanation? Anyway... – maxime1992 Jan 03 '16 at 22:59
  • @hobbs damn, is there anything I can do to remove this reference? – maxime1992 Jan 03 '16 at 23:00

2 Answers

I contacted GitHub staff from here: https://github.com/contact

Here's their answer (I couldn't do anything about it myself: no prune, no gc, etc.):

Hey Maxime,

The commit was available because commits are not automatically deleted when they're removed from the history of a branch -- they're deleted when they're garbage collected. I just ran garbage collection for that repository manually and the commit should now return a 404.

How often does the garbage collector run on your end?

GitHub doesn't have a scheduled garbage collection process. We don't clear repository caches automatically (we're in the version control business, so we don't delete data unless we absolutely have to) so usually, the only reason we would do that is if we had someone writing in to us asking for us to clear them as part of the sensitive data removal process.

It's also possible we might clear the cache for technical reasons if the content or structure of the repo was causing us difficulties to host it, but that would usually only happen if the repo was exceptionally large or had a wildly structured folder layout.

Please let me know if you have any questions about this or anything else.

Hope this helps.

Cheers, XXXXX

So if you have the same problem, you either have to wait or contact GitHub staff and ask them to force a garbage collection!

maxime1992
  • How often does the staff run the garbage collector? Can I manually force the garbage collector to run on GitHub? – alper Sep 03 '21 at 13:04
  • You can't, and as far as I remember they told me every 90 days, but I'm unsure. If you need to, contact GitHub. – maxime1992 Sep 03 '21 at 13:14
  • It's not helpful to have to contact them whenever we want to clean up our previous commits. They should give us the freedom to do it ourselves. – alper Sep 03 '21 at 13:22
  • I don't make the rules and I'm not saying it's cool. Just letting you know how to achieve it as of today – maxime1992 Sep 03 '21 at 13:34
  • Should we contact GitHub through https://support.github.com/contact?tags=rr-remove-data ? – alper Sep 03 '21 at 14:01

According to your additional comments:

You did everything as it should be done.
The point is this: Git never loses data unless you tell it to (through what is known as gc, the garbage collector).

The objects remain there until gc is run.

Such an object is called a dangling object; in this case, a dangling commit.

Dangling commit

A commit that isn't linked to any branch or tag, either directly or through any of its descendants.

You can see all the dangling objects locally with this:

git fsck --full

The only way to get rid of it in your local repository is to run gc (this does not touch the copy hosted on GitHub):

## !!!Caution:
## It will remove all your dangling files
git gc --aggressive --prune=now

Here you can read some more about it.
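One caveat worth knowing, as far as I understand Git's behaviour: a commit you have just rewritten is usually still referenced by the reflog, so a plain `git fsck` does not report it as dangling and `git gc --prune=now` alone does not delete it. The sketch below assumes a throwaway repository with made-up file names and uses `git commit --amend` as the simplest way to orphan a commit. It shows the difference `--no-reflogs` makes and then expires the reflog before collecting. This only affects the local repository.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"

echo "mistake" > file.txt
git commit -q --all -m "second"
old=$(git rev-parse HEAD)      # the commit we are about to orphan

echo "fixed" > file.txt
git commit -q --all --amend --no-edit

# By default fsck treats reflog entries as references,
# so the old commit is not listed as dangling.
with_reflog=$(git fsck 2>/dev/null | grep -c "dangling commit $old" || true)

# Ignoring the reflog, the old commit shows up as dangling.
without_reflog=$(git fsck --no-reflogs 2>/dev/null | grep -c "dangling commit $old" || true)

echo "listed by 'git fsck':              $with_reflog"
echo "listed by 'git fsck --no-reflogs': $without_reflog"

# Caution: this drops ALL reflog entries and ALL unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now

if git cat-file -e "$old" 2>/dev/null; then
  echo "old commit is still present"
else
  echo "old commit is gone"
fi
```

Expiring the reflog throws away your local safety net for undoing mistakes, so only do this when you are sure you no longer need the old commits.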

CodeWizard