
I've been trying to decrease the size of my Git repository by moving some files to Git LFS. A few hundred commands later, those files are in Git LFS and their history is no longer in my Git commits.

However, whenever I clone the repository, I still download about 3 GB of objects. I worked around this by creating a new repository in Visual Studio Team Services: after pruning and garbage collecting my repository locally, I pushed it there, and the new repository was only 300 MB. (I found the commands to do this locally in this post: Git: what is a dangling commit/blob and where do they come from?)

However, I can't imagine that you always have to delete and recreate an entire repository to remove dangling commits.

I also tried running git init and pushing the result over the existing repository, but that only increased the object count further.

For anyone running into similar issues, these are the commands I executed to create a new repository without dangling commits. However, I would like to find out how to do this in the existing repository without having to delete it:

git clone https://avavedse.visualstudio.com/Test/_git/TestRepository
cd TestRepository
git reflog expire --expire=now --all
git gc --prune=now
git remote add newrepo https://avavedse.visualstudio.com/Test/_git/TestRepositoryNewEdition
git push newrepo
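With Git's default push.default setting, the final "git push newrepo" above sends at most the currently checked-out branch, so other branches and tags would not reach the new repository. A minimal sketch that carries every ref across uses a mirror clone and "git push --mirror" instead, keeping the question's reflog-expire and gc steps. The helper name mirror_compacted is hypothetical, two local repositories in a temp directory stand in for the VSTS URLs so the sketch runs offline, and it assumes the destination repository is empty.

```shell
#!/usr/bin/env bash
# Sketch: copy every branch and tag from an old repository into a new, empty one.
# mirror_compacted is a hypothetical helper; local repos stand in for the remotes.
set -euo pipefail

mirror_compacted() {
  local src=$1 dst=$2 work
  work=$(mktemp -d)
  # A mirror clone is bare and fetches all refs, not only the default branch.
  git clone -q --mirror "$src" "$work/mirror.git"
  # Same cleanup as in the question, applied to the mirror.
  git -C "$work/mirror.git" reflog expire --expire=now --all
  git -C "$work/mirror.git" gc -q --prune=now
  # Push every ref; --mirror also deletes destination refs the source lacks.
  git -C "$work/mirror.git" push -q --mirror "$dst"
}

# Offline demo: a source repo with two branches and a tag, plus an empty bare repo.
demo=$(mktemp -d)
git init -q "$demo/old"
git -C "$demo/old" config user.name "demo"
git -C "$demo/old" config user.email "demo@example.com"
echo "a" > "$demo/old/a.txt"
git -C "$demo/old" add a.txt
git -C "$demo/old" commit -q -m "first"
git -C "$demo/old" branch feature
git -C "$demo/old" tag v1
git init -q --bare "$demo/new.git"

mirror_compacted "$demo/old" "$demo/new.git"
git -C "$demo/new.git" for-each-ref --format='%(refname)'
```

Against the real remotes the call would be mirror_compacted https://avavedse.visualstudio.com/Test/_git/TestRepository https://avavedse.visualstudio.com/Test/_git/TestRepositoryNewEdition. Because --mirror deletes destination refs that the source lacks, only point it at an empty repository. Some hosts also expose read-only refs (GitHub's refs/pull/*, for example) that such a push will be refused for.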
Paulo Boaventura
Devedse
  • An LFS rewrite is a very significant modification to a repo; why can you "not imagine" that it requires completely replacing the remote? For the record, if the remote's hosting service provides control over `git gc` then you *might* be able to clean it up instead of replacing it, but I generally don't count on that. – Mark Adelsberger May 24 '17 at 14:40
  • Because it implies having to actually do functional changes to a repository to work around a technical limitation. – Devedse May 24 '17 at 14:43
  • How so? You have a remote with a set of refs; you delete it and replace it with a new repo that has the same refs (pointing at the same commits, even) but with less bloat. What functional change is that? By contrast, when you ran the LFS migration, which changed all of your refs to point at new commits, that *already was a functional change* that will require any user to perform a recovery (most easily handled by discarding and replacing all clones). Since the LFS migration is as drastic a functional change as can occur, I don't understand the concern. – Mark Adelsberger May 24 '17 at 14:49
  • You seem to be defending this product limitation, and I honestly have no clue why. Why should people remove and completely recreate a repository just to reach their goal of decreasing the size of that repository? Their requirements do not include a new repository; it is just a workaround/hack for functionality that, reading from your story, seems not to be implemented in the product. Furthermore, I'm not sure what other impact removing and recreating a repository will have: will pull requests, issues, etc. be preserved? – Devedse May 24 '17 at 14:55
  • The fact you think everything has to be a value judgement is your problem, not mine. I'm telling you how it works, and asking you to explain why this creates practical problems for you. If the problem is "I think it should be different", that's not a PRACTICAL problem; but hey, feel free to take it up with the service provider or change service providers. If your chosen git hosting service provider doesn't expose a `gc` interface - and I believe VSTS doesn't - then you have to replace the repo. Period. – Mark Adelsberger May 24 '17 at 15:30

2 Answers


This may be a duplicate of How to remove a dangling commit from GitHub?

GitHub will periodically garbage collect objects that cannot be reached from a top-level reference. So over time, they'll disappear. But this is not guaranteed. That's the best info I've found on this.

You can manually set the reflog expiry date to now and run the garbage collector:

git reflog expire --expire=now --all
git gc --prune=now

But this will only affect the local repo.
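To see why both commands are needed, here is a small sketch in a throwaway local repository (the temp directory, demo identity and file names are all made up): a commit on a deleted branch is still referenced by the HEAD reflog, so "git gc --prune=now" alone keeps it, while expiring the reflogs first lets gc delete it. As noted above, none of this touches a remote.

```shell
#!/usr/bin/env bash
# Sketch (throwaway repo in a temp dir): a commit reachable only from the reflog
# survives "git gc --prune=now" and disappears once the reflogs are expired.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"               # identity local to this temp repo
git config user.email "demo@example.com"

echo "kept" > kept.txt
git add kept.txt
git commit -q -m "initial"
main=$(git symbolic-ref --short HEAD)     # master or main, depending on config

# Commit a file on a side branch, then delete the branch.
git checkout -q -b scratch
echo "large binary" > big.bin
git add big.bin
git commit -q -m "add big file"
orphan=$(git rev-parse HEAD)
git checkout -q "$main"
git branch -q -D scratch

# The HEAD reflog still mentions the orphan commit, so gc keeps it.
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then after_gc_only=present; else after_gc_only=removed; fi

# Expire every reflog entry, then gc again: nothing references the commit now.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$orphan" 2>/dev/null; then after_expire=present; else after_expire=removed; fi

echo "after gc only:       $after_gc_only"
echo "after reflog expire: $after_expire"
```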

Apparently, the garbage collector is far from ideal, so unless you don't mind deleting and recreating the repository (and losing all issues, pull requests, etc.), you have to contact GitHub Support:

you can permanently remove cached views and references to the sensitive data in pull requests on GitHub by contacting GitHub Support or GitHub Premium Support.[docs.github.com]

Kukuster

Actually, when you executed git reflog expire --expire=now --all and git gc --prune=now, the dangling commits were removed. You can double-check with git fsck --full: if its output doesn't list any dangling commits, there are none.
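A sketch of that double-check, again in a throwaway repository with made-up names: "git commit-tree" writes a commit object without moving any ref or writing a reflog entry, which produces exactly one dangling commit for "git fsck --full" to report, and the count drops to zero after the two cleanup commands. The helper count_dangling_commits is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count "dangling commit" lines from git fsck before and after cleanup.
set -euo pipefail

count_dangling_commits() {
  # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps set -e happy.
  git fsck --full --no-progress 2>/dev/null | grep -c '^dangling commit' || true
}

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
echo "kept" > kept.txt
git add kept.txt
git commit -q -m "initial"

# commit-tree creates a commit object without updating any ref or reflog,
# so the new commit is dangling straight away.
git commit-tree "HEAD^{tree}" -m "not referenced anywhere" >/dev/null

before=$(count_dangling_commits)
git reflog expire --expire=now --all
git gc -q --prune=now
after=$(count_dangling_commits)
echo "dangling commits before: $before, after: $after"
```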

Another reason the repository size apparently did not decrease is that you did not delete the LFS-tracked files from the Git history. You can rewrite the history with:

git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch filename' --prune-empty --tag-name-filter cat -- --all
git push origin --force --all
git push origin --force --tags
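Rewriting history does not by itself shrink the local repository: filter-branch keeps the old commits under refs/original/, and the reflogs still reference them. The sketch below follows the shrinking checklist from the git-filter-branch documentation (delete the backup refs, expire the reflogs, then gc) in a throwaway repository. "big.bin" is a hypothetical file name, and FILTER_BRANCH_SQUELCH_WARNING only silences the notice newer Git versions print, which recommends git filter-repo for this kind of rewrite.

```shell
#!/usr/bin/env bash
# Sketch: strip one file from every commit, then drop the refs/original/*
# backups and reflogs so gc can free the space. big.bin is hypothetical.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning and pause

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
echo "kept" > kept.txt
echo "large binary" > big.bin
git add kept.txt big.bin
git commit -q -m "add files"
echo "more" >> kept.txt
git commit -q -am "edit kept.txt"
old_blob=$(git rev-parse HEAD:big.bin)

git filter-branch -f --index-filter 'git rm -q --cached --ignore-unmatch big.bin' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

# filter-branch keeps the old history under refs/original/; delete those refs.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

in_history=$(git rev-list --count --all -- big.bin)
if git cat-file -e "$old_blob" 2>/dev/null; then blob=present; else blob=removed; fi
echo "commits touching big.bin: $in_history, old blob: $blob"
```

The remote only shrinks once the rewritten refs have been force-pushed and the server has collected the old objects itself, which, as the other answer and the comments note, is up to the hosting service.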

For more details about moving files from Git to Git LFS, refer to "Moving a file in your repository to Git LFS".

Marina Liu
  • But these commands seem to prune the repository only locally, not the remote repo. – Devedse Jun 14 '17 at 11:13
  • FYI, neither "git fsck --full" nor "git fsck --dangling" shows dangling commits in .git/lost-found; you have to run "git fsck --lost-found" to make sure there are no dangling commits anywhere. – xuancong84 May 12 '20 at 03:09