I am cleaning up a GitLab repo that apparently had some very large files committed at one point, leaving the .git folder at a whopping 7.5 GB. I followed this guide on how to shrink the folder and rewrite the commit history, and successfully reduced the folder to 1.1 GB on my local machine. I realize this is still very large, but at this point I would just like to update the remote repository before I try to shrink it further.

My problem is that, after pushing, the remote .git folder is unchanged. In fact, the overall size of the repository has grown by several hundred megabytes.

How do I push these changes properly?

void_panda
  • But the local `.git` folder has shrunk? How did you push the changes? With `git push -f`? If not, give it a try! (`-f` is for *forcing* the `push`-command) – SwissCodeMen Dec 09 '21 at 22:17
  • If there are any PRs that pointed to some of the branches you rewrote, you might find it quite difficult, maybe impossible, to reduce the size. I'm not sure exactly how GitLab handles closed PRs, but I know on GitHub a PR holds a pointer to a commit, and that commit cannot get garbage collected, even if you rewrite or delete the branch it's on. I would expect a similar behaviour on GitLab. – joanis Dec 09 '21 at 22:43
  • @SwissCodeMen Yes, the `.git` folder on my local clone of the repo has shrunk. I pushed with `git push origin --force --all`, but the `.git` folder on the remote repo did not shrink. – void_panda Dec 09 '21 at 23:37

1 Answer

By default, when you clone a repository, you don't have all the remote refs locally. Even if you clean up your local git repo (so that its size is actually smaller), you may not see this reflected in GitLab. This is because (1) you don't have all the remote refs by default and (2) GitLab holds onto the references you have deleted locally in many circumstances: for example, if a pipeline references refs that no longer exist locally but still take up space, or if a reference exists in a Merge Request, among other cases.

To deal with this, you'll also need to clean up these references on the remote:

  • refs/merge-requests/* for merge requests.
  • refs/pipelines/* for pipelines.
  • refs/environments/* for environments.
  • refs/keep-around/* are created as hidden refs to prevent commits referenced in the database from being removed.
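
To see which of these namespaces the remote actually advertises, the output of `git ls-remote` can be grouped by its first two path components. This is only a sketch: the helper name `count_ref_namespaces` is made up for illustration, and refs that GitLab does not advertise will not show up in the count at all.

```shell
# Hypothetical helper: reads `git ls-remote`-style lines ("<sha><TAB><ref>")
# on stdin and prints how many refs fall under each refs/<namespace>.
count_ref_namespaces() {
  awk '$2 ~ /^refs\// { split($2, p, "/"); print p[1] "/" p[2] }' |
    sort |
    uniq -c
}
```

Inside a clone, it would be used as `git ls-remote origin | count_ref_namespaces`.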

If you add these refs to your local git repo and fetch them, you'll see a size that more closely reflects what is reported in GitLab.

For example, if you look at your git config, you will see something like this by default:

[remote "origin"]
  url = https://gitlab.com/gitlab-org/gitlab-foss.git
  fetch = +refs/heads/*:refs/remotes/origin/*

Edit your git config (using git config -e) and add fetch refspecs for the references above. For example, after adding the merge request refs, your git config should look like this:

[remote "origin"]
  url = https://gitlab.com/gitlab-org/gitlab-foss.git
  fetch = +refs/heads/*:refs/remotes/origin/*
  fetch = +refs/merge-requests/*/head:refs/remotes/origin/merge-requests/*

Do that for each of the ref namespaces that have not yet been cleaned up, fetch them (git fetch origin), clean them up locally, then force-push back to the remote.
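
The effect of the extra refspec can be reproduced without touching GitLab. The sketch below is an assumption-laden stand-in: a local bare repository plays the role of the remote, and a ref is pushed to refs/merge-requests/1/head by hand to imitate what GitLab keeps for a merge request. The names `remote.git`, `seed`, and `clone` are arbitrary.

```shell
#!/bin/sh
# Sketch only: a local bare repo stands in for the GitLab remote.
set -e
work=$(mktemp -d)
cd "$work"

# Fake "remote" with one branch and one merge-request ref.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main
git init -q seed
cd seed
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "initial"
git push -q ../remote.git HEAD:refs/heads/main
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "mr work"
git push -q ../remote.git HEAD:refs/merge-requests/1/head
cd ..

# A normal clone only fetches refs/heads/* (and tags).
git clone -q remote.git clone
cd clone
before=$(git for-each-ref refs/remotes/origin/merge-requests | wc -l | tr -d ' ')

# Same change as the config edit above, done from the command line.
git config --add remote.origin.fetch '+refs/merge-requests/*/head:refs/remotes/origin/merge-requests/*'
git fetch -q origin
after=$(git for-each-ref refs/remotes/origin/merge-requests | wc -l | tr -d ' ')

echo "merge-request refs before: $before, after: $after"
git count-objects -vH   # local object store size, now including the extra refs
```

The merge-request ref only appears locally after the extra refspec is added and fetched, which is why a default clone can look smaller than what GitLab reports.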

However, some refs are not advertised and can only be retrieved by exporting the GitLab project and restoring the local repo from the export tarball (the project.bundle in the tarball):

git clone --bare --mirror ./project.bundle myrepo
cd myrepo
git filter-repo ... # modify this for your cleanup
git remote remove origin
git remote add origin <project clone URL>
git push origin --force 'refs/heads/*'
git push origin --force 'refs/tags/*'
# push hidden refs
git push origin --force 'refs/replace/*'
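
Before deciding what to put on the git filter-repo line, it can help to see which blobs dominate the history. The following is a minimal sketch using only stock git plumbing. The function name `largest_blobs` is invented here, and it reports uncompressed object sizes rather than on-disk pack sizes.

```shell
# Hypothetical helper: print "<size-in-bytes> <path>" for the largest blobs
# reachable from any ref, biggest first. Optional argument: how many to show.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    sed -n 's/^blob //p' |
    sort -rn |
    head -n "${1:-10}"
}
```

Run inside the restored repo, for example as `largest_blobs 20`, it gives a list of candidate paths to remove.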

While rewriting history, git filter-repo creates a commit-map file at ./filter-repo/commit-map. After pushing, take this file and upload it to the repository cleanup under 'Settings -> Repository -> Cleanup'.
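
As a quick sanity check before uploading, the commit-map can be summarised from the shell. This sketch assumes the usual layout of the file, a header line followed by one "old-sha new-sha" pair per commit. The function name `summarize_commit_map` is made up for illustration.

```shell
# Hypothetical helper: count commits in a filter-repo commit-map and how many
# of them changed ID. Assumes line 1 is a header and later lines are
# "<old-sha> <new-sha>".
summarize_commit_map() {
  awk 'NR > 1 {
         total++
         if ($1 != $2) changed++
       }
       END { printf "%d commits, %d rewritten\n", total, changed }' "$1"
}
```

From the bare repo used above, this would be called as `summarize_commit_map filter-repo/commit-map`. If nothing was rewritten, the filter arguments probably did not match anything.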

Keep in mind, removing these will also break features that rely on them (for example, you won't be able to review code/refs in previous MRs that have changes with removed references).

Also know that, after you push the cleaned-up refs and initiate the repo cleanup, the size may take 30 minutes or more to update in GitLab, depending on the repo size.

Additional reference: GitLab - Reduce repository size

Alternatively, you can create a new GitLab project and push your clean local state to the new GitLab project, then delete the old one. With this approach, you will, of course, lose much of the GitLab-stored history, like merge requests, settings, CI/CD pipelines, etc.
The new project could be moved in place of the old one to preserve correct clone URLs. This is the nuclear option.

sytech