
Someone accidentally committed some large (multi-GB) binaries to my self-hosted GitLab repository, and now the server gets hit really hard every time someone pulls from it.

I tried removing every reference to the files via a force push, but pulls still seem to hammer the server. Is there a way to force the GitLab server to get rid of them?

I read up on tools like filter-branch, but I'm not sure what that would do to a bare repo, or how I'd even use it on a commit I no longer have a reference to.

Update: For reference, messages like these are appearing on the GitLab VM's console:

[ 5099.922896] Out of memory: kill process 6200 (git-upload-pack) score 1053982 or a child
[ 5099.922908] Killed process 6202 (git)
[ 5099.930796] Out of memory: kill process 6200 (git-upload-pack) score 360394 or a child
[ 5099.930807] Killed process 6203 (git)
[ 5099.938875] Out of memory: kill process 6200 (git-upload-pack) score 360394 or a child
[ 5099.938886] Killed process 6203 (git)
[ 5099.951163] Out of memory: kill process 6139 (git-upload-pack) score 324327 or a child
[ 5099.951174] Killed process 6151 (git)
Karl
  • How did you attempt to remove the files? – Tim Biegeleisen Aug 11 '15 at 03:11
  • Interesting read: http://stackoverflow.com/a/31933020/6309 and http://stackoverflow.com/a/28720432/6309 – VonC Aug 11 '15 at 11:52
  • @Tim - I created a commit that reverts the unwanted files then squashed it onto the original commit, so as far as the branch history is concerned it no longer exists, but it's still floating around in Git's internals somewhere. – Karl Aug 11 '15 at 18:43
  • @VonC - That looks like it might have potential. If I run the BFG, gc and stuff then push to remote, will it cause the same change in the remote? Or should I be running these tools directly on the server? – Karl Aug 11 '15 at 18:45
  • @Karl I would run those commands on the server as well. – VonC Aug 11 '15 at 19:01
  • @VonC - Seems like running the BFG tool on the server did the trick. I was unable to do it locally because trying to pull a bare repository was causing it to run out of memory... if you submit an answer I'll accept it. Thanks for the help! – Karl Aug 13 '15 at 00:25
  • @Karl Great! Answer added. – VonC Aug 13 '15 at 05:37

3 Answers


As the OP Karl confirms in the comments, running BFG repo cleaner on the server side (directly in the bare repo) is enough to remove the large binaries.
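
For example, a minimal sketch of that server-side run (the repository path and the 100M threshold here are assumptions; adjust both to your installation):

# on the GitLab server, against the bare repository itself
java -jar bfg.jar --strip-blobs-bigger-than 100M /home/git/repositories/group/myrepo.git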

Follow that with (as mentioned in "Git - Delete a Blob"):

# in a bare repo the refs live at the top level, so drop the .git/ prefix
rm -rf refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

and also (see "git gc --aggressive vs git repack"):

git gc
git repack -Ad      # kills in-pack garbage
git prune           # kills loose garbage

You should end up with a slimmer and smaller bare repo.
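
One way to confirm the cleanup took effect is to compare the repository's pack size before and after:

git count-objects -vH   # size-pack should drop sharply once the big blobs are pruned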

VonC

Be aware that doing this will break the history for anyone who has cloned or pulled a branch containing this commit; you will have to tell them to re-clone.

What you need to do is rebase the branch to drop this commit, then recreate it on the remote.

First, rebase in your local repository:

git rebase -i problematicCommit~1

This will open your default editor. Remove the line for the commit problematicCommit, then save the file and close it.
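
As a sketch, the todo list might look like this (hashes and messages are invented for illustration); deleting the first pick line is what removes the commit:

pick a1b2c3d problematicCommit – adds the multi-GB binaries   <- delete this line
pick e4f5a6b later work
pick 9c8d7e6 more later work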

Next, delete the branch in the remote repository:

git push origin :nameOfTheBranch

Note the colon before the name of the branch: pushing an empty source to a remote branch deletes it.
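
Equivalently, newer Git versions accept an explicit delete flag:

git push origin --delete nameOfTheBranch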

Finally, create the branch again on the remote:

git push origin nameOfTheBranch

This regenerates the branch on the remote without the problematic commit, and new clones will be fast again.

Now, if you still notice that your repository is slow, you can erase the unreachable objects (e.g. the ones containing the big files) that it still holds.

First, remove all tags and branches that could be pointing to the old commits. This is important because objects can only be erased once nothing references them.
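
Standard Git commands can audit what still points at the old history (problematicCommit is the same placeholder as above):

git branch -a --contains problematicCommit   # branches still carrying the commit
git tag --contains problematicCommit         # tags still carrying it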

Then, following VonC's comment (stackoverflow.com/a/28720432/6309), run in your repository and on the remote:

git gc
git repack -Ad
git prune
blashser
  • This is approximately what I already tried, the only difference being I used `git push -f` instead of deleting and recreating the branch. I don't think there would be a difference since the history is gone either way. The server is still having erratic memory usage, which leads me to believe that the objects in question are still being bounced around. – Karl Aug 11 '15 at 18:53
  • This is different: after the rebase, the problematic objects are gone from the branch for good. However, you can remove the remaining unreferenced objects by following VonC's link; I describe how in my answer. – blashser Aug 11 '15 at 20:29

I had the same problem, and the process to get it resolved was quite involved.

We run the community-maintained sameersbn/gitlab 11.4.5 in a Docker container. I didn't want to install bfg there, but opted to perform the changes locally.

# Install the bfg tool, e.g. on macOS via Homebrew
brew install bfg

# Clone repo locally
cd ~/Development
git clone --mirror ssh://git@server.com:22/some/dir/myrepo.git

# Clean the repo
bfg --delete-files \*.pdf myrepo.git
cd myrepo.git
rm -rf refs/original/   # a mirror clone is bare, so there is no .git/ prefix
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

# Upload to container-host, e.g. via FileZilla

# Connect to the container-host via ssh

# Rename the original directory in the container, to have a backup
docker exec -it gitlab /bin/bash
mv /home/git/data/repositories/some/dir/myrepo.git /home/git/data/repositories/some/dir/myrepo.git.mybackup
exit

# Copy from container-host into container
docker cp /root/Documents/myrepo.git gitlab:/home/git/data/repositories/some/dir/myrepo.git

# Fix permissions in container
docker exec -it gitlab /bin/bash
cd /home/git/data/repositories/some/dir/myrepo.git
find . -type f -print0 | xargs -0 chown git:git
chown -R git:git /home/git/data/repositories/some/dir/myrepo.git
chmod 770 /home/git/data/repositories/some/dir/myrepo.git

# Re-create the "hooks" subdir with some symlinks in the repo
cd /home/git/gitlab/bin
./rake gitlab:shell:create_hooks

# Clear Redis cache (unclear if needed)
./rake cache:clear
exit

# Clone the changed repo locally again; also tell everyone who has a copy to re-clone (history was rewritten)

# Then do a commit to the repo, to hit the hook and trigger a size recheck
Günther Eberl