
I have a bare git repository on the production server used only to deploy from the local repository with a simple git push prod.

Everything is working fine so far, and the git repository is only a few hundred MB, but I don't feel comfortable with the idea of the repository growing without limit on the server.

Is there a way to remove old files from the bare repo, or should I completely change my deployment configuration?

Glasnhost
  • Git commit history is incredibly small, unless you store big binary files (e.g. images) that change frequently in your repo. Are you doing that? – Chronial Apr 28 '13 at 05:59
  • No, no, just being paranoid about the future... – Glasnhost Apr 28 '13 at 07:13
  • Well, maybe I did it earlier in the project... – Glasnhost Apr 28 '13 at 07:19
  • Look at the `--depth` option of `git clone`. It should do what you want. Note that the documentation says you cannot push to or pull from an incomplete (shallow) repository, but that is apparently wrong (outdated?) and a bug report has been filed against the man page. I can't seem to find the question/answer that raised this discussion. – Shahbaz Apr 28 '13 at 10:55
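A minimal sketch of what `--depth` does: clone only the newest commit(s) and count what arrived. The helper name `shallow_clone` and the URL and target directory you pass it are hypothetical. For a repository on the local disk, pass a `file://` URL, because Git ignores `--depth` for plain local paths.

```shell
#!/bin/bash
# Sketch: make a shallow clone and report how many commits it holds.
# shallow_clone URL TARGET [DEPTH]   (hypothetical helper; DEPTH defaults to 1)
set -o errexit -o nounset

shallow_clone() {
    local url=$1 target=$2 depth=${3:-1}
    # Only the newest DEPTH commits of each fetched branch are transferred.
    git clone --quiet --depth "$depth" "$url" "$target"
    # Count the commits reachable from HEAD in the new, truncated history.
    git -C "$target" rev-list --count HEAD
}
```

For example, `shallow_clone "file:///srv/git/project.git" /tmp/project-shallow` (paths hypothetical) should print 1 for a depth-1 clone, however long the original history is.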

5 Answers


Perhaps you could trim away some of the unneeded tags and branches from the server, then run git gc --aggressive.
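A minimal sketch of that cleanup, meant to be run inside the bare repository on the server. The helper name `trim_refs_and_gc` and the branch/tag names you pass it are hypothetical, and deleting a ref is only safe if nothing still needs it. (A branch can also be removed from the client side with `git push prod --delete <branch>`.)

```shell
#!/bin/bash
# Sketch: delete unneeded branches/tags in a bare repo, then garbage-collect.
# trim_refs_and_gc NAME...   (hypothetical helper; each NAME is a branch or tag)
set -o errexit -o nounset

trim_refs_and_gc() {
    local name
    for name in "$@"; do
        if git show-ref --verify --quiet "refs/heads/$name"; then
            git branch -D "$name"        # drop an unneeded branch
        elif git show-ref --verify --quiet "refs/tags/$name"; then
            git tag -d "$name"           # drop an unneeded tag
        else
            echo "no such branch or tag: $name" >&2
        fi
    done
    # Expire reflog entries right away (bare repos usually have none), then
    # repack and drop objects no longer reachable from any remaining ref.
    git reflog expire --expire=now --all
    git gc --quiet --aggressive --prune=now
}
```

Usage, with a hypothetical path: `cd /srv/git/project.git && trim_refs_and_gc old-feature v0.1`.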

Note that there is no useful way to remove "old" commits from a repo that's intended to be cloned from, e.g. to be shared with others. Cutting history like that (called a shallow clone) makes the repo unusable for many important operations, such as cloning, pushing and fetching, which pretty much contradicts the raison d'être of a bare repo.

jpaugh
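```shell
#!/bin/bash
set -o errexit

# Author: David Underhill
# Script to permanently delete files/folders from your git repository.  To use
# it, cd to your repository's root and then run the script with a list of paths
# you want to delete, e.g., git-delete-history path1 path2

if [ $# -eq 0 ]; then
    echo "usage: $0 path..." >&2
    exit 1
fi

# make sure we're at the root of git repo
if [ ! -d .git ]; then
    echo "Error: must run this script from the root of a git repository"
    exit 1
fi

# remove all paths passed as arguments from the history of the repo;
# printf %q quotes each path so names containing spaces survive the filter
files=$(printf '%q ' "$@")
# rewrite every branch and tag: rewriting only HEAD would leave other refs
# still pointing at the old commits (and therefore at the deleted files)
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch $files" \
    --tag-name-filter cat -- --all

# remove the temporary history git-filter-branch otherwise leaves behind for a long time;
# without =now, reflog expire and gc --prune keep recent entries/objects for weeks
rm -rf .git/refs/original/ && git reflog expire --expire=now --all && git gc --aggressive --prune=now
```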

From: http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/

I guess this page contains instructions for exactly what you need, but you'll need to copy and run the script shown. Be careful: it rewrites history. Read the comments on that page too!
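The script above only rewrites the local repository; the bare repo on the server keeps the old objects until it receives the new history and is cleaned up. A minimal sketch, assuming the remote is called `prod` as in the question; both helper names are hypothetical. A forced push is refused if the server has `receive.denyNonFastForwards` enabled, and every other clone of the repo has to be re-cloned afterwards.

```shell
#!/bin/bash
# Sketch: replace the history in a bare deploy repo with a rewritten one.
# Both helpers are hypothetical; the remote name is passed in (e.g. "prod").
set -o errexit -o nounset

# Run in the local repository after rewriting its history.
publish_rewritten_history() {
    local remote=$1
    # --force overwrites the old branch and tag tips on the server; git does
    # not allow --all and --tags in the same push, hence two pushes.
    git push --force --all "$remote"
    git push --force --tags "$remote"
}

# Run inside the bare repository on the server afterwards, so objects that
# only the old history referenced are actually deleted.
cleanup_bare_repo() {
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```

Locally: `publish_rewritten_history prod`; then, on the server, `cd` into the bare repo and run `cleanup_bare_repo`.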

felipe.zkn
  • Interesting, thanks! Maybe I do need to remove some old things from the repo after all. – Glasnhost Apr 28 '13 at 07:20
  • If the website you linked to becomes unavailable, your answer becomes useless. Please quote at least the most important/relevant part of the article. If you can't do that, then this might not be suitable as an answer. – Felix Kling Apr 30 '13 at 14:34

If the repo is a few hundred MB and you don't have any big files in there, something is wrong. Maybe someone committed a big file in the past that you might want to remove from the history. For how to find such files, see the question "Find files in git repo over x megabytes, that don't exist in HEAD".
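One common way to look for such files is to list every blob reachable from any ref together with its size and path. A minimal sketch; the helper name `largest_blobs` is hypothetical, and it works in bare and non-bare repositories alike.

```shell
#!/bin/bash
# Sketch: list the largest file versions (blobs) anywhere in the history.
# largest_blobs [COUNT]   (hypothetical helper; COUNT defaults to 10)
set -o errexit -o nounset

largest_blobs() {
    local count=${1:-10}
    # rev-list prints every reachable object, with a path for trees/blobs;
    # cat-file adds type and size; keep only blobs and strip the type column,
    # leaving "size sha path" (paths with spaces stay intact); biggest first.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -n -r -k1,1 |
        head -n "$count"
}
```

Sizes are uncompressed bytes, so a file that compresses well may take much less space in the pack than shown.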

Your repo size should not be an issue – neither now nor in the future. For comparison: the git source repo contains 34251 commits and is 57MB in size. The repository of the Linux kernel is 700MB in size (a working copy of the kernel is 500MB).
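To compare your own repository with those figures, `git count-objects -v` reports the space used by loose and packed objects. A minimal sketch that sums the two; the helper name `repo_size_kib` is hypothetical.

```shell
#!/bin/bash
# Sketch: total on-disk size of a repository's objects, in KiB.
# repo_size_kib [PATH]   (hypothetical helper; PATH defaults to ".")
set -o errexit -o nounset

repo_size_kib() {
    local repo=${1:-.}
    # Without -H, count-objects -v reports "size" (loose objects) and
    # "size-pack" (pack files) in KiB; add them up.
    git -C "$repo" count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 } END { print total + 0 }'
}
```

For a quick human-readable view, `git count-objects -v -H` prints the same figures with units.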

Apart from rewriting your history to remove big files, there is no way to shrink a git repository, because that should not be necessary.

Chronial

Heavily used repositories get too big eventually.

However, 'too big' doesn't really mean file size so much these days.

It's more that, in this case, 'too big' brings consequences such as:

  • More to load into GUI git tools

  • More to search through with tools such as git grep (a little-known, super fast, mega-cool tool)

  • More old results to wade through when searching with git log

To be honest, I would consider just starting completely afresh. Yup, the bare repository too. So I would do this:

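```shell
cp -r your_project your_project_4_28_2013  # keep a copy so the old history is preserved
cd your_project
rm -rf .git                                # discard the old history in this copy
git init
git add -A
git commit -m "Fresh start"
```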
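On the server, the same fresh start means replacing the bare repository with an empty one. A minimal sketch, assuming a deployment driven by hooks (for example `hooks/post-receive`) that have to be carried over; the helper name `reset_bare_repo` and the path you pass it are hypothetical.

```shell
#!/bin/bash
# Sketch: swap a server-side bare repo for an empty one, keeping the old copy.
# reset_bare_repo PATH   (hypothetical helper; PATH is the bare repo directory)
set -o errexit -o nounset

reset_bare_repo() {
    local repo=${1%/}                          # drop a trailing slash
    local backup="$repo.old-$(date +%Y%m%d)"   # the old history is kept here
    local hook
    if [ -e "$backup" ]; then
        echo "refusing to overwrite existing backup: $backup" >&2
        return 1
    fi
    mv "$repo" "$backup"
    git init --quiet --bare "$repo"
    mkdir -p "$repo/hooks"
    # Copy the old hooks (e.g. a post-receive deploy hook); a new repo
    # typically only gets *.sample files, so the next push would not deploy.
    for hook in "$backup"/hooks/*; do
        if [ -f "$hook" ]; then
            cp -p "$hook" "$repo/hooks/"
        fi
    done
}
```

Because `rm -rf .git` also deleted the local remote definition, re-add it with `git remote add prod <url>` before pushing the fresh history (e.g. `git push prod master`, if your branch is called master).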

This probably wouldn't work for some projects where all history must be kept online in the same repo. However, in practical, real-world work I have found that the need to go back more than a day or two to look at older git commits is actually pretty rare, and looking at commits that are more than a week or two old only happens a few times a year (say, fewer than 6 times for a team of 4).
It's a balance, of course, and at the end of the day I have found that a clean new repo at some suitable point, with the old repo saved, worked best for me.

This approach is also quick, and time is money: while the other approaches are more fine-grained and precise, you always need to factor in how much time you want to spend on these processes as opposed to working on deliverable features that will make your organization money.

Michael Durrant

As part of "PuppetGit", I have implemented a script that looks at old commits and prunes them by (ab)using the grafts facility.

Have a look at ppg's code, in ppg-push-reports. The Git repo is at http://repo.or.cz/w/puppet-git.git/
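For reference, the same idea can be expressed with `git replace --graft`, which newer Git versions recommend instead of the `.git/info/grafts` file. This is an independent minimal sketch of the general technique, not ppg's code: it makes a chosen commit look parentless, then runs `git filter-branch`, which per its documentation makes replacement refs permanent. The helper name `cut_history_before` is hypothetical. It rewrites history, so every clone must be replaced afterwards, and tags pointing below the cut still keep the old commits alive.

```shell
#!/bin/bash
# Sketch: drop all history older than a given commit, in the current repo.
# cut_history_before COMMIT   (hypothetical helper; COMMIT becomes the new root)
# Run it on a fresh clone with a clean working tree.
set -o errexit -o nounset

cut_history_before() {
    local cut ref
    cut=$(git rev-parse --verify "$1^{commit}")
    # Pretend COMMIT has no parents (the replace-ref form of a graft).
    git replace --graft "$cut"
    # filter-branch honours replace refs, so rewriting makes the cut permanent.
    # The variable skips the deprecation warning (and its pause) in newer Git.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tag-name-filter cat -- --all
    # Remove the no-longer-needed replace ref and filter-branch's backup refs.
    git replace -d "$cut"
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    # Drop the old commits for good.
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}
```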

I was part of the early git hackers group (I wrote or maintained several importers to ease migrations from other SCMs), so I know a couple of things about tooling git. I am not infallible though, so use with care.

hth! ~ martin