
I have a relatively small git repository containing a standard WordPress installation. However, I accidentally added a "concept" folder to the repository, which contains many large PSD files.

The problem is that now, after 50 commits, git has created a "pack" file, which is 1.3 GB.

To downsize the pack folder I tried to remove my "concept" folder via:

git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch DIRECTORY/' --prune-empty --tag-name-filter cat -- --all

and afterwards

git gc --aggressive --prune

After running these commands, my "concept" folder was deleted from the file system and from all commits, but the pack file is still exactly 1.3 GB.

Is there anything else I can do to reduce the size of the pack folder?

pants
agrT
  • Is there any additional information that I could add to my answer (that you just "unselected")? – VonC Jul 15 '14 at 10:41
  • Unfortunately yes. I tried everything suggested, plus the following: http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/, http://rtyley.github.io/bfg-repo-cleaner/, filter-branch with "rm", git gc with prune=now and aggressive, and the thing I thought really had to work: http://git-scm.com/book/en/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch. Nothing does the job. I can clear the commit history of all unwanted files and folders (e.g. with the BFG tool), but my pack file stays at 1.13 GB. I contacted some git experts and mailing lists. No luck. – agrT Jul 16 '14 at 11:29
  • And after all that, if you clone your "cleaned" (still big) repo into a new repo, would you still get 1.13 GB in the newly cloned local repo? – VonC Jul 16 '14 at 11:30
  • I tried a git clone on the "cleaned" repository. The pack file is present in the new clone and is.. 1.13GB – agrT Jul 16 '14 at 12:15
  • The problem seems to be: Even after cleaning up the repo, files from older commits are stored anywhere (but where? maybe that pack file?). When cloning, all files from any commit ever get cloned. Maybe there is a way to ".gitignore" before cloning? I tried to rewrite the .gitignore file before cloning, but that gets ignored too. All files are cloned. – agrT Jul 16 '14 at 12:53
  • That means the `git filter-branch` has failed to actually remove the big elements. Did you try the BFG tool, with the filter set to a smaller size, to see if more could be deleted? – VonC Jul 16 '14 at 13:06
  • We used the BFG Tool with a 500KB setting and it removed a lot of files. But after cloning everything stays the same – agrT Jul 16 '14 at 13:38
  • That is frustrating! What version of git are you using? – VonC Jul 16 '14 at 13:38
  • We tried another thing: http://aralbalkan.com/2389/ where we "git rm --cached" folders explicitly from the cache. Then we wrote a new .gitignore to exclude all folders in future. Then we cloned that repo. But everything and all files plus the pack gets cloned. Maybe we can do it like this and have an error in our process? 1. run the git rm --cached, 2. .gitignore unwanted stuff, 3. git gc, 4. git clone => receive a new repo with only the not ignored files or at least a small pack file? – agrT Jul 16 '14 at 13:41
  • Our version of git is 2.0.1 – agrT Jul 16 '14 at 13:41
  • By `git gc`, you do mean the `rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune=now`, as suggested in my answer, right? – VonC Jul 16 '14 at 13:42
  • Just to check, did you try to downgrade to git 1.9.x, just to see if the filter would work better? – VonC Jul 16 '14 at 13:52
  • using this command an error raises: fatal: pathspec .git/refs/original/ did not match any files – agrT Jul 16 '14 at 13:54
  • Sure, I meant `git reflog expire --all && git gc --aggressive --prune=now`: the first part is only when you just did a `filter-branch`. – VonC Jul 16 '14 at 13:58
  • Yes, we did only the git reflog expire --all && git gc --aggressive --prune=now but no effect. The repo size is exactly the same. After cloning, again, the same size of the repo and the pack file ;( – agrT Jul 16 '14 at 14:01
  • And did you try with a less recent git 1.9.x (filter-branch + `rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune=now`) to check if this is a regression of some sort? – VonC Jul 16 '14 at 14:02
  • we have an 1.8.5 version here and tried it. same effect... – agrT Jul 16 '14 at 14:13
  • Finally i got something. I wrote my new .gitignore ignoring unwanted files and folders. then "git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached" to remove those "newly ignored" files. Then a commit. All newly ignored files got deleted from the repo (not physically). When i then clone the project, only the not ignored files and folders get cloned. Hoorayyy! But one problem still exists: The pack file in the clone again has 1.13GB Any chance i can shrink that pack file down? – agrT Jul 16 '14 at 14:58
  • Does the pack file remains huge even after a `git reflog expire --all && git gc --aggressive --prune=now`? If yes, that seems to make sense as your command was only for the latest commit, not for the full history. Any chance you could wrap your command in a `git filter-branch`? – VonC Jul 16 '14 at 15:03
  • You are right, the pack file stays the same size even after the reflog and gc command. I'd really like to wrap "my" command in a filter-branch command, but I don't even understand "my" command, as it's only copied from here: http://stackoverflow.com/questions/7527982/applying-gitignore-to-committed-files (the answer with 85 upvotes). Do you have any clue how to wrap it? – agrT Jul 16 '14 at 15:20
  • Maybe like you did in your question `git filter-branch --index-filter 'your command'`, also an `--index-filter` might not be the right option as you need to edit a file, a `--tree-filter` might be more appropriate. – VonC Jul 16 '14 at 15:22
  • I wrote a new .gitignore `/* !/wp-content /wp-content/* !/wp-content/themes` and then did a `git filter-branch --tree-filter 'git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached' --prune-empty --tag-name-filter cat -- --all` and a `git reflog expire --all &&  git gc --aggressive --prune=now` then i cloned the repo. What i get is good. It's only the "wp-content/themes" folder. But i also get the pack file. Which is again 1.13GB and should be much smaller. Any other tip? I feel it's getting somewhere. Hopefully :) – agrT Jul 17 '14 at 07:31
  • Just for testing, could you do a `filter-branch` which would `git rm` the whole `wp_content` (and leave the rest), just to check where that large element is? I think that, if the pack-file remains that large, you are not '`git rm`' the right elements. – VonC Jul 17 '14 at 07:42
  • the large elements are other folders. one with PSD data in a concept folder, and another is 700MB of gallery JPG data in a gallery folder. both folders are not in the wp-content/themes folder. the wp-content/themes folder is only 8MB. i hope i got your question right. – agrT Jul 17 '14 at 07:48
  • Yes, but my point is: start trying to `git rm` *more* in your `filter-branch` command, just to test if that has any effect on the resulting pack-file. For testing. – VonC Jul 17 '14 at 07:49
  • do you mean in the cloned or in the original repo before cloning? – agrT Jul 17 '14 at 07:54
  • "has any effect on the resulting pack-file": on the cloned one, after a `filter-branch` on the original one, followed by the usual `git reflog expire --all && git gc --aggressive --prune=now` – VonC Jul 17 '14 at 07:55
  • I did `git filter-branch --tree-filter 'git ls-files -ci --exclude-standard -z | xargs -0 git rm --cached' --prune-empty --tag-name-filter cat -- --all` on the original with a .gitignore like above (ignoring everything but the wp-content/themes folder), then the reflog and commit, then I cloned. In the clone I did e.g. `git filter-branch --tree-filter 'git rm -rf --cached --ignore-unmatch wp-content/gallery/' -f --prune-empty --tag-name-filter cat -- --all` with 2 big folders and at last again a reflog. The pack file is 1.13 GB. When doing the "rm" on the clone it looks right: many files get deleted. – agrT Jul 17 '14 at 09:12
  • Also odd in my clone: when I do `git rm -rf .git/refs/original/` I get: `fatal: pathspec '.git/refs/original/' did not match any files` even though I can `cd` into the .git/refs/original folder. It exists! – agrT Jul 17 '14 at 09:24
  • would it be possible to clearly list the steps how to shrink the pack file normally? i tried everything i've read now but nothing works. maybe i did wrong steps or steps in the wrong order... something like: 1. edit .gitignore to ignore all files and folders to be deleted in the past commits 2. delete files and folders from old commits via the .gitignore list 3. commit 4. filter-branch 5. clone .... or something like that? – agrT Jul 18 '14 at 10:05
  • I don't know of any further steps. I would be inclined to test out a script which would, for each commit, add the content of the current commit to a *new* repo, and monitor the size of said new repo, in order to see at which commit the pack files grow abnormally fast. A script a bit similar to http://blog.ploeh.dk/2013/10/07/verifying-every-single-commit-in-a-git-branch/. – VonC Jul 18 '14 at 11:09
  • Ok. One last question: The order i described above (1,2,3,4,5) is that the right order? I thank you very much for all your information and effort. I'll give the blog post a try and if i don't get what i need, i'll backup all my projects, delete all .git folders and start all over again. In the future i'll be very careful with my initial git and with all files and folders that get added afterwards and are not ignored by the .gitignore. Should i now flag this question as answered? – agrT Jul 18 '14 at 11:58
  • The order seems sensible. You can close that question for now. – VonC Jul 18 '14 at 12:04
  • Did you find any solution for this,i am facing similar issue,but unable to solve this so far. – dReAmEr Nov 08 '16 at 13:46
  • Sorry, no. A friend of mine sent me this link but this didn't work for my case. Maybe this helps? [Finding and Purging Big Files From Git History](http://naleid.com/blog/2012/01/17/finding-and-purging-big-files-from-git-history) – agrT Apr 19 '17 at 12:52
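
One thing worth ruling out when testing with a clone: a clone from a plain local path copies or hard-links everything under `.git/objects`, including objects that nothing references any more, so its pack can be as large as the original's. Cloning with `--no-local` (or a `file://` URL) goes through the normal git transport and only transfers reachable objects. The following is a minimal sketch, assuming a POSIX shell; `clone_reachable_only` is a hypothetical helper name, and nothing in this thread confirms that this was the cause here.

```shell
#!/bin/sh
# Hypothetical helper: clone SRC into DST through the regular git transport,
# so that only objects reachable from SRC's refs are transferred.
# Prints the pack size of the new clone in KiB.
clone_reachable_only() {
    src=$1
    dst=$2
    git clone --quiet --no-local "$src" "$dst" || return 1
    git -C "$dst" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

If the size printed is much smaller than the original's pack, the rewritten history is clean and only stale objects remain in the original repository. If it is still large, some branch or tag still contains the big files.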

1 Answer


I mentioned before that a `git gc` alone can actually increase the size of the repo.

The commands I was listing were:

# remove the temporary history git-filter-branch otherwise leaves behind for a long time
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune=now

If that doesn't work, check out the BFG tool.

You will find an even more complete gc in "How to remove unreferenced blobs from my git repo"

This article suggests a similar approach:

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git fsck --full --unreachable
git repack -A -d
git gc --aggressive --prune=now
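
If the pack stays the same size after these commands, something may still reference the old objects. Below is a minimal sketch for checking that, assuming a POSIX shell and that it is run from inside the repository; the function names `show_pack_size` and `list_leftover_refs` are made up for illustration. Note that the cleanup above uses plain `rm -rf`: running `git rm -rf .git/refs/original/` instead treats the argument as a pathspec of tracked files and fails with `pathspec ... did not match any files`.

```shell
#!/bin/sh
# Hypothetical helpers; run them from inside the repository being cleaned.

# Print the total size of all pack files, in KiB, as reported by git.
show_pack_size() {
    git count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# List refs that can keep pre-rewrite commits reachable:
# filter-branch backups, remote-tracking branches, and the stash.
list_leftover_refs() {
    git for-each-ref --format='%(refname)' refs/original refs/remotes refs/stash
}
```

Calling `show_pack_size` before and after the cleanup shows whether anything was pruned. Any ref printed by `list_leftover_refs` may still point at the old history, in which case `git gc` has to keep those objects.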
VonC
  • I used your command here. The output is: `Counting objects: 10666, done. Delta compression using up to 8 threads. Compressing objects: 100% (10425/10425), done. Writing objects: 100% (10666/10666), done. Total 10666 (delta 3292), reused 7374 (delta 0)` but nothing changed. The pack file is again 1.3 GB. My problem is not that I have really big files in my repo. I just want to remove a larger folder from it and from the pack file. – agrT Jul 11 '14 at 08:43
  • @AndreasGeibert did you use the last edit (the one with `git repack`)? – VonC Jul 11 '14 at 08:44
  • i tried it now with both commands. no effect. the pack file is still 1,3 GB ;( – agrT Jul 11 '14 at 08:50
  • @AndreasGeibert can you check if you still have the large file? http://stevelorek.com/how-to-shrink-a-git-repository.html – VonC Jul 11 '14 at 08:51
  • I checked for large files. The largest file is "192283". If I understand the script, that's in kB. On disk, this file is 3.6 MB. – agrT Jul 11 '14 at 09:16
  • I also see files in the script's output which are not present in the file system anymore. Some of them are from the removed "concept" folder. – agrT Jul 11 '14 at 09:18
  • I'm sorry " On disk, this file is 3,6 MB" is not right. The largest file(s) are all not present in the filesystem. Only, i think, in the pack file. – agrT Jul 11 '14 at 09:24
  • @AndreasGeibert ok, and if you clone that local repo (in which you tried to remove the large file), would that new clone have the same large pack files? – VonC Jul 11 '14 at 09:40
  • yes, after cloning the same 1,3 GB pack file is in the new cloned repo – agrT Jul 11 '14 at 10:00
  • Isn't there an easy way to delete all the files and folders I don't need anymore, add them to the .gitignore, delete the pack file(s) and rewrite the pack files? – agrT Jul 11 '14 at 10:05
  • @AndreasGeibert that is what http://rtyley.github.io/bfg-repo-cleaner/ is supposed to do. – VonC Jul 11 '14 at 10:41
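
The check for large files discussed in these comments can also be done with git's own plumbing commands. Below is a minimal sketch, assuming a POSIX shell; `largest_blobs` is a hypothetical helper name, not part of git. It lists blobs reachable from any ref, which includes files that no longer exist in the working tree.

```shell
#!/bin/sh
# Hypothetical helper: print "<size in bytes> <path>" for the largest blobs
# reachable from any ref, biggest first. $1 = number of entries (default 10).
largest_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob" { sub(/^blob /, ""); print }' |
        sort -rn |
        head -n "${1:-10}"
}
```

The sizes are uncompressed blob sizes, not sizes inside the pack. If a path from a folder that was supposedly removed still shows up in the output of `largest_blobs`, some branch, tag, or backup ref still reaches it.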