I'm trying to clean up a git repository of LaTeX code that also contains the generated PDF files, because these files have caused the repo to balloon to a size of 300 MB.

Adapting the answer to "How to remove file from Git history?" a bit, I tried the following command:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch *.pdf' HEAD

This reduced the size a little, but not as much as I'd hoped. When I then run the script from the answer to "How to find/identify large commits in git history?" to find which files contribute to the size, it still shows several PDF files. However, if I run the script from "Which commit has this blob?", it cannot find any commit that contains those files.

I have removed all branches except the local branch. I have not pushed the changes to the remote.

Is there any reason these files would still persist in the history somewhere? What other things can I try?

Thijs Steel

1 Answer

You may have blobs still present just because the garbage collector didn't collect them.

Try cloning your local repo and checking the size of the .git/ directory in the new clone:

git clone myrepodir myclone
cd myclone
du -sh .git

# you can then remove that clone:
cd ..
rm -rf myclone

This will give a more accurate view of how much data would be pushed or cloned.
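
One caveat: when the source is a plain local path, git clone may copy or hard-link the object files instead of transferring only the reachable objects, so unwanted blobs can end up in the clone. The sketch below assumes the `--no-local` option of `git clone` is used to force the regular transfer; the scratch directory, the file names, and the orphan blob are made up for illustration.

```shell
#!/bin/sh
set -e

# Hypothetical scratch area so nothing touches a real repository.
work=$(mktemp -d)
cd "$work"

# Build a tiny repository with one commit.
git init -q myrepodir
cd myrepodir
git config user.email "you@example.com"
git config user.name "Example"
echo "some text" > paper.tex
git add paper.tex
git commit -q -m "Add source"

# Store a blob that no commit references (stands in for a leftover pdf).
orphan=$(echo "stale pdf bytes" | git hash-object -w --stdin)
cd "$work"

# --no-local makes git use its normal transfer mechanism, which only
# sends objects reachable from the refs.
git clone -q --no-local myrepodir myclone
du -sh myclone/.git

# Check whether the unreachable blob made it into the clone.
if git -C myclone cat-file -e "$orphan" 2>/dev/null; then
    echo "orphan blob is in the clone"
else
    echo "orphan blob is not in the clone"
fi
```

The size reported for myclone/.git is then closer to what a remote would actually receive.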


If you are 100% positive that the content after your filter-branch action is the content you want to keep, and you don't mind losing your reflog (no more undos, drops all your stashes), you can run:

git gc --aggressive --prune=now

See also git help gc for more details on what could be retained on your disk.
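
Note that git filter-branch keeps a backup of the rewritten refs under refs/original/, and the reflog also still points at the old commits, so both can keep the old blobs alive. The following is a sketch of the cleanup sequence, loosely following the "checklist for shrinking a repository" in the git filter-branch documentation. The demo repository, the file names, and the quoted "*.pdf" pattern are assumptions for illustration, and it should only be tried on a copy of real work.

```shell
#!/bin/sh
set -e

# Silence the deprecation notice that newer git versions print for filter-branch.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Hypothetical demo repository containing a source file and a generated pdf.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"
echo "source" > paper.tex
echo "pretend this is a large pdf" > paper.pdf
git add paper.tex paper.pdf
git commit -q -m "Add source and generated pdf"

# Remember the pdf blob id so we can check for it afterwards.
pdf_blob=$(git rev-parse HEAD:paper.pdf)

# Rewrite history without the pdf files.
git filter-branch -f --index-filter \
    'git rm -q --cached --ignore-unmatch "*.pdf"' HEAD >/dev/null 2>&1

# 1. Delete the backup refs that filter-branch stored under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
    xargs -n 1 git update-ref -d

# 2. Expire every reflog entry so the old commits become unreachable.
git reflog expire --expire=now --all

# 3. Repack and prune unreachable objects immediately.
git gc -q --aggressive --prune=now

# Report whether the pdf blob survived the cleanup.
if git cat-file -e "$pdf_blob" 2>/dev/null; then
    echo "pdf blob still present"
else
    echo "pdf blob removed"
fi
```

Steps 1 and 2 are destructive: after them there is no way back to the pre-rewrite history from inside this repository.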

LeGEC
  • Shouldn't `file://./myrepodir` be used so that a real transfer of the reachable objects takes place? Otherwise, packfiles could be copied or hard-linked and unwanted objects could end up in the clone. – j6t Dec 18 '20 at 13:36
  • The clone ended up pretty much the same size as the original repo, but the garbage collection really helped. – Thijs Steel Dec 18 '20 at 13:58
  • @j6t: you're right, I had forgotten about that optimization – LeGEC Dec 18 '20 at 14:25