I'm currently working on a project that is mainly made of very small files, which means the current project's size is very small (< 1 MB) and the history should be pretty small as well.

But when checking the .git folder's size, I saw it was a whopping 17 MB!

So I investigated a bit with the following one-liner (taken from this answer):

git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort --numeric-sort --key=2 |
  cut -c 1-12,41- |
  $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest

This showed that a set of large HTML, JS and JSON files was taking up all the space, which is weird, as I only have Markdown files here (it's a documentation repository).

So I took one of the object IDs, ran git show <ID>, and it displayed a deploy entry from GitHub Pages!

The weird thing is that I don't have the gh-pages branch locally, only the main one, which does not contain a single HTML/JS/JSON file (I checked the whole history).

Yet my repository still holds objects from those deployments, with the changes made to each HTML/JS/JSON file.

So my question is: why does that happen, and how can I get rid of all this useless data that I don't want to have locally?

Thanks in advance for your help!

EDIT: To be clearer, I do not have the gh-pages branch locally. If I try to git checkout it, I get an error telling me it doesn't exist. If I check out origin/gh-pages, it works and creates a local branch, but this barely increases the .git size (less than 1 MB more) despite the branch containing a lot of data.

ClementNerma
  • When you clone or fetch from a repository, you get a copy of **all** of its history, not just selected branches. To fetch only selected branches, see existing questions such as https://stackoverflow.com/questions/1615488/clone-just-the-stable-and-one-other-branch-in-git and https://stackoverflow.com/questions/49039959/git-clone-specific-list-of-branches and https://stackoverflow.com/questions/58446026/git-clone-only-specific-branches-from-github and https://stackoverflow.com/questions/57674585/how-to-clone-only-selected-branches-from-git – IMSoP Oct 12 '22 at 14:00
  • Indeed, this solves my question! I used that and my `.git` folder went down from 17 MB to 4.5 MB! Thanks a lot for your help :) – ClementNerma Oct 12 '22 at 14:18

2 Answers

You could use sparse-checkout with the sparse-index option to shrink your working folder and index locally.

E.g. in your repo do:

git sparse-checkout init --cone --sparse-index 
git sparse-checkout set <path1> <path2> <pathN>
git checkout main
jaspernygaard
  • I don't really get what you mean - what are those `<path1>`, `<path2>` etc. supposed to be? – ClementNerma Oct 12 '22 at 14:13
  • It's normally used for speeding up mono-repo performance. E.g. a path could be the backend top folder. In your case it's just your root folder: './'. Totally overkill solution, but then again so is trying to optimize a 17MB index ;) – jaspernygaard Oct 12 '22 at 14:28

Solved by @IMSoP's comment:

When you clone or fetch from a repository, you get a copy of all of its history, not just selected branches. To fetch only selected branches, see existing questions such as Clone just the stable and one other branch in git? and git clone specific list of branches and GIT: Clone only specific branches from GitHub and How to clone only selected branches from git?
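
For reference, here is a minimal sketch of one way to apply this to an existing clone, rather than re-cloning. It assumes the remote is named origin and the branch to keep is main; keep_only_branch is a hypothetical helper name of my own, not a Git command. It narrows the fetch configuration to one branch, deletes the other remote-tracking refs, then prunes the objects that are no longer reachable.

```shell
# Sketch: keep only one branch of a remote in the current repository.
# keep_only_branch is a hypothetical helper, not part of Git.
keep_only_branch() {
  remote="$1"
  branch="$2"

  # Restrict future fetches from this remote to the single branch.
  git remote set-branches "$remote" "$branch"

  # Delete every other remote-tracking ref that was already fetched.
  git for-each-ref --format='%(refname)' "refs/remotes/$remote/" |
    while read -r ref; do
      case "$ref" in
        "refs/remotes/$remote/$branch" | "refs/remotes/$remote/HEAD") ;;
        *) git update-ref -d "$ref" ;;
      esac
    done

  # Expire reflog entries that may still point at the dropped history,
  # then remove the objects that are now unreachable.
  git reflog expire --expire=now --all
  git gc --quiet --prune=now
}

# Usage, run from inside the repository:
#   keep_only_branch origin main
```

Note that this only removes objects that nothing else references: a local branch or tag pointing into the gh-pages history would keep them alive. For a fresh clone, git clone --single-branch --branch main <url> avoids fetching the other branches in the first place.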

ClementNerma