
I am trying to clone a Git repo that is more than 1.5 GB (due, I assumed, to its many branches), and the clone kept terminating repeatedly. The total object counts were:

 remote: Enumerating objects: 75566, done.
 remote: Counting objects: 100% (75566/75566), done.
 remote: Compressing objects: 100% (42818/42818), done.
 Receiving objects:   3% (2267/75566), 1.91 MiB | 297.00 KiB/s 

So I deleted most of the old, unused branches, which reduced the size:

remote: Enumerating objects: 3826, done.
remote: Counting objects: 100% (3826/3826), done.
remote: Compressing objects: 100% (392/392), done.
Receiving objects:   4% (3703/75566), 1.99 MiB | 280.00 KiB/s 

You can see that the counted objects have been reduced, but the receiving-objects total is still 75566. Why is that? If it is due to some cache, how can I clear it?

Also please note that `git rm -r --cached .` does not work, as I am cloning outside of any project and there is no .git folder.

Thank you

  • 3
    You can use `--single-branch` and/or shallow clone to decrease the download size. – aragaer Sep 15 '20 at 12:35
  • 3
    The number of objects depends on the objects referenced to by the branches/tags. If most of the old branches share many common commits with the new branches, removing the old can hardly reduce the number of referenced objects. – ElpieKay Sep 15 '20 at 13:33
  • But please see the difference in the counting-objects values; they differ a lot (from 75566 down to 3826). – Stack user Sep 15 '20 at 13:36
  • 1
    Do you have large (binary) files in the repository? – dan1st Sep 15 '20 at 14:00
  • No, I don't. – Stack user Sep 15 '20 at 14:02
  • 1
    How is the remote repository hosted ? `github.com` ? `gitlab.com` ? self hosted `gitlab` community edition ? ... – LeGEC Sep 15 '20 at 17:04
  • 1
    I have never delved into the mysteries of the internals of pack fetching and loading, but I've always suspected that Git takes a short-cut when doing an initial clone. A `git fetch` into a repository that has some existing Git objects will use the fancy code that figures out which objects you already have, to avoid sending them, and then builds what Git calls a *thin pack* to send to you. But on a fresh clone, the thin pack will, in effect, be thick, and there's an obvious shortcut: just send some existing thick pack instead. – torek Sep 15 '20 at 22:05
  • 1
    It seems like this is what is happening here. If so, the trick will be to convince the hosting site to repack the Git repository there, so that the thick pack that is sent is the new one containing the 3826 objects, instead of the old one containing the 75566 objects. (See @LeGEC's question about how the repository is hosted.) – torek Sep 15 '20 at 22:06
  • Thanks, it is on GitLab. Please let me know how we can ask the repo to repack. – Stack user Sep 16 '20 at 06:24
  • "*more than 1.5 gb due to many branches*" The size is unlikely due to branches, branches in Git are basically free. It's more likely because there are many large files in the history. [Git Large File Storage](https://git-lfs.github.com/) can solve this problem. – Schwern Sep 18 '20 at 17:32

1 Answer


You can limit the number of objects fetched with `--depth` (the number of commits back into history to fetch):

git clone --depth=1 ... 
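As noted in the comments, `--single-branch` restricts the clone to a single branch's history; `--depth` already implies it unless `--no-single-branch` is given. A minimal sketch combining the two (the URL is hypothetical):

git clone --depth=1 --single-branch https://gitlab.example.com/group/project.git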

And since Git 2.11 (Q4 2016), `git clone` also lets you limit the clone based on commit time:

git clone --shallow-since=<date>

Create a shallow clone with a history after the specified time.

The date format is any of the formats supported by `git log`.
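For example, a minimal sketch (the URL is hypothetical; any `git log`-style date such as `2020-01-01` or `"2 weeks ago"` should work):

git clone --shallow-since=2020-01-01 https://gitlab.example.com/group/project.git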

See https://stackoverflow.com/a/39994584/468252 for the specifics about when this was added.

qneill
  • 1,643
  • 14
  • 18