Git - checking out more files than in repository

Question

We're migrating our repos from TFS to Git. For one of our repos, when the build machine clones and checks it out, Git reports that it's receiving objects, with a count of about 30,000. Our repo only has around 2,000 files in it. How can we find out what it's actually checking out? More importantly, how do we fix it?

TIA

"Objects" include file revisions, directories, commits and plenty of other stuff, so it's quite possible to have an order of magnitude more objects than files (especially if you also imported history). — Joachim Sauer, Apr 11 '22 at 16:52
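To make the comment above concrete, here is a toy model (not git's actual implementation) of how a few commits turn into many more objects than files. Only `hash_object` follows git's real hashing scheme (SHA-1 over a `"<type> <size>\0"` header plus the content); the tree and commit formats are simplified text stand-ins I'm assuming for illustration, since real trees use binary entries and real commits also record author, committer and message. Every changed file revision becomes a blob, every changed directory a new tree, and every commit its own object, while unchanged content is reused because objects are content-addressed.

```python
import hashlib


def hash_object(obj_type: str, data: bytes) -> str:
    """SHA-1 object ID as git computes it: header "<type> <size>\\0" + content."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def store_tree(files: dict, store: dict) -> str:
    """Hash a {path: bytes} snapshot into blobs and (simplified) tree objects.

    Simplification: each tree entry is a text line instead of git's binary
    format, but it still shows that every directory is its own object.
    """
    entries = []
    subdirs: dict = {}
    for path, content in files.items():
        head, _, rest = path.partition("/")
        if rest:
            # File lives in a subdirectory: group it for a nested tree.
            subdirs.setdefault(head, {})[rest] = content
        else:
            blob_id = hash_object("blob", content)
            store[blob_id] = "blob"
            entries.append(f"blob {blob_id} {head}")
    for name, sub in subdirs.items():
        entries.append(f"tree {store_tree(sub, store)} {name}")
    tree_id = hash_object("tree", "\n".join(sorted(entries)).encode())
    store[tree_id] = "tree"
    return tree_id


def count_objects(history: list) -> dict:
    """Count unique objects when each snapshot in history is committed in order."""
    store: dict = {}
    parent = None
    for i, snapshot in enumerate(history):
        tree_id = store_tree(snapshot, store)
        # Simplified commit body: just the tree, the parent and a label.
        body = f"tree {tree_id}\nparent {parent}\n\ncommit {i}"
        parent = hash_object("commit", body.encode())
        store[parent] = "commit"
    counts = {"blob": 0, "tree": 0, "commit": 0}
    for kind in store.values():
        counts[kind] += 1
    return counts


v1 = {"README": b"v1", "src/a.c": b"a1", "src/b.c": b"b1"}
v2 = {**v1, "src/a.c": b"a2"}   # second commit edits one file in src/
v3 = {**v2, "README": b"v2"}    # third commit edits the top-level README
print(count_objects([v1, v2, v3]))  # {'blob': 5, 'tree': 5, 'commit': 3}
```

Three files and three small commits already give 13 objects; with years of imported TFS history, 30,000 objects for 2,000 files is unsurprising.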

score 4 · Accepted Answer · answered Apr 11 '22 at 16:54


In git, every "working copy" is actually a full "clone" of the repository. That includes the full history of every file, and the metadata of every commit, branch, tag, etc.

This allows you to work with the history without an active connection to the central server. Indeed, git was originally designed not to have a central server at all, although in practice a service such as GitHub or GitLab is frequently used as a central "source of truth" for collaboration.

The objects being "received" are not the files being checked out to work on right now, they are the constituents of that history database. This is perfectly normal, and not something you need to fix.

Once the clone has completed, you will see the working copy contains the ~2000 files you expect, plus a directory called ".git", which is where everything else is stored. There won't be 30000 files in there either, because git packs multiple "objects" in the database into optimised and compressed files.
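If you want to see those numbers for yourself after a clone, `git count-objects -v` reports how many objects are loose (`count`) and how many are stored in packfiles (`in-pack`, with `packs` and `size-pack` in KiB). The sketch below just runs that command from Python and parses its `key: value` lines; the function names are my own, and the sample text in the usage note is illustrative rather than output from the asker's repo.

```python
import subprocess


def parse_count_objects(output: str) -> dict:
    """Parse the "key: value" lines printed by `git count-objects -v`."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # Numeric fields become ints; anything else (e.g. alternate paths)
        # is kept as a string.
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def repo_object_stats(repo_dir: str) -> dict:
    """Run `git count-objects -v` inside repo_dir and return the parsed stats."""
    result = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_count_objects(result.stdout)


sample = "count: 0\nsize: 0\nin-pack: 30000\npacks: 1\nsize-pack: 51200\n"
print(parse_count_objects(sample)["in-pack"])  # 30000
```

On a fresh clone you'd typically expect nearly everything under `in-pack`, spread over very few pack files, which is why the `.git` directory doesn't contain tens of thousands of files.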


IMSoP


Since our build machines are wiped between builds, is there a way to just get the files we need to build (the files we actually used to build the project)? – Dan Apr 11 '22 at 17:34
@Dan For that kind of scenario, it may be appropriate to use a "shallow clone", using [the `--depth` parameter to `git clone`](https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt). That way, most of the history won't be downloaded, and the clone will happen a lot faster. – IMSoP Apr 11 '22 at 19:12
It's worth mentioning that [you can't push a shallow clone to a new remote](https://stackoverflow.com/a/50993902/584676). This is really only an issue when migrating to a new remote or if you consistently work with multiple remotes. Most of the time a large git history isn't a problem, and except for egregiously large projects, shallow clones generally shouldn't be required today. OP's use case may benefit from a shallow clone, however, because the working copy is ephemeral. – codewario Apr 11 '22 at 19:34

@BendertheGreatest Yes, I emphatically wouldn't recommend a shallow clone for *working with* the repository (use a full clone), but for a transient build where the history isn't important, only a particular state, it can be useful to save build time on a large repository. – IMSoP Apr 11 '22 at 20:16
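For the build-machine scenario discussed in these comments, a shallow clone can be scripted as below. This is a sketch, not a prescribed setup: the helper names and the example URL are hypothetical, and it only uses the documented `git clone` options `--depth` (which implies `--single-branch` unless `--no-single-branch` is given) and `--branch`.

```python
import subprocess
from typing import List, Optional


def shallow_clone_command(url: str, dest: str, depth: int = 1,
                          branch: Optional[str] = None) -> List[str]:
    """Build a `git clone` argv that fetches only the last `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch is not None:
        # --branch picks which branch (or tag) is fetched and checked out.
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


def shallow_clone(url: str, dest: str, depth: int = 1,
                  branch: Optional[str] = None) -> None:
    """Run the shallow clone; raises CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, depth, branch), check=True)


# Hypothetical usage on a build agent:
# shallow_clone("https://example.com/our/repo.git", "src", branch="main")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual URLs or paths. As noted above, keep full clones for day-to-day development.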
