How to show the specific files being downloaded when using git clone?

Question

When using the 'git clone' command to download a repository, I want to see which file is currently being downloaded. This would help identify which files are big and may take more time. But currently it only shows the progress:

remote: Counting objects: 4783, done.
remote: Compressing objects: 100% (515/515), done.
Receiving objects:   4% (226/4783), 15.91 MiB | 273.00 KiB/s

Is there a parameter that shows this? Perforce does exactly this for me.

I don't think Git stores files per se - it converts them to objects and stores those, so that's what's being transferred, not the files themselves. You could think of it like compression. But I might be wrong. — wjandrea, Mar 31 '20 at 17:24
Is there a way to find which objects are big? My repository doesn't have big files but when I clone it is still very big, probably that's because I used to have big files in repository then I deleted, but somehow it's there. Can I delete the .git/.object? — marlon, Mar 31 '20 at 17:30
Yeah, Git syncs the entire history. Check out [How to remove/delete a large file from commit history in Git repository?](https://stackoverflow.com/q/2100907/4518341) — wjandrea, Mar 31 '20 at 17:33

score 1 · Accepted Answer · answered Mar 31 '20 at 19:24

For now, consider:

- using --depth or --shallow-since when cloning, and/or
- making one clone and keeping it, and just using git fetch to update it.

The former produces a deliberately truncated clone, which has some limitations (but in modern Git, not that many and hence can still be useful). The latter is usually the way to go. Clone once, then update: updates are fast!

Details

You can't quite get what you want here. As wjandrea said in a comment, Git doesn't exactly store files. At this point in a clone, your Git is copying objects that have been compressed into a pack (all technical terms).

Rather than storing files, Git stores commits. Admittedly, commits then store files. What you probably want here is a feature that is being added (however slowly) to Git, known as partial clone, where Git can store what they call, internally, a promisor object. These objects aren't transferred into your repository yet. Instead, they just leave behind the URL of the source for the object. Then, as long as you don't actually need the object, you never know for sure whether you can really get it, because you never even try.

> My repository doesn't have big files but when I clone it is still very big, probably that's because I used to have big files in repository then I deleted, but somehow it's there.

Again, this is because Git does not store files. It stores commits. Each commit is a full and complete snapshot of all of the files in that commit. If you put in a large file at one point and committed it, that commit has that file.

A commit that says "I have file path/to/file.ext as version <ID>" means that your Git must have the corresponding object. If not, the repository is damaged and un-clone-able. (With a promisor object, your repository could have the commit but defer copying the file object, replacing it with a promisor.)
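To see why deleting a file does not shrink a clone, here is a toy Python model. It is only a sketch, not Git's real data structures: the names `objects`, `commits`, and `objects_needed`, as well as the IDs and sizes, are invented for illustration. Each commit is a snapshot mapping paths to object IDs, and a full clone has to fetch every object referenced by any commit.

```python
# Toy model: object ID -> size in bytes (IDs and sizes are made up).
objects = {"aaa111": 120, "bbb222": 500_000_000, "ccc333": 130}

# Each commit is a full snapshot: path -> object ID.
commits = [
    {"README": "aaa111"},                         # first commit
    {"README": "aaa111", "video.bin": "bbb222"},  # big file added
    {"README": "ccc333"},                         # big file deleted, README edited
]

def objects_needed(snapshots):
    """Return the set of object IDs referenced by any of the snapshots."""
    needed = set()
    for snapshot in snapshots:
        needed.update(snapshot.values())
    return needed

full_clone = objects_needed(commits)        # every commit in the history
latest_only = objects_needed(commits[-1:])  # just the newest snapshot

print(sum(objects[o] for o in full_clone))   # still includes the deleted big file
print(sum(objects[o] for o in latest_only))  # only what the last snapshot uses
```

In this model the big object is absent from the latest snapshot but is still required by the full history, which is the situation described in the question.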

A Git repository is little more than a database, or rather, a pair of databases. The big database holds all the Git objects. While there are four types of objects internally, you mostly deal with commit objects. Each has a unique hash ID. The hash ID is how Git finds the object. You'll see these hash IDs all the time, or abbreviated versions of them, in git log output for instance. They are, in a sense, the true names of the commits.

Other objects have hash IDs too, but other objects need not be unique. In particular, if a file in commit A and a file in commit B have the same content—regardless of the two files' names as stored in the two commits—Git will share the object that holds the file's content. Since all objects are read-only, this is quite safe.
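The sharing described above follows from how a file's object ID is derived. As a minimal sketch, assuming a repository that uses Git's default SHA-1 object format (SHA-256 repositories hash differently), the ID of a file object is the hash of a short header plus the content. The function name `blob_id` and the example paths are made up for illustration.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git gives a file ("blob") with this content."""
    # The header is the object type, the content length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The file name is not part of the hash, so identical content stored under
# two different paths (say docs/a.txt in commit A and src/b.txt in commit B)
# maps to one and the same object.
id_in_commit_a = blob_id(b"hello world\n")
id_in_commit_b = blob_id(b"hello world\n")
print(id_in_commit_a == id_in_commit_b)  # True
```

You can compare the result for a real file against the output of git hash-object on that file.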

In general, when you work with Git, you have Git find a commit for you by branch name. The branch names, and tag names and all other names, are the other database: each name holds one ID. For branch names, the name is constrained to hold only a commit ID, so each branch names one commit.

The way this works is that the branch name holds the hash ID of the latest or last commit in that branch. This last commit holds the hash ID of the previous (used-to-be-latest) commit. That commit holds the hash ID of the next-back commit, and so on.

In other words, Git works backwards. We start at the end, with the latest commit, and work backwards. Each commit is part of the history. The commits, and their backwards-pointing linkages from one commit to the previous, are the history.

If you don't need the earlier commits, you can tell Git, at git clone time, that it should artificially cut off the history after some point. Since history exists from the end backwards, you can choose how many commits to get, in terms of stepping backwards from the last ones, with --depth.
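The backwards walk and the effect of a depth limit can be sketched with a toy Python model. This is an illustration only: the commit IDs, the `parents` and `branches` tables, and the `history` function are hypothetical, and the model assumes a purely linear history (real commits can have several parents, as merges do).

```python
# Toy model: commit ID -> parent commit ID (None for the very first commit).
parents = {"c4": "c3", "c3": "c2", "c2": "c1", "c1": None}

# The names database: each branch name holds exactly one commit ID, the tip.
branches = {"main": "c4"}

def history(branch, depth=None):
    """Walk backwards from the branch tip, stopping after `depth` commits."""
    commit = branches[branch]
    result = []
    while commit is not None and (depth is None or len(result) < depth):
        result.append(commit)
        commit = parents[commit]  # step to the previous commit
    return result

print(history("main"))           # ['c4', 'c3', 'c2', 'c1']
print(history("main", depth=2))  # ['c4', 'c3']
```

With a depth of 2, the walk stops after the two newest commits, which mirrors how a depth-limited clone cuts the history off from the end backwards.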

Watch out: --depth implies --single-branch. If you want more than one branch name copied from the source repository, you must defeat the single-branch-ness by adding --no-single-branch. However, since branch names are really just there to find commits, and to let you easily add commits, a lot of the cases that call for a limited --depth also call for --single-branch anyway.

With a full clone, you have the entire history of the project. You can "go back in time" to any point in the past by checking out some specific commit. Find its hash ID, or a name that finds its hash ID—tag names, for instance, are meant for exactly this sort of thing—and tell Git to extract that commit into your work-area, and you now have all the files as of that commit.

score 0 · Answer 2 · answered Mar 31 '20 at 18:40

AFAIK this is not possible (you might try --verbose), and it would not be particularly meaningful anyway. git clone downloads the entire history of the repository, including every version of every file. That's how it's able to operate without having to contact a central server.

Your problem appears to be that sometime back in history you committed some big files, then maybe removed them. But they're still part of history, so the old, deleted files still get cloned. This is pretty common.

You can fix this going forward by putting large files into Git Large File Storage (git-lfs). And you can fix your history by using the BFG Repo Cleaner to put old versions into LFS. Then Git will clone only references to the large files and only download them as necessary.
