For now, consider using --depth or --shallow-since when cloning, and/or making one clone and keeping it, and just using git fetch to update it.
The former produces a deliberately truncated (shallow) clone, which has some limitations, though in modern Git not that many, so it can still be useful. The latter is usually the way to go: clone once, then update. Updates are fast!
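Here is a minimal sketch of both options, run against a throwaway local repository so that it needs no network. The directory names, file name, and commit identity are all made up for illustration; with a real project you would put its URL where the file:// URL appears.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
src="$work/src"

# Build a small "upstream" repository with three commits.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
for i in 1 2 3; do
    echo "version $i" > "$src/file.txt"
    git -C "$src" add file.txt
    git -C "$src" commit -q -m "commit $i"
done

# Option 1: a deliberately truncated clone.
# A file:// URL is used because --depth is ignored for plain local paths.
git clone -q --depth 1 "file://$src" "$work/shallow"
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)

# Option 2: clone once, keep the clone, and update it with git fetch.
git clone -q "file://$src" "$work/full"
echo "version 4" > "$src/file.txt"
git -C "$src" commit -q -am "commit 4"
git -C "$work/full" fetch -q origin
full_count=$(git -C "$work/full" rev-list --count origin/main)

echo "shallow clone sees $shallow_count commit(s)"
echo "kept clone sees $full_count commit(s) after fetch"
```

The fetch in the second option transfers only the one new commit, which is why keeping a clone around tends to be much cheaper than re-cloning.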
Details
You can't quite get what you want here. As wjandrea said in a comment, Git doesn't exactly store files. At this point in a clone, your Git is copying objects that have been compressed into a pack (all technical terms).
Rather than storing files, Git stores commits. Admittedly, commits then store files. What you probably want here is a feature that is being added (however slowly) to Git, known as partial clone, where Git can record what it calls, internally, a promisor object. These objects aren't transferred into your repository yet. Instead, your Git just remembers the remote that has promised to supply the object on demand. Then, as long as you don't actually need the object, you never know for sure whether you can really get it, because you never even try.
My repository doesn't have big files but when I clone it is still very big, probably that's because I used to have big files in repository then I deleted, but somehow it's there.
Again, this is because Git does not store files. It stores commits. Each commit is a full and complete snapshot of all of the files in that commit. If you put in a large file at one point and committed it, that commit has that file.
A commit that says "I have file path/to/file.ext as version <ID>" means that your Git must have the corresponding object. If not, the repository is damaged and un-clone-able. (With a promisor object, your repository could have the commit but defer copying the file object, replacing it with a promisor.)
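To see this for yourself, here is a small sketch in a scratch repository. The file name big.bin, its 1 MiB size, and the commit identity are assumptions chosen for the demonstration.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Commit a 1 MiB file of random data, then delete it in a later commit.
head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit -q -m "add big file"
big_id=$(git rev-parse HEAD:big.bin)

git rm -q big.bin
git commit -q -m "remove big file"

# The file is gone from the latest snapshot...
in_latest=$(git ls-tree -r --name-only HEAD | grep -c '^big.bin$' || true)

# ...but the earlier commit still refers to it, so the object is still stored.
obj_type=$(git cat-file -t "$big_id")
obj_size=$(git cat-file -s "$big_id")

echo "entries named big.bin in latest commit: $in_latest"
echo "object $big_id is a $obj_type of $obj_size bytes"
```

Anyone who makes a full clone of this repository receives that blob too, because the first commit is part of the history.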
A Git repository is little more than a database, or rather, a pair of databases. The big database holds all the Git objects. While there are four types of objects internally, you mostly deal with commit objects. Each has a unique hash ID. The hash ID is how Git finds the object. You'll see these hash IDs all the time, or abbreviated versions of them, in git log output for instance. They are, in a sense, the true names of the commits.
Other objects have hash IDs too, but those objects need not be unique to one commit. In particular, if a file in commit A and a file in commit B have the same content (regardless of the two files' names as stored in the two commits), Git will share the object that holds the file's content. Since all objects are read-only, this is quite safe.
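A quick way to check the sharing, sketched in a scratch repository with made-up file names: commit a file, rename it in a second commit, and compare the object IDs the two commits record for it.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Commit A stores the content under the name one.txt.
echo "same content" > one.txt
git add one.txt
git commit -q -m "commit A"

# Commit B stores the same content under a different name.
git mv one.txt two.txt
git commit -q -m "commit B"

id_in_a=$(git rev-parse HEAD~1:one.txt)
id_in_b=$(git rev-parse HEAD:two.txt)
commit_a=$(git rev-parse HEAD~1)
commit_b=$(git rev-parse HEAD)

echo "blob in commit A: $id_in_a"
echo "blob in commit B: $id_in_b"
```

The two blob IDs come out identical, while the two commit IDs differ, since each commit is its own object.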
In general, when you work with Git, you have Git find a commit for you by branch name. The branch names, and tag names and all other names, are the other database: each name holds one ID. For branch names, the name is constrained to hold only a commit ID, so each branch names one commit.
The way this works is that the branch name holds the hash ID of the latest or last commit in that branch. This last commit holds the hash ID of the previous (used-to-be-latest) commit. That commit holds the hash ID of the next-back commit, and so on.
In other words, Git works backwards. We start at the end, with the latest commit, and work backwards. Each commit is part of the history. The commits, and their backwards-pointing linkages from one commit to the previous, are the history.
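The backwards chain can be inspected directly. This is only a sketch in a scratch repository; the branch name main and the three dummy commits are assumptions for the demonstration.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main

for i in 1 2 3; do
    echo "version $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# The branch name holds exactly one hash ID: that of the latest commit.
tip=$(git rev-parse refs/heads/main)

# The latest commit object records the hash ID of its parent.
recorded_parent=$(git cat-file -p "$tip" | sed -n 's/^parent //p')
parent=$(git rev-parse main^)

# Walking from the tip backwards visits the whole history.
history_len=$(git rev-list --count main)

echo "main -> $tip"
echo "parent recorded in that commit: $recorded_parent"
echo "commits reachable from main: $history_len"
```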
If you don't need the earlier commits, you can tell Git, at git clone time, that it should artificially cut off the history after some point. Since history exists from the end backwards, you can choose how many commits to get, in terms of stepping backwards from the last ones, with --depth.
Watch out: --depth implies --single-branch. If you want more than one branch name copied from the source repository, you must defeat the single-branch-ness by adding --no-single-branch. However, since branch names are really just there to find commits, and to allow you to easily add commits, a lot of the cases that call for a limited --depth also call for --single-branch anyway.
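The difference shows up in which remote-tracking names the clone ends up with. Below is a sketch using a throwaway local source repository; the branch names main and dev, the helper function has_dev, and the directory names are invented for the example.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
src="$work/src"

# Source repository with two branches, main and dev.
git init -q "$src"
git -C "$src" symbolic-ref HEAD refs/heads/main
echo "on main" > "$src/file.txt"
git -C "$src" add file.txt
git -C "$src" commit -q -m "main commit"
git -C "$src" checkout -q -b dev
echo "on dev" > "$src/file.txt"
git -C "$src" commit -q -am "dev commit"
git -C "$src" checkout -q main

# --depth alone implies --single-branch: only the default branch is copied.
git clone -q --depth 1 "file://$src" "$work/single"

# --no-single-branch brings over the tip of every branch, still truncated.
git clone -q --depth 1 --no-single-branch "file://$src" "$work/multi"

# Hypothetical helper: does the clone at $1 have an origin/dev name?
has_dev() {
    git -C "$1" show-ref --verify --quiet refs/remotes/origin/dev && echo yes || echo no
}

single=$(has_dev "$work/single")
multi=$(has_dev "$work/multi")

echo "origin/dev present with --depth 1: $single"
echo "origin/dev present with --depth 1 --no-single-branch: $multi"
```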
With a full clone, you have the entire history of the project. You can "go back in time" to any point in the past by checking out some specific commit. Find its hash ID, or a name that finds its hash ID—tag names, for instance, are meant for exactly this sort of thing—and tell Git to extract that commit into your work-area, and you now have all the files as of that commit.
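As a sketch of that "go back in time" step, again in a scratch repository: the tag name v1.0, the branch name main, and the file contents are all made up for the demonstration.

```shell
set -eu

# Identity for the throwaway commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git symbolic-ref HEAD refs/heads/main

echo "version 1" > file.txt
git add file.txt
git commit -q -m "first release"
git tag v1.0                 # a name that finds this commit's hash ID

echo "version 2" > file.txt
git commit -q -am "later work"

# Extract the tagged commit into the work-area (this detaches HEAD).
git checkout -q v1.0
old_content=$(cat file.txt)

# Return to the latest commit on the branch.
git checkout -q main
new_content=$(cat file.txt)

echo "at v1.0: $old_content"
echo "at main: $new_content"
```

In a shallow clone, this only works for commits that fall inside the depth you asked for; anything older simply is not there to check out.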