For comparison: the Linux repository, which is the largest git repository I know of, has almost 470k commits and more than 4k contributors. It took 1.15 GB at checkout. After a `git gc --aggressive`, its size went down to 858 MB.
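If you want to see what the same repack does for your own repository, something like the following should work. It is only a sketch: `packed_size` and `gc_report` are helper names I made up here, and they rely on the `size-pack` field (in KiB) that `git count-objects -v` prints.

```shell
# Hypothetical helper (not part of git): print the packed size, in KiB,
# of the repository at path $1, as reported by `git count-objects -v`.
packed_size() {
    git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}

# Hypothetical helper: report the packed size before and after an
# aggressive repack. This can take a long time on a large repository.
gc_report() {
    before=$(packed_size "$1")
    git -C "$1" gc --aggressive --quiet
    after=$(packed_size "$1")
    echo "size-pack: ${before} KiB before, ${after} KiB after"
}
```

You would call it as `gc_report /path/to/your/repo`. If the number barely moves, the space is most likely taken by large files in the history rather than by poor packing.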
You certainly have files in your repository that don't belong there. I'm primarily thinking of various binary files. These should be stored elsewhere if they take too much space.
If you happen to store compiled files, you should remove them from the repository and add the corresponding patterns to your `.gitignore` file. As a rule of thumb, files that can be generated from other files in the repository, and that are large or binary, shouldn't be committed.
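For generated files that are already tracked, the cleanup could look roughly like this. The patterns `*.o` and `build/` are only my guess at what compiled output might look like in your project, and `untrack_generated` is a name invented for this sketch.

```shell
# Hypothetical helper: ignore and untrack generated files in the repo at $1.
# The patterns (*.o, build/) are assumptions; use the ones matching your build.
untrack_generated() {
    repo=$1
    printf '%s\n' '*.o' 'build/' >> "$repo/.gitignore"
    # --cached removes the files from the index only and keeps them on disk;
    # --ignore-unmatch avoids an error if no such files are tracked.
    git -C "$repo" rm -r --cached --quiet --ignore-unmatch -- '*.o' 'build/'
    git -C "$repo" add .gitignore
}
```

After running it, review the result with `git status` and commit. Note that this only stops tracking the files from now on. They remain in the history, so the repository does not get smaller by doing this.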
I just found this tool: BFG Repo-Cleaner. It's a helper tool that lets you rewrite your history by removing problematic files. You could use it to remove the files that don't belong there.
Take care, though: rewriting history means most commits will get a different SHA-1 hash. So everyone on your team would have to switch repositories at the same time: you generate the new repo, and then everyone has to abandon the old repo and use the new one from now on.
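A rough sketch of such a rewrite, following the steps in the BFG documentation as far as I understand them. It assumes you have already downloaded the BFG jar and made a fresh mirror clone with `git clone --mirror <url>`. The `100M` threshold is just an example value, and `strip_big_blobs` is a wrapper name I made up.

```shell
# Hypothetical wrapper around the steps from the BFG documentation.
#   $1: path to the BFG jar (assumed to be downloaded already)
#   $2: path to a fresh mirror clone, made with `git clone --mirror <url>`
#   $3: size threshold, e.g. 100M (an example value, pick your own)
strip_big_blobs() {
    # Rewrite history, dropping blobs above the threshold. By default BFG
    # protects the files in the latest commit, so clean that one up first.
    java -jar "$1" --strip-blobs-bigger-than "$3" "$2" || return 1
    # BFG only rewrites the commits; expiring the reflog and repacking
    # is what actually discards the old objects.
    git -C "$2" reflog expire --expire=now --all &&
        git -C "$2" gc --prune=now --aggressive --quiet
}
```

Try it on a copy first, for example `strip_big_blobs bfg.jar some-big-repo.git 100M`, and only push the result once you have checked that the rewritten history is what you want.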
But: cloning a repository shouldn't be problematic in the first place. You are supposed to clone a repository only once. If you need a second repository for whatever reason, clone it from the first one or just copy the `.git` directory from it.
Likewise, the remote team could clone the repository only once (so you transfer these 4.5 GB only once between Germany and China). Then the people in China can clone it locally among themselves and just switch the upstream remote afterwards.
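Cloning from a nearby copy and then switching the remote could be sketched like this. `clone_nearby` is a made-up helper name, and the paths and URL in the usage line are placeholders.

```shell
# Hypothetical helper: clone from a nearby copy instead of the distant server.
#   $1: path or URL of an existing clone (e.g. a colleague's, on the local network)
#   $2: directory for the new clone
#   $3: URL of the real upstream repository
clone_nearby() {
    # Clone from the nearby copy, then point `origin` at the real upstream
    # so that later fetches and pushes go there.
    git clone --quiet "$1" "$2" &&
        git -C "$2" remote set-url origin "$3"
}
```

For example: `clone_nearby /mnt/share/project project https://example.com/project.git`. The next `git fetch origin` then only has to transfer whatever the nearby copy was missing.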
In conclusion, I don't know if cleaning the repository is worth it in the first place, since you're not supposed to clone it very often.