For my work, I need to download only the commit objects (the ones stored under .git/objects) from a remote Git repository. The .git metadata contains some useful information, but a pack file bundles commit, tree, and blob objects together. The tree and blob objects are too large, so they should not be downloaded. Maybe I need to modify the Git client or write a Python script.
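To illustrate how a pack file mixes the object types, here is a minimal Python sketch that walks the entries of a version 2 pack held in memory and reports each entry's type. The name `iter_pack_entries` is hypothetical, and the sketch does not resolve deltas, so a commit stored as a delta would be reported as `ofs_delta` or `ref_delta`.

```python
import struct
import zlib

# Type ids used in pack entry headers.
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}


def iter_pack_entries(pack: bytes):
    """Yield (type_name, declared_size, inflated_data) for each pack entry."""
    magic, version, count = struct.unpack(">4sII", pack[:12])
    if magic != b"PACK":
        raise ValueError("not a pack file")
    pos = 12
    for _ in range(count):
        # Entry header: 3 type bits plus a variable-length size.
        byte = pack[pos]
        pos += 1
        type_id = (byte >> 4) & 0x7
        size = byte & 0x0F
        shift = 4
        while byte & 0x80:
            byte = pack[pos]
            pos += 1
            size |= (byte & 0x7F) << shift
            shift += 7
        if type_id == 6:
            # ofs_delta: skip the variable-length base offset.
            while pack[pos] & 0x80:
                pos += 1
            pos += 1
        elif type_id == 7:
            # ref_delta: skip the 20-byte base object id.
            pos += 20
        # Inflate one zlib stream; unused_data marks where the next entry starts.
        inflater = zlib.decompressobj()
        data = inflater.decompress(pack[pos:])
        pos = len(pack) - len(inflater.unused_data)
        yield OBJ_TYPES.get(type_id, "unknown"), size, data
```

Keeping only the entries whose type is `commit` separates the objects after the fact, but it does not reduce what the server sends: the whole pack still has to be fetched first.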

I have seen some projects (such as https://github.com/lijiejie/GitHack) that download the files step by step by parsing the index file, so the commit, tree, and blob objects end up as separate files after the download.
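For reference, a loose object fetched this way is a zlib-compressed file stored under `.git/objects/<first 2 hex digits>/<remaining 38>`, and its inflated content starts with a `<type> <size>\0` header. The sketch below shows how such a file could be checked and, if it is a commit, parsed. The helper names are hypothetical, and the commit parser only picks out the `tree`, `parent`, `author`, and `committer` headers.

```python
import zlib


def loose_object_path(sha1_hex: str) -> str:
    """Return the relative path of a loose object inside a repository."""
    return f".git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


def parse_loose_object(raw: bytes):
    """Inflate a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("declared size does not match body length")
    return obj_type.decode(), body


def parse_commit(body: bytes) -> dict:
    """Extract the main header fields and the message from a commit body."""
    headers, _, message = body.partition(b"\n\n")
    info = {"parents": [], "message": message.decode(errors="replace")}
    for line in headers.split(b"\n"):
        key, _, value = line.partition(b" ")
        if key == b"parent":
            info["parents"].append(value.decode())
        elif key in (b"tree", b"author", b"committer"):
            info[key.decode()] = value.decode(errors="replace")
    return info
```

A downloader could call `parse_loose_object` on each file and follow only the `parents` of commit objects, never requesting the `tree` id. This still assumes the server exposes loose objects over plain HTTP; objects that live only inside pack files cannot be reached this way.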

But this method needs access to the .git folder over HTTP, and you can't download the index file directly from GitHub.

I don't understand how GitHub communicates with the Git client.
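As a starting point for that question: over HTTP, a Git client begins a fetch by requesting `<repo-url>/info/refs?service=git-upload-pack`, and the server answers in pkt-line format, where each line is prefixed by its length as four hex digits and `0000` is a flush marker. The sketch below parses such a ref advertisement from bytes already in hand. The function names are hypothetical, and only the original (v0/v1-style) advertisement is handled, not protocol v2.

```python
def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    return f"{len(payload) + 4:04x}".encode() + payload


def iter_pkt_lines(data: bytes):
    """Yield each pkt-line payload; a flush packet (0000) is yielded as None."""
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4], 16)
        if length == 0:
            yield None
            pos += 4
            continue
        if length < 4:
            raise ValueError("special packet not handled by this sketch")
        yield data[pos + 4:pos + length]
        pos += length


def parse_ref_advertisement(data: bytes):
    """Return (refs, capabilities) from a smart-HTTP ref advertisement."""
    refs = {}
    capabilities = []
    for payload in iter_pkt_lines(data):
        if payload is None or payload.startswith(b"#"):
            continue  # flush packet or the "# service=..." banner
        line = payload.rstrip(b"\n")
        if b"\x00" in line:
            # The first ref line carries the capability list after a NUL byte.
            line, _, caps = line.partition(b"\x00")
            capabilities = caps.decode().split()
        sha, _, name = line.partition(b" ")
        refs[name.decode()] = sha.decode()
    return refs, capabilities
```

Checking whether `filter` appears in the returned capabilities is one way to tell in advance whether a server would accept a partial-clone request, rather than finding out from the warning during the clone.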

I tried reading the underlying source code in C (https://github.com/git/git) and Java (https://github.com/eclipse/jgit), but the complicated structure puzzled me.

Is there any way to download only the commit objects? Or how can I find the key code in the underlying source?

If you know how to do this, please let me know. Thank you very much.

PS: I need to download some of the files under .git from public repositories on GitHub/GitLab. But with `--filter=blob:none`, GitHub responds with `warning: filtering not recognized by server, ignoring`, and GitHub's API has strict rate limits.

Ph0rse
  • What is the use of the so-called 'commit object'? – Lei Yang Mar 14 '19 at 14:24
  • A commit object contains the commit information. For example, `git cat-file -p 171a2e1` shows the information stored in a commit object. – Ph0rse Mar 14 '19 at 14:34
  • You could set up a web server that runs git commands locally (on the Git server), so your client can get the logs without downloading anything. Isn't that more lightweight? – Lei Yang Mar 14 '19 at 14:35
  • Thanks for your suggestion, but I need to do some analysis work on GitHub/GitLab, so it has to work over the public network and be fast enough. – Ph0rse Mar 14 '19 at 14:40
  • A complete clone takes too long when I have to process repositories across the whole network, and GitHub's API limits the number of requests. – Ph0rse Mar 14 '19 at 14:43
  • FWIW, Github uses [rugged](https://github.com/libgit2/rugged) to interface with Git repositories. – coreyward Mar 14 '19 at 14:57
  • Possible duplicate of [What is the git clone --filter option's syntax?](https://stackoverflow.com/questions/49917616/what-is-the-git-clone-filter-options-syntax) – phd Mar 14 '19 at 15:28
  • `git clone --filter=blob:none` – phd Mar 14 '19 at 15:30
  • When I try to execute the command `git clone --no-checkout --filter=blob:none "https://github.com/vulhub/vulhub.git" pc2`, it shows `warning: filtering not recognized by server, ignoring`. – Ph0rse Mar 15 '19 at 02:40
  • Unfortunately, GitHub does not support the `--filter` feature. For more information: https://stackoverflow.com/questions/55195590/do-github-and-gitlab-support-git-clones-filter-parameter – Ph0rse Mar 21 '19 at 13:49
