
We can use the git ls-remote command to list the refs (branches and tags, each with the commit it points to) of a given git repo, e.g. as in this discussion.

This command takes quite a long time for a repo that has a big list of branches/tags. We are looking for a parameter for it, or any other command, that limits the number of returned rows, e.g. one that reads only the first 3 rows of the returned list.

How can we do that?

P.S. Google and this site's search were of little help.

Nam G VU
  • After posting [my answer](https://stackoverflow.com/a/45253629/6394138) I figured out that if you are comfortable with receiving an incomplete list, then maybe you are trying to solve some problem you don't tell us about with the wrong tool. Can you tell exactly what are you trying to achieve? – Leon Jul 22 '17 at 10:49
  • One usecase @Leon, we want to get the latest commit/tag row as fast as we can – Nam G VU Jul 22 '17 at 10:52
  • Do you mean the *latest commit on a given branch*? – Leon Jul 22 '17 at 11:00
  • Right @Leon in the fastest time – Nam G VU Jul 22 '17 at 13:37
  • I thought that `git fetch --dry-run` *`remote branch`* might work for you, but studying the code revealed that it uses `transport_get_remote_refs()` (i.e. the core of `git ls-remote`) underneath. – Leon Jul 22 '17 at 15:32
  • You are really really good in git knowledge. Thanks a lot. – Nam G VU Jul 22 '17 at 16:00
  • See updated answer. Bottom line is that it is possible only if the remote supports the dumb protocol. – Leon Jul 22 '17 at 17:30
  • I'm using a GitHub repo. How can I verify whether the dumb protocol is supported? – Nam G VU Jul 23 '17 at 02:22
  • GitHub doesn't support the dumb protocol, but instead it provides a rich API, with possibility to [Get a Reference](https://developer.github.com/v3/git/refs/#get-a-reference), for example: `curl -s https://api.github.com/repos/dictcp/awesome-git/git/refs/heads/master|grep '^ *"sha":'` – Leon Jul 23 '17 at 06:48
  • Great thanks! Absolutely I'll try github API someday - currently just accept the slow speed over here then. – Nam G VU Jul 23 '17 at 18:39

1 Answer


git ls-remote works by fetching the full list of refs from the remote and then filtering it locally:

int cmd_ls_remote(int argc, const char **argv, const char *prefix)
{
   ...

    transport = transport_get(remote, NULL);
    if (uploadpack != NULL)
        transport_set_option(transport, TRANS_OPT_UPLOADPACK, uploadpack);

    /* Get all refs from the remote */
    ref = transport_get_remote_refs(transport);
    if (transport_disconnect(transport))
        return 1;

    if (!dest && !quiet)
        fprintf(stderr, "From %s\n", *remote->url);

    /* Filter the list of all refs */
    for ( ; ref; ref = ref->next) {
        if (!check_ref_type(ref, flags))
            continue;
        if (!tail_match(pattern, ref->name))
            continue;
        if (show_symref_target && ref->symref)
            printf("ref: %s\t%s\n", ref->symref, ref->name);
        printf("%s\t%s\n", oid_to_hex(&ref->old_oid), ref->name);
        status = 0; /* we found something */
    }
    return status;
}
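The consequence of that structure can be sketched in a few lines of Python. This is only an illustration, not git's code: the helper names `tail_match` and `filter_refs` are borrowed or made up for the sketch, the sample refs are hypothetical, and `fnmatchcase` is assumed to be a close enough stand-in for git's own wildcard matcher. The point is that the pattern (or a "first 3 rows" limit) is applied only after the full list has already been transferred.

```python
from fnmatch import fnmatchcase


def tail_match(patterns, ref_name):
    """Approximate git's tail matching: a pattern matches the trailing
    path components of a ref name. No patterns means no restriction."""
    if not patterns:
        return True
    path = "/" + ref_name
    return any(fnmatchcase(path, "*/" + pattern) for pattern in patterns)


def filter_refs(all_refs, patterns=None, limit=None):
    """Filter an already-downloaded list of (oid, name) pairs locally.

    `limit` is a hypothetical "first N rows" option: it trims the output
    but cannot reduce what was fetched, because `all_refs` is complete
    before this function is ever called.
    """
    matched = [(oid, name) for oid, name in all_refs
               if tail_match(patterns, name)]
    return matched if limit is None else matched[:limit]


# Hypothetical data standing in for the full list sent by the remote.
all_refs = [
    ("a" * 40, "HEAD"),
    ("a" * 40, "refs/heads/master"),
    ("b" * 40, "refs/tags/v1.0"),
]
print(filter_refs(all_refs, ["master"]))
```

Piping the real command through `head -n 3` has the same limitation: it shortens the output, not the transfer.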

UPDATE

According to this page, which explains the git transfer protocols, if the remote supports the dumb protocol, then you can obtain the remote refs/heads/branchname file directly (e.g. using curl). That file contains the hash of the commit the branch currently points to.
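A minimal sketch of that direct request, assuming the remote really serves its repository directory over plain HTTP. The function names are made up for this example, the repository URL is a placeholder, and the hash check assumes a SHA-1 repository (40 hex digits). If the server has packed its refs, the loose refs/heads/branchname file may not exist and the request will fail with a 404.

```python
from urllib.parse import quote
from urllib.request import urlopen


def branch_ref_url(repo_url, branch):
    """Build the URL of the loose ref file for `branch` on a dumb-HTTP remote."""
    return repo_url.rstrip("/") + "/refs/heads/" + quote(branch)


def parse_ref_file(text):
    """Return the commit hash stored in a ref file, or raise ValueError."""
    value = text.strip()
    if len(value) == 40 and all(c in "0123456789abcdef" for c in value):
        return value
    raise ValueError("not a SHA-1 ref file: %r" % text)


def read_branch_tip(repo_url, branch, timeout=10):
    """Fetch a single ref file over HTTP (one small GET, no ref listing)."""
    with urlopen(branch_ref_url(repo_url, branch), timeout=timeout) as resp:
        return parse_ref_file(resp.read().decode("ascii"))


# Placeholder URL; replace it with a remote known to support the dumb protocol.
print(branch_ref_url("https://example.com/project.git", "master"))
```

Calling `read_branch_tip("https://example.com/project.git", "master")` against a real dumb-protocol remote would return the branch tip without transferring the rest of the ref list.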

The Dumb Protocol

If you’re setting up a repository to be served read-only over HTTP, the dumb protocol is likely what will be used. This protocol is called “dumb” because it requires no Git-specific code on the server side during the transport process; the fetch process is a series of HTTP GET requests, where the client can assume the layout of the Git repository on the server.

...

Otherwise, when the smart protocol is used, the first piece of data sent from a remote is always the list of all remote references, i.e. any git command connecting to a remote via the smart protocol acts as if git ls-remote is run internally (more technically, all such commands call the transport_get_remote_refs() function). In that case, unfortunately, there is no way to speed up your query, not even a workaround.

The Smart Protocol

...

Uploading Data

To upload data to a remote process, Git uses the send-pack and receive-pack processes. The send-pack process runs on the client and connects to a receive-pack process on the remote side.

...

The git-receive-pack command immediately responds with one line for each reference it currently has.

...

Downloading Data

When you download data, the fetch-pack and upload-pack processes are involved. The client initiates a fetch-pack process that connects to an upload-pack process on the remote side to negotiate what data will be transferred down.

...

After fetch-pack connects, upload-pack sends back something like this: ... This is very similar to what receive-pack responds with, but the capabilities are different (i.e. the list of all refs + some additional data).
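To make the shape of that response concrete, here is a rough sketch of reading such a ref advertisement. It assumes the pkt-line framing used by the original smart protocol (a 4-hex-digit length prefix that counts itself, with 0000 as the terminating flush packet) and a raw stream without the extra service header that smart HTTP adds. The function names and the sample data are invented for the illustration. Note that the client has to read every line up to the flush packet before it can pick out the one ref it wants.

```python
def pkt_line(payload):
    """Frame a payload as a pkt-line: 4 hex digits of total length, then data."""
    return b"%04x" % (len(payload) + 4) + payload


def parse_pkt_lines(data):
    """Split a byte stream into pkt-line payloads, stopping at the flush packet."""
    payloads = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4].decode("ascii"), 16)
        if length == 0:          # "0000" flush packet ends the advertisement
            break
        if length < 4:
            raise ValueError("invalid pkt-line length: %d" % length)
        payloads.append(data[pos + 4:pos + length])
        pos += length
    return payloads


def parse_ref_advertisement(data):
    """Return every advertised (oid, name) pair, in the order sent."""
    refs = []
    for payload in parse_pkt_lines(data):
        line = payload.rstrip(b"\n").split(b"\0", 1)[0]  # drop capabilities
        oid, name = line.split(b" ", 1)
        refs.append((oid.decode("ascii"), name.decode("utf-8")))
    return refs


# Hypothetical advertisement with made-up object ids.
sample = (
    pkt_line(b"a" * 40 + b" HEAD\0multi_ack side-band\n")
    + pkt_line(b"b" * 40 + b" refs/heads/master\n")
    + b"0000"
)
print(parse_ref_advertisement(sample))
```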

...

Leon