I would like to fetch only the commits of branchA not present in its base branchB.

For example, consider this history:

B1 - B2 - B3 - B4 - B5
           \
            A1 - A2 - A3

I would like to fetch only A1, A2 and A3. It's important to note that I don't know up front which commit is A1, and how many commits I need to fetch. My input is just the heads of the two branches, in this example branchA=A3 and branchB=B5. Based on such input I need to identify A1 and fetch everything between A1 and branchA, and ideally nothing more.

Alternatively, fetching a minimal set of commits that include A1, A2 and A3, and enough information to identify A1, can be interesting too.

Why? In a use case where I only need those commits ("what changed in branchA relative to branchB"), fetching more than the necessary commits slows down my process. Take for example a large repository with thousands of commits, and feature branches with only a few commits. Fetching the entire history of branchA and branchB fetches a lot of commits I don't need, and takes a lot of time and network bandwidth.

I came up with an ugly hack that avoids fetching the full history, by starting from shallow clones, and incrementally fetching more and more until a common commit is found:

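# Start with only the tip commit of branchA.
git clone --depth 1 "$repo" --branch "$branchA" shallow
cd shallow

# Double the depth until the shallow histories of both branches share a commit.
for ((depth = 8; depth <= 1024; depth *= 2)); do
    echo "trying depth $depth ..."
    git fetch --depth "$depth"
    git fetch --depth "$depth" origin "$branchB:$branchB"
    # Last commit listed for branchB within the shallow history fetched so far
    lastrev=$(git rev-list --reverse "$branchB" | head -n1)
    if git merge-base --is-ancestor "$lastrev" HEAD; then
        echo "found with depth=$depth"
        break
    fi
done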

This works for my use case: it fetches a large enough subset of commits to identify A1 and include commits until the head of branchA, and it's faster than fetching the complete history of the two branches.
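Once the shallow history reaches the common commit, the commits that belong only to branchA can be listed with the two-dot range `branchB..branchA`. The sketch below is illustrative only: it rebuilds the example history in a throwaway local repository (the `commit` helper and the `demo` identity are made up for the demo) instead of using a real remote. In the shallow clone from the loop above, the equivalent range would be `"$branchB"..HEAD`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository reproducing the example history (illustrative names).
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git symbolic-ref HEAD refs/heads/branchB
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }

for m in B1 B2 B3; do commit "$m"; done
git checkout -q -b branchA
for m in A1 A2 A3; do commit "$m"; done
git checkout -q branchB
for m in B4 B5; do commit "$m"; done

# Commits reachable from branchA but not from branchB, oldest first.
only_in_a=$(git log --reverse --format=%s branchB..branchA | paste -sd' ' -)
echo "only in branchA: $only_in_a"

# The merge base is the commit where branchA forked from branchB.
base=$(git log -1 --format=%s "$(git merge-base branchA branchB)")
echo "merge base: $base"
```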

Is there a better way than this? I'm looking for a pure Git solution, but if the GitHub API has something to make this faster and easier, that can be interesting too.

janos
  • I'm not sure I understand the problem. When I do `git fetch origin branchB`, without the `--depth` option, it only gets the commits that are missing from my tree, even if I did a shallow clone in the first place. Is that not the behaviour you observe? – joanis Jan 14 '19 at 15:15
  • Sorry, I just understood the problem: your initial clone didn't include the merge-base, so my test does not apply. – joanis Jan 14 '19 at 15:17
  • @joanis yes, finding the merge-base, without fetching the full history, is an important part of the problem. It might be the key to an efficient solution. Or a better solution. For example, my dirty hack fetches commits of two branches in parallel. If I know the merge-base, I could improve the dirty hack to fetch one branch only. It will still be dirty, but better, by cutting all the fetches in half. – janos Jan 14 '19 at 17:17
  • Unfortunately, I don't know how to figure out the merge-base without logging in directly to the remote or having already fetched it. I like your solution, for what it's worth. – joanis Jan 14 '19 at 21:23

2 Answers

This is not possible today. Variants of your work-around are the best you can do.

There's nothing in the protocol that would prevent you from supplying a raw hash ID, rather than a --depth argument, to git fetch, that would tell git fetch to pretend that the correct --depth (whatever that is) was supplied. But there's also nothing in git fetch to implement this. Hence, the only way to do this is to enumerate commits, one at a time, backwards from each branch tip until you find the correct hash(es), which also tells you what the --depth argument should be for your git fetch command.
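To illustrate how a known commit translates into a `--depth` value, here is a minimal sketch. It assumes the full history is available locally, which is exactly what the question wants to avoid, so it only demonstrates the relationship and is not a way around the limitation. The repository layout, the `commit` helper and the `demo` identity are made up for the demo. For a linear history, the depth is the number of commits between the base and the tip, plus one for the base itself; with merges this count can overestimate the depth that is actually needed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "origin" with history B1 - B2 - B3 - A1 - A2 - A3 on branchA.
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git symbolic-ref HEAD refs/heads/branchB
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "$1"; }
for m in B1 B2 B3; do commit "$m"; done
base=$(git rev-parse HEAD)   # B3, the commit the shallow history must reach
git checkout -q -b branchA
for m in A1 A2 A3; do commit "$m"; done

# Commits between the base (exclusive) and the tip, plus one for the base.
depth=$(( $(git rev-list --count "$base..branchA") + 1 ))
echo "depth needed: $depth"

# Check: a shallow clone of that depth contains the base commit.
git clone -q --depth "$depth" --branch branchA "file://$work/origin" "$work/shallow"
git -C "$work/shallow" cat-file -e "$base^{commit}" && echo "base is present"
fetched=$(git -C "$work/shallow" rev-list --count HEAD)
echo "commits in shallow clone: $fetched"
```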

However, by the time you have iterated over enough hash IDs to find the correct depth, you could in most cases have just done a full clone. So there is very little pressure to implement this feature outside Git (e.g., via the GitHub interface). Naming commits by hash ID is no fun for humans either, so there is also very little pressure to add this feature to git fetch, or much sense in doing so.

The best solution would be one in which you can present to the other Git repository a starting hash (which your own Git can supply by local name-to-hash conversion): if you last saw that the tip of their B branch was, say, B4, so that your own origin/B identifies commit B4, you could, locally, run (note that this proposed --depth-inferred-from argument does not exist today):

git fetch --depth-inferred-from=origin/B A

which would have your Git:

  1. run git ls-remote, or the equivalent that git fetch always runs
  2. convert their refs/heads/A (which you intend to fetch) into a hash ID, denoted H in step 3
  3. ask their Git to enumerate only <hash-of-B4>..H when presenting commits during the have session
  4. drop into the remainder of a normal fetch, i.e., the have/want session for obtaining object IDs to fetch

Step 3, however, requires a new feature in the fetch protocol, so is very much nontrivial.
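Steps 1 and 2 can already be done by hand with existing commands. The following is a small sketch that uses a throwaway local repository as a stand-in for the remote; the `remote` path, the `demo` identity and the `hash_of_a` variable are illustrative, not part of any proposal.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository standing in for the other Git, with one commit on branch A.
remote=$(mktemp -d)
git init -q "$remote"
git -C "$remote" symbolic-ref HEAD refs/heads/A
git -C "$remote" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "A1"

# Steps 1 and 2: list the remote's refs and resolve refs/heads/A to a hash ID.
# git ls-remote prints "<hash><TAB><refname>", so the first field is the hash.
hash_of_a=$(git ls-remote "$remote" refs/heads/A | cut -f1)
echo "remote refs/heads/A is at $hash_of_a"
```

Against a real remote, `"$remote"` would be the repository URL or a configured remote name such as `origin`.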

torek

Solution 1: Use --shallow-exclude=

git clone --shallow-exclude="$branchB" --single-branch --no-tags \
          -b "$branchA" "$repo" shallow
cd shallow
git fetch --shallow-exclude="$branchA" origin "$branchB:$branchB"

# At this point, B3 itself would still be missing,
# so we have to add one more commit into the history of both branches.
git repack -d # Workaround for a bug. https://stackoverflow.com/q/63878612/4967497
git fetch --deepen=1 origin "$branchA" "$branchB"

Unfortunately, if you have merged the two branches at least once, this will not work as expected. Consider the following scenario:

B1 - B2 - B3 - B4 - B5 - B6    branchB (e.g. master/main)
           \         \
            A1 - A2 - A3 - A4  branchA (e.g. your feature branch)

Within the shallow checkout, branchB stops at B5, which means any further command (like merge) would not consider B3 as part of branchB.

Solution 2: Adding base to .git/shallow

If you know the commit hash of B3 ($base), you can do the following:

echo "$base" >> .git/shallow
git fetch -n origin "$branchA:$branchA"

The command git fetch only downloads commits up to the hashes in .git/shallow. Note that if you have merged the branches a few times, you have to add all the commits you have merged into branchB. Consider the following scenario:

       C1 - C2                 some merged branch
      /       \
B1 - B2 - B3 - B4 - B5 - B6    branchB (e.g. master/main)
           \         \
            A1 - A2 - A3 - A4  branchA (e.g. your feature branch)

If you only add B3 into .git/shallow, git fetch would still download B2 and all previous commits, since they are still reachable via B5 -> B4 -> C2 -> C1 -> B2.
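To get a feel for what .git/shallow contains, it can help to inspect it after an ordinary shallow clone. This is only an illustrative sketch on a throwaway repository (the paths and the `demo` identity are made up): after a clone with `--depth 1`, the file lists the tip commit, because that is the commit whose parents were cut off.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "origin" with three commits on branchA.
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" symbolic-ref HEAD refs/heads/branchA
for m in A1 A2 A3; do
    git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$m"
done

# file:// forces the regular transport, so --depth is honoured for a local path.
git clone -q --depth 1 --branch branchA "file://$work/origin" "$work/shallow"

is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
boundary=$(cat "$work/shallow/.git/shallow")   # commits whose parents are cut off
tip=$(git -C "$work/shallow" rev-parse HEAD)
count=$(git -C "$work/shallow" rev-list --count HEAD)

echo "shallow: $is_shallow, boundary: $boundary, commits available: $count"
```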

JojOatXGME