The way this works internally is that git push
runs git pack-objects
(with --thin
), which then runs git rev-list
, passing it the commit IDs you've asked to push
This object-finding can be optimized via bitmaps.
Well, not since With Git 2.4.7 (Q3 2015)
See commit c8a70d3 (01 Jul 2015) by Jeff King (peff
).
(Merged by Junio C Hamano -- gitster
-- in commit ace6325, 10 Jul 2015)
rev-list
: disable --use-bitmap-index
when pruning commits
Signed-off-by: Jeff King
The reachability bitmaps do not have enough information to tell us which commits might have changed path "foo", so the current code produces wrong answers for:
git rev-list --use-bitmap-index --count HEAD -- foo
(it silently ignores the "foo
" limiter).
Instead, we should fall back to doing a normal traversal (it is OK to fall back rather than complain, because --use-bitmap-index
is a pure optimization, and might not kick in for other reasons, such as there being no bitmaps in the repository).
This has been noted in Git 2.26 (Q1 2020): The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal.
There however are some cases where they can work together, and they were taught about them.
See commit 20a5fd8 (18 Feb 2020) by Junio C Hamano (gitster
).
See commit 3ab3185, commit 84243da, commit 4f3bd56, commit cc4aa28, commit 2aaeb9a, commit 6663ae0, commit 4eb707e, commit ea047a8, commit 608d9c9, commit 55cb10f, commit 792f811, commit d90fe06 (14 Feb 2020), and commit e03f928, commit acac50d, commit 551cf8b (13 Feb 2020) by Jeff King (peff
).
(Merged by Junio C Hamano -- gitster
-- in commit 0df82d9, 02 Mar 2020)
pack-bitmap
: refuse to do a bitmap traversal with pathspecs
Signed-off-by: Jeff King
rev-list has refused to use bitmaps with pathspec limiting since c8a70d3509 ("rev-list
: disable --use-bitmap-index when pruning commits", 2015-07-01, Git v2.5.0-rc2 -- merge).
But this is true not just for rev-list, but for anyone who calls prepare_bitmap_walk()
; the code isn't equipped to handle this case.
We never noticed because the only other callers would never pass a pathspec limiter.
But let's push the check down into prepare_bitmap_walk()
anyway. That's a more logical place for it to live, as callers shouldn't need to know the details (and must be prepared to fall back to a regular traversal anyway, since there might not be bitmaps in the repository).
It would also prepare us for a day where this case _is
_ handled, but that's pretty unlikely. E.g., we could use bitmaps to generate the set of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks the commits.
But you'd probably do better still to make use of the newer commit-graph feature to make walking the commits very cheap.
With Git 2.27 (Q2 2020), the object walk with object filter "--filter=tree:0
" can now take advantage of the pack bitmap when available.
See commit 9639474, commit 5bf7f1e (04 May 2020) by Jeff King (peff
).
See commit b0a8d48, commit 856e12c (04 May 2020) by Taylor Blau (ttaylorr
).
(Merged by Junio C Hamano -- gitster
-- in commit 69ae8ff, 13 May 2020)
pack-bitmap.c
: make object filtering functions generic
Signed-off-by: Taylor Blau
In 4f3bd5606a ("pack-bitmap
: implement BLOB_NONE
filtering", 2020-02-14, Git v2.26.0-rc0 -- merge listed in batch #8), filtering support for bitmaps was added for the 'LOFC_BLOB_NONE
' filter.
In the future, we would like to add support for filters that behave as if they exclude a certain type of object, for e.g., the tree depth filter with depth 0.
To prepare for this, make some of the functions used for filtering more generic, such as 'find_tip_blobs
' and 'filter_bitmap_blob_none
' so that they can work over arbitrary object types.
To that end, create 'find_tip_objects
' and 'filter_bitmap_exclude_type
', and redefine the aforementioned functions in terms of those.
With Git 2.32 (Q2 2021), optimize "rev-list
--use-bitmap-index --objects
(man) corner case that uses negative tags as the stopping points.
That participates to describe what gets "cloned" and "pushed" during git clone
and git push
, paying this time attention to tags:
See commit 540cdc1 (22 Mar 2021) by Patrick Steinhardt (pks-t
).
(Merged by Junio C Hamano -- gitster
-- in commit 58840e6, 07 Apr 2021)
pack-bitmap
: avoid traversal of objects referenced by uninteresting tag
Signed-off-by: Patrick Steinhardt ps@pks.im.
When preparing the bitmap walk, we first establish the set of of have and want objects by iterating over the set of pending objects: if an object is marked as uninteresting, it's declared as an object we already have, otherwise as an object we want.
These two sets are then used to compute which transitively referenced objects we need to obtain.
One special case here are tag objects: when a tag is requested, we resolve it to its first not-tag object and add both resolved objects as well as the tag itself into either the have or want set.
Given that the uninteresting-property always propagates to referenced objects, it is clear that if the tag is uninteresting, so are its children and vice versa.
But we fail to propagate the flag, which effectively means that referenced objects will always be interesting except for the case where they have already been marked as uninteresting explicitly.
This mislabeling does not impact correctness: we now have it in our "wants" set, and given that we later do an AND NOT
of the bitmaps of "wants" and "haves" sets it is clear that the result must be the same.
But we now start to needlessly traverse the tag's referenced objects in case it is uninteresting, even though we know that each referenced object will be uninteresting anyway.
In the worst case, this can lead to a complete graph walk just to establish that we do not care for any object.
Fix the issue by propagating the UNINTERESTING
flag to pointees of tag objects and add a benchmark with negative revisions to p5310.
This shows some nice performance benefits, tested with linux.git
:
Test HEAD~ HEAD
---------------------------------------------------------------------------------------------------------------
5310.3: repack to disk 193.18(181.46+16.42) 194.61(183.41+15.83) +0.7%
5310.4: simulated clone 25.93(24.88+1.05) 25.81(24.73+1.08) -0.5%
5310.5: simulated fetch 2.64(5.30+0.69) 2.59(5.16+0.65) -1.9%
5310.6: pack to file (bitmap) 58.75(57.56+6.30) 58.29(57.61+5.73) -0.8%
5310.7: rev-list (commits) 1.45(1.18+0.26) 1.46(1.22+0.24) +0.7%
5310.8: rev-list (objects) 15.35(14.22+1.13) 15.30(14.23+1.07) -0.3%
5310.9: rev-list with tag negated via --not --all (objects) 22.49(20.93+1.56) 0.11(0.09+0.01) -99.5%
5310.10: rev-list with negative tag (objects) 0.61(0.44+0.16) 0.51(0.35+0.16) -16.4%
5310.11: rev-list count with blob:none 12.15(11.19+0.96) 12.18(11.19+0.99) +0.2%
5310.12: rev-list count with blob:limit=1k 17.77(15.71+2.06) 17.75(15.63+2.12) -0.1%
5310.13: rev-list count with tree:0 1.69(1.31+0.38) 1.68(1.28+0.39) -0.6%
5310.14: simulated partial clone 20.14(19.15+0.98) 19.98(18.93+1.05) -0.8%
5310.16: clone (partial bitmap) 12.78(13.89+1.07) 12.72(13.99+1.01) -0.5%
5310.17: pack to file (partial bitmap) 42.07(45.44+2.72) 41.44(44.66+2.80) -1.5%
5310.18: rev-list with tree filter (partial bitmap) 0.44(0.29+0.15) 0.46(0.32+0.14) +4.5%
While most benchmarks are probably in the range of noise, the newly added 5310.9 and 5310.10 benchmarks consistently perform better.
With Git 2.32 (Q2 2021), a configuration variable has been added to force tips of certain refs to be given a reachability bitmap.
See commit 3f267a1, commit 483fa7f, commit dff5e49 (31 Mar 2021) by Taylor Blau (ttaylorr
).
(Merged by Junio C Hamano -- gitster
-- in commit 0623669, 13 Apr 2021)
Signed-off-by: Taylor Blau
Add a new 'bitmap' test-tool which can be used to list the commits that have received bitmaps.
In theory, a determined tester could run 'git rev-list --test-bitmap <commit>
'(man) to check if '<commit>
' received a bitmap or not, since '--test-bitmap
' exits with a non-zero code when it can't find the requested commit.
But this is a dubious behavior to rely on, since arguably 'git rev-list
' could continue its object walk outside of which commits are covered by bitmaps.
This will be used to test the behavior of 'pack.preferBitmapTips
'
And:
Suggested-by: Jeff King
Signed-off-by: Taylor Blau
When writing a new pack with a bitmap, it is sometimes convenient to indicate some reference prefixes which should receive priority when selecting which commits to receive bitmaps.
A truly motivated caller could accomplish this by setting 'pack.islandCore
', (since all commits in the core island are similarly marked as preferred) but this requires callers to opt into using delta islands, which they may or may not want to do.
Introduce a new multi-valued configuration, 'pack.preferBitmapTips
' to allow callers to specify a list of reference prefixes.
All references which have a prefix contained in 'pack.preferBitmapTips
' will mark their tips as "preferred" in the same way as commits are marked as preferred for selection by 'pack.islandCore
'.
The choice of the verb "prefer
" is intentional: marking the NEEDS_BITMAP
flag on an object does not guarantee that that object will receive a bitmap.
It merely guarantees that that commit will receive a bitmap over any other commit in the same window by bitmap_writer_select_commits()
.
The test this patch adds reflects this quirk, too.
It only tests that a commit (which didn't receive bitmaps by default) is selected for bitmaps after changing the value of 'pack.preferBitmapTips
' to include it.
Other commits may lose their bitmaps as a byproduct of how the selection process works (bitmap_writer_select_commits()
ignores the remainder of a window after seeing a commit with the NEEDS_BITMAP
flag).
This configuration will aide in selecting important references for multi-pack bitmaps, since they do not respect the same pack.islandCore
configuration.
(They could, but doing so may be confusing, since it is packs--not bitmaps--which are influenced by the delta-islands configuration).
In a fork network repository (one which lists all forks of a given repository as remotes), for example, it is useful to set pack.preferBitmapTips
to 'refs/remotes/<root>/heads
' and 'refs/remotes/<root>/tags
', where '<root>
' is an opaque identifier referring to the repository which is at the base of the fork chain.
git config
now includes in its man page:
pack.preferBitmapTips
When selecting which commits will receive bitmaps, prefer a
commit at the tip of any reference that is a suffix of any value
of this configuration over any other commits in the "selection
window".
Note that setting this configuration to refs/foo
does not mean that
the commits at the tips of refs/foo/bar
and refs/foo/baz
will
necessarily be selected. This is because commits are selected for
bitmaps from within a series of windows of variable length.
If a commit at the tip of any reference which is a suffix of any value
of this configuration is seen in a window, it is immediately given
preference over any other commit in that window.
With Git 2.33 (Q3 2021), avoid duplicated work while building reachability bitmaps.
See commit aa9ad6f (14 Jun 2021) by Jeff King (peff
).
(Merged by Junio C Hamano -- gitster
-- in commit 1ef488e, 08 Jul 2021)
bitmaps
: don't recurse into trees already in the bitmap
Signed-off-by: Jeff King
If an object is already mentioned in a reachability bitmap we are building, then by definition so are all of the objects it can reach.
We have an optimization to stop traversing commits when we see they are already in the bitmap, but we don't do the same for trees.
It's generally unavoidable to recurse into trees for commits not yet covered by bitmaps (since most commits generally do have unique top-level trees).
But they usually have subtrees that are shared with other commits (i.e., all of the subtrees the commit _didn
't_ touch).
And some of those commits (and their trees) may be covered by the bitmap.
Usually this isn't too big a deal, because we'll visit those subtrees only once in total for the whole walk.
But if you have a large number of unbitmapped commits, and if your tree is big, then you may end up opening a lot of sub-trees for no good reason.
We can use the same optimization we do for commits here: when we are about to open a tree, see if it's in the bitmap (either the one we are building, or the "seen" bitmap which covers the UNINTERESTING side of the bitmap when doing a set-difference).
This works especially well because we'll visit all commits before hitting any trees.
So even in a history like:
A -- B
if "A" has a bitmap on disk but "B" doesn't, we'll already have OR-ed in the results from A before looking at B's tree (so we really will only look at trees touched by B).
For most repositories, the timings produced by p5310 are unspectacular.
Any improvement there is within the noise (the +3.1% on test 7 has to be noise, since we are not recursing into trees, and thus the new code isn't even run).
The results for git.git are likewise uninteresting.
But here are numbers from some other real-world repositories (that are not public).
This one's tree is comparable in size to linux.git, but has ~16k refs (and so less complete bitmap coverage):
Test HEAD^ HEAD
-------------------------------------------------------------------------
5310.4: simulated clone 38.34(39.86+0.74) 33.95(35.53+0.76) -11.5%
5310.5: simulated fetch 2.29(6.31+0.35) 2.20(5.97+0.41) -3.9%
5310.7: rev-list (commits) 0.99(0.86+0.13) 0.96(0.85+0.11) -3.0%
5310.8: rev-list (objects) 11.32(11.04+0.27) 6.59(6.37+0.21) -41.8%
And here's another with a very large tree (~340k entries), and a fairly large number of refs (~10k):
Test HEAD^ HEAD
-------------------------------------------------------------------------
5310.3: simulated clone 53.83(54.71+1.54) 39.77(40.76+1.50) -26.1%
5310.4: simulated fetch 19.91(20.11+0.56) 19.79(19.98+0.67) -0.6%
5310.6: rev-list (commits) 0.54(0.44+0.11) 0.51(0.43+0.07) -5.6%
5310.7: rev-list (objects) 24.32(23.59+0.73) 9.85(9.49+0.36) -59.5%
This patch provides substantial improvements in these larger cases, and have any drawbacks for smaller ones (the cost of the bitmap check is quite small compared to an actual tree traversal).
And still Git 2.33: with A race between repacking and using pack bitmaps has been corrected with Git 2.33 (Q3 2021).
See commit dc1daac (23 Jul 2021) by Jeff King (peff
).
(Merged by Junio C Hamano -- gitster
-- in commit 9bcdaab, 02 Aug 2021)
pack-bitmap
: check pack validity when opening bitmap
Signed-off-by: Jeff King
When pack-objects adds an entry to its list of objects to pack, it may mark the packfile and offset that contains the file, which we can later use to output the object verbatim.
If the packfile is deleted while we are running (e.g., by another process running "git repack
"(man)), we may die in use_pack()
if the pack file cannot be opened.
We worked around this in 4c08018 ("pack-objects
: protect against disappearing packs", 2011-10-14, Git v1.7.8-rc0 -- merge) by making sure we can open the pack before recording it as a source.
This detects a pack which has already disappeared while generating the packing list, and because we keep the pack's file descriptor (or an mmap window) open, it means we can access it later (unless you exceed core.packedgitlimit).
The bitmap code that was added later does not do this; it adds entries to the packlist without checking that the packfile is still valid, and is vulnerable to this race.
It needs the same treatment as 4c08018.
However, rather than add it in just that one spot, it makes more sense to simply open and check the packfile when we open the bitmap.
Technically you can use the .bitmap without even looking in the .pack file (e.g., if you are just printing a list of objects without accessing them), but it's much simpler to do it early.
That covers all later direct uses of the pack (due to the cached descriptor) without having to check each one directly.
For example, in pack-objects we need to protect the packlist entries, but we also access the pack directly as part of the reuse_partial_pack_from_bitmap()
feature.
This patch covers both cases.