Is this behavior well-defined?
With Git 2.34 (Q4 2021), the revision traversal API has illustrated by two commits:
- one by taking advantage of the commit-graph, when available, to determine if a commit is reachable from any of the existing refs.
- one cancelling/reverting that change
Both illustrates how git-rev-list
works:
See commit f559d6d, commit 809ea28, commit bf9c0cb, commit f45022d (09 Aug 2021), and commit 29ef1f2 (05 Aug 2021) by Patrick Steinhardt (pks-t
).
(Merged by Junio C Hamano -- gitster
-- in commit a5619d4, 03 Sep 2021)
connected
: do not sort input revisions
Signed-off-by: Patrick Steinhardt
In order to compute whether objects reachable from a set of tips are all connected, we do a revision walk with these tips as positive references and --not --all
.
--not --all
will cause the revision walk to load all preexisting references as uninteresting, which can be very expensive in repositories with many references.
Benchmarking the git-rev-list
command highlights that by far the most expensive single phase is initial sorting of the input revisions: after all references have been loaded, we first sort commits by author date.
In a real-world repository with about 2.2 million references, it makes up about 40% of the total runtime of git-rev-list
.
Ultimately, the connectivity check shouldn't really bother about the order of input revisions at all.
We only care whether we can actually walk all objects until we hit the cut-off point.
So sorting the input is a complete waste of time.
Introduce a new "--unsorted-input
" flag to git-rev-list
which will cause it to not sort the commits and adjust the connectivity check to always pass the flag.
This results in the following speedups, executed in a clone of gitlab-org/gitlab
:
Benchmark #1: git rev-list --objects --quiet --not --all --not $(cat newrev)
Time (mean ± σ): 7.639 s ± 0.065 s [User: 7.304 s, System: 0.335 s]
Range (min … max): 7.543 s … 7.742 s 10 runs
Benchmark #2: git rev-list --unsorted-input --objects --quiet --not --all --not $newrev
Time (mean ± σ): 4.995 s ± 0.044 s [User: 4.657 s, System: 0.337 s]
Range (min … max): 4.909 s … 5.048 s 10 runs
Summary
'git rev-list --unsorted-input --objects --quiet --not --all --not $(cat newrev)' ran
1.53 ± 0.02 times faster than 'git rev-list --objects --quiet --not --all --not $newrev'
Note that not all refs are visible to clients.
rev-list-options
now includes in its man page:
--unsorted-input
Show commits in the order they were given on the command line instead
of sorting them in reverse chronological order by commit time. Cannot
be combined with --no-walk
or --no-walk=sorted
.
rev-list-options
now includes in its man page:
Cannot be combined with --graph
. Cannot be combined with
--unsorted-input
if sorted
or no argument was given.
And with the same Git 2.34 (Q4 2021), a regression fix reverts the optimization above:
See commit a7df4f5 (11 Nov 2021) by Junio C Hamano (gitster
).
(Merged by Junio C Hamano -- gitster
-- in commit 8996d68, 12 Nov 2021)
This reverts commit f45022d (connected
: do not sort input revisions, 2021-08-09, Git v2.34.0-rc0 -- merge listed in batch #3), as this is like breakage in the traversal more likely.
In a history with 10 single strand of pearls,
1-->2-->3--...->7-->8-->9-->10
asking "rev-list --unsorted-input 1 10 --not 9 8 7 6 5 4
(man) fails to paint the bottom 1 uninteresting as the traversal stops, without completing the propagation of uninteresting bit starting at 4 down through 3 and 2 to 1.
2018:
Git 2.16 (Q1 2018) will allow git describe
to give an object a human readable name based on an available ref when used as git describe <blob>
.
(See more at "Which commit has this blob?")
In that context, git rev-list adds a new order.
See commit ce5b6f9 by Stefan Beller (stefanbeller
).
revision.h
: introduce blob/tree walking in order of the commits
The functionality to list tree objects in the order they were seen
while traversing the commits will be used in one of the next commits,
where we teach git describe
to describe not only commits, but blobs, too.
That means the git rev-list
man page has a new object traversal order:
--in-commit-order::
Print tree and blob ids in order of the commits.
The tree and blob ids are printed after they are first referenced by a commit.