Another option, with Git 2.41+ (Q2 2023), for all branches/tags:
git for-each-ref --format="%(refname)" "refs/heads/*" "refs/tags/*" >allrefs &&
sort -r allrefs | head -n 50 >refs
git for-each-ref --format="%(ahead-behind:HEAD)" --stdin <refs
With Git 2.41 (Q2 2023), "git for-each-ref
"(man) learns '%(ahead-behind:<base>)
' that computes the distances from a single reference point in the history with bunch of commits in bulk.
It shows two new features.
First: git for-each-ref --stdin <refs
See commit cbfe360, commit 49abcd2, commit fd67d14, commit 2ee11f7, commit 80c928d, commit 368d19b, commit b2c51b7, commit b73dec5 (20 Mar 2023) by Derrick Stolee (derrickstolee
).
See commit c08645b (20 Mar 2023) by Taylor Blau (ttaylorr
).
(Merged by Junio C Hamano -- gitster
-- in commit 7727da9, 06 Apr 2023)
Helped-by: Phillip Wood
Signed-off-by: Derrick Stolee
When a user wishes to input a large list of patterns to 'git for-each-ref
'(man) (likely a long list of exact refs) there are frequently system limits on the number of command-line arguments.
Add a new --stdin
option to instead read the patterns from standard input.
Add tests that check that any unrecognized arguments are considered an error when --stdin
is provided.
Also, an empty pattern list is interpreted as the complete ref set.
When reading from stdin, we populate the filter.name_patterns array dynamically as opposed to pointing to the 'argv
' array directly.
This is simple when using a strvec, as it is NULL-terminated in the same way.
We then free the memory directly from the strvec.
git for-each-ref
now includes in its man page:
[(--sort=)...] [--format=]
[ --stdin | ... ]
git for-each-ref
now includes in its man page:
--stdin
If --stdin
is supplied, then the list of patterns is read from
standard input instead of from the argument list.
Second: git for-each-ref --format="%(ahead-behind:HEAD)"
for-each-ref
: add ahead-behind format atom
Signed-off-by: Derrick Stolee
The previous change implemented the ahead_behind()
method, including an algorithm to compute the ahead/behind values for a number of commit tips relative to a number of commit bases.
Now, integrate that algorithm as part of 'git for-each-ref
'(man) hidden behind a new format atom, ahead-behind.
This naturally extends to 'git branch
'(man) and 'git tag
'(man) builtins, as well.
This format allows specifying multiple bases, if so desired, and all matching references are compared against all of those bases.
For this reason, failing to read a reference provided from these atoms results in an error.
In order to translate the ahead_behind()
method information to the format output code in ref-filter.c
, we must populate arrays of ahead_behind_count
structs.
In struct ref_array,
we store the full array that will be passed to ahead_behind()
.
In struct ref_array_item,
we store an array of pointers that point to the relvant items within the full array.
In this way, we can pull all relevant ahead/behind values directly when formatting output for a specific item.
It also ensures the lifetime of the ahead_behind_count
structs matches the time that the array is being used.
Also add performance tests in a new p1300-graph-walks.sh script.
This will be useful for more uses in the future, but for now compare the ahead-behind counting algorithm in 'git for-each-ref
' to the naive implementation by running 'git rev-list --count
'(man) processes for each input.
git for-each-ref
now includes in its man page:
ahead-behind:<committish>
Two integers, separated by a space, demonstrating the number of
commits ahead and behind, respectively, when comparing the output
ref to the <committish>
specified in the format.
It also illustrates/explains how ahead/behinf is computed
commit-reach
: implement ahead_behind()
logic
Co-authored-by: Taylor Blau
Signed-off-by: Taylor Blau
Signed-off-by: Derrick Stolee
Fully implement the commit-counting logic required to determine ahead/behind counts for a batch of commit pairs.
This is a new library method within commit-reach.h
.
The interface for ahead_behind()
uses two arrays.
- The first array of commits contains the list of all starting points for the walk.
This includes all tip commits and base commits.
- The second array specifies base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each of these pairs.
This implementation of ahead_behind()
allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for counting the ahead/behind values, saving time when the base/tip ranges overlap significantly.
This interface for ahead_behind()
also makes it very easy to call ensure_generations_valid()
on the entire array of bases and tips.
This call is necessary because it is critical that the walk that counts ahead/behind values never walks a commit more than once.
Without generation numbers on every commit, there is a possibility that a commit date skew could cause the walk to revisit a commit and then double-count it.
For this reason, it is strongly recommended that git ahead-behind
is only run in a repository with a commit-graph file that covers most of the reachable commits, storing precomputed generation numbers.
If no commit-graph exists, this walk will be much slower as it must walk all reachable commits in ensure_generations_valid()
before performing the counting logic.
It is possible to detect if generation numbers are available at run time and redirect the implementation to another algorithm that does not require this property.
However, that implementation requires a commit walk per base/tip pair and can be slower due to the commit date heuristics required.
Such an implementation could be considered in the future if there is a reason to include it, but most Git hosts should already be generating a commit-graph file as part of repository maintenance.
Most Git clients should also be generating commit-graph files as part of background maintenance or automatic GCs.
Now, let's discuss the ahead/behind counting algorithm.
The first array of commits are considered the starting commits.
The index within that array will play a critical role.
We create a new commit slab that maps commits to a bitmap.
For a given commit (anywhere in the history), its bitmap stores information relative to which of the input commits can reach that commit.
The ith bit will be on if the ith commit from the starting list can reach that commit.
It is important to notice that these bitmaps are not the typical "reachability bitmaps" that are stored in .bitmap
files.
Instead of signalling which objects are reachable from the current commit, they instead signal "which starting commits can reach me?" It is also important to know that the bitmap is not necessarily "complete" until we walk that commit.
We will perform a commit walk by generation number in such a way that we can guarantee the bitmap is correct when we visit that commit.
At the beginning of the ahead_behind()
method, we initialize the bitmaps for each of the starting commits.
By enabling the ith bit for the ith starting commit, we signal "the ith commit can reach itself."
We walk commits by popping the commit with maximum generation number out of the queue, guaranteeing that we will never walk a child of that commit in any future steps.
As we walk, we load the bitmap for the current commit and perform two main steps.
The second step examines each parent of the current commit and adds the current commit's bitmap bits to each parent's bitmap.
(We create a new bitmap for the parent if this is our first time seeing that parent.) After adding the bits to the parent's bitmap, the parent is added to the walk queue.
Due to this passing of bits to parents, the current commit has a guarantee that the ith bit is enabled on its bitmap if and only if the ith commit can reach the current commit.
The first step of the walk is to examine the bitmask on the current commit and decide which ranges the commit is in or not.
Due to the "bit pushing" in the second step, we have a guarantee that the ith bit of the current commit's bitmap is on if and only if the ith starting commit can reach it.
For each ahead_behind_count
struct, check the base_index
and tip_index
to see if those bits are enabled on the current bitmap.
If exactly one bit is enabled, then increment the corresponding 'ahead' or 'behind' count.
This increment is the reason we absolutely need to walk commits at most once.
The only subtle thing to do with this walk is to check to see if a parent has all bits on in its bitmap, in which case it becomes "stale" and is marked with the STALE bit.
This allows queue_has_nonstale()
to be the terminating condition of the walk, which greatly reduces the number of commits walked if all of the commits are nearby in history.
It avoids walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup()
to add commits to the priority queue without adding them multiple times.
This uses the PARENT2 flag.
Thus, we must clear both the STALE and PARENT2 bits of all commits, in case ahead_behind()
is called multiple times in the same process.