Note: Git 2.18 (Q2 2018) now pre-computes and stores the information needed for ancestry traversal in a separate file, to optimize graph walking.
That notion of a commit graph changes how 'git log --graph' works.
As mentioned here:
git config --global core.commitGraph true
git config --global gc.writeCommitGraph true
cd /path/to/repo
git commit-graph write
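To check that a graph file was actually produced, a sketch like the following could be used. It assumes Git 2.19 or later, because '--reachable' and 'verify' are not among the 2.18 commands quoted above, and a non-bare repository with the default layout. The helper name check_commit_graph is hypothetical.

```shell
# Hypothetical helper: write a commit-graph for the repository at $1,
# then confirm that Git accepts the resulting file.
# Assumes Git >= 2.19 (for --reachable and verify) and a non-bare repo.
check_commit_graph() {
    repo=$1
    # Walk from all refs instead of only scanning packfiles.
    git -C "$repo" commit-graph write --reachable >/dev/null || return 1
    # Check the graph file against the object database.
    git -C "$repo" commit-graph verify >/dev/null || return 1
    # Without --split, the graph is a single file in objects/info.
    test -f "$repo/.git/objects/info/commit-graph" || return 1
    echo "commit-graph OK"
}
```

Calling check_commit_graph /path/to/repo prints "commit-graph OK" only when all three steps succeed.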
See commit 7547b95, commit 3d5df01, commit 049d51a, commit 177722b, commit 4f2542b, commit 1b70dfd, commit 2a2e32b (10 Apr 2018), and commit f237c8b, commit 08fd81c, commit 4ce58ee, commit ae30d7b, commit b84f767, commit cfe8321, commit f2af9f5 (02 Apr 2018) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit b10edb2, 08 May 2018)
You now have the command git commit-graph: write and verify Git commit graph files.
Write a commit graph file based on the commits found in packfiles.
Includes all commits from the existing commit graph file.
The design document states:
Git walks the commit graph for many reasons, including:
- Listing and filtering commit history.
- Computing merge bases.
These operations can become slow as the commit count grows. The merge
base calculation shows up in many user-facing commands, such as 'merge-base'
or 'status' and can take minutes to compute depending on history shape.
There are two main costs here:
- Decompressing and parsing commits.
- Walking the entire graph to satisfy topological order constraints.
The commit graph file is a supplemental data structure that accelerates
commit graph walks.
If a user downgrades or disables the 'core.commitGraph' config setting, then the existing ODB is sufficient.
The file is stored as "commit-graph" either in the .git/objects/info directory or in the info directory of an alternate.
The commit graph file stores the commit graph structure along with some
extra metadata to speed up graph walks.
By listing commit OIDs in lexicographic order, we can identify an integer position for each commit and refer to the parents of a commit using those integer positions.
We use binary search to find initial commits and then use the integer positions
for fast lookups during the walk.
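The lookup described above can be sketched in a few lines of shell and awk. This is only an illustration of the idea (sort the OIDs, then binary-search for a position), not Git's on-disk format or its C implementation. The helper name oid_position and the shortened OIDs are made up.

```shell
# Hypothetical helper: read OIDs (one per line) on stdin and print the
# 0-based position of OID $1 in lexicographic order, or -1 if it is absent.
oid_position() {
    LC_ALL=C sort | LC_ALL=C awk -v want="$1" '
        { oid[NR - 1] = $0 "" }          # "" forces string comparison later
        END {
            want = want ""
            lo = 0; hi = NR - 1; pos = -1
            while (lo <= hi) {           # binary search over the sorted OIDs
                mid = int((lo + hi) / 2)
                if (oid[mid] == want) { pos = mid; break }
                if (oid[mid] < want) lo = mid + 1; else hi = mid - 1
            }
            print pos
        }'
}

# Made-up, shortened OIDs for illustration only.
printf '%s\n' a1f3 0b9c c4a0 7d2e | oid_position c4a0   # prints 3
```

Once every commit has such an integer position, a parent can be recorded as a small number instead of a full OID, which is what makes the later steps of a walk cheap.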
You can see the test use cases:
git log --oneline $BRANCH
git log --topo-order $BRANCH
git log --graph $COMPARE..$BRANCH
git branch -vv
git merge-base -a $BRANCH $COMPARE
This will improve git log performance.
With Git 2.39 (Q4 2022), the glossary entries for "commit-graph file" and "reachability bitmap" have been added.
See commit 8fea12a, commit 4973726, commit fa8e8d5, commit 776ba91 (29 Oct 2022) by Philip Oakley (PhilipOakley).
(Merged by Taylor Blau -- ttaylorr -- in commit 4b6302c, 08 Nov 2022)
glossary: add reachability bitmap description
Signed-off-by: Philip Oakley
Signed-off-by: Taylor Blau
Describe the purpose of the reachability bitmap.
glossary-content now includes in its man page:
reachability bitmaps
Reachability bitmaps store information about the
reachability of a selected set of commits in
a packfile, or a multi-pack index (MIDX), to speed up object search.
The bitmaps are stored in a ".bitmap" file.
A repository may have at most one bitmap file in use.
The bitmap file may belong to either one pack, or the repository's multi-pack index (if it exists).
And:
glossary: add "commit graph" description
Signed-off-by: Philip Oakley
Signed-off-by: Taylor Blau
Git has an additional "commit graph" capability that supplements the normal commit object's directed acyclic graph (DAG).
The supplemental commit graph file is designed for speed of access.
Describe the commit graph both from the normative DAG view point and from the commit graph file perspective.
Also, clarify the link between the branch ref and branch tip by linking to the ref glossary entry, matching this commit graph entry.
The commit-graph file is also distinguished by its hyphenation.
Subsequent commit catches the few cases where the hyphenation of commit-graph was missing.
glossary-content now includes in its man page:
commit graph concept, representations and usage
A synonym for the DAG structure formed by the commits
in the object database, referenced by branch tips,
using their chain of linked commits.
This structure is the definitive commit graph. The
graph can be represented in other ways, e.g. the
"commit-graph" file.
commit-graph file
The "commit-graph" (normally hyphenated) file is a supplemental
representation of the commit graph
which accelerates commit graph walks.
The "commit-graph" file is stored either in the .git/objects/info directory or in the info directory of an alternate object database.
Git 2.19 (Q3 2018) will take care of the lock file:
See commit 33286dc (10 May 2018), commit 1472978, commit 7adf526, commit 04bc8d1, commit d7c1ec3, commit f9b8908, commit 819807b, commit e2838d8, commit 3afc679, commit 3258c66 (01 May 2018), and commit 83073cc, commit 8fb572a (25 Apr 2018) by Derrick Stolee (derrickstolee).
Helped-by: Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a856e7d, 25 Jun 2018)
commit-graph: fix UX issue when .lock file exists
We use the lockfile API to prevent multiple Git processes from writing to the commit-graph file in the .git/objects/info directory.
In some cases, this directory may not exist, so we check for its existence.
The existing code does the following when acquiring the lock:
- Try to acquire the lock.
- If it fails, try to create the .git/objects/info directory.
- Try to acquire the lock, failing if necessary.
The problem is that if the lockfile exists, then the mkdir fails, giving
an error that doesn't help the user:
"fatal: cannot mkdir .git/objects/info: File exists"
While technically this honors the lockfile, it does not help the user.
Instead, do the following:
- Check for existence of .git/objects/info; create if necessary.
- Try to acquire the lock, failing if necessary.
The new output looks like:
fatal: Unable to create '<dir>/.git/objects/info/commit-graph.lock': File exists.
Another git process seems to be running in this repository, e.g.
an editor opened by 'git commit'.
Please make sure all processes are terminated then try again.
If it still fails, a git process may have crashed in this repository earlier:
remove the file manually to continue.
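The revised order of operations can be mimicked in plain shell. This is a rough sketch under stated assumptions, not Git's lockfile API: the helper name acquire_graph_lock is hypothetical, and the shell's noclobber option stands in for an exclusive file creation.

```shell
# Hypothetical helper: take the commit-graph lock under the Git directory $1.
acquire_graph_lock() {
    info_dir=$1/objects/info
    lock=$info_dir/commit-graph.lock
    # Step 1: make sure the directory exists; -p succeeds if it already does,
    # so an existing lock file can no longer surface as a mkdir error.
    mkdir -p "$info_dir" || return 1
    # Step 2: create the lock file; with noclobber (set -C) the redirection
    # fails when the file already exists.
    if ( set -C; : > "$lock" ) 2>/dev/null; then
        echo "locked"
    else
        echo "fatal: Unable to create '$lock': File exists." >&2
        return 1
    fi
}
```

A second call against the same directory fails at step 2 with a message that names the lock file, which is the behaviour the commit aims for.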
Note: The commit-graph facility did not work when in-core objects that
are promoted from unknown type to commit (e.g. a commit that is
accessed via a tag that refers to it) were involved, which has been
corrected with Git 2.21 (Feb. 2019)
See commit 4468d44 (27 Jan 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 2ed3de4, 05 Feb 2019)
That algorithm is being refactored in Git 2.23 (Q3 2019).
See commit 238def5, commit f998d54, commit 014e344, commit b2c8306, commit 4c9efe8, commit ef5b83f, commit c9905be, commit 10bd0be, commit 5af8039, commit e103f72 (12 Jun 2019), and commit c794405 (09 May 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit e116894, 09 Jul 2019)
Commit 10bd0be explains the change of scope.
With Git 2.24 (Q4 2019), the code to write commit-graph over given commit object names has been made a bit more robust.
See commit 7c5c9b9, commit 39d8831, commit 9916073 (05 Aug 2019) by SZEDER Gábor (szeder).
(Merged by Junio C Hamano -- gitster -- in commit 6ba06b5, 22 Aug 2019)
And, still with Git 2.24 (Q4 2019), the code to parse and use the commit-graph file has been made more robust against corrupted input.
See commit 806278d, commit 16749b8, commit 23424ea (05 Sep 2019) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 80693e3, 07 Oct 2019)
t/t5318: introduce failing 'git commit-graph write' tests
When invoking 'git commit-graph' in a corrupt repository, one can cause a segfault when ancestral commits are corrupt in one way or another.
This is due to two function calls in the 'commit-graph.c' code that may return NULL, but are not checked for NULL-ness before dereferencing.
Hence:
commit-graph.c: handle commit parsing errors
To write a commit graph chunk, 'write_graph_chunk_data()' takes a list of commits to write and parses each one before writing the necessary data, and continuing on to the next commit in the list.
Since the majority of these commits are not parsed ahead of time (an exception is made for the last commit in the list, which is parsed early within 'copy_oids_to_commits'), it is possible that calling 'parse_commit_no_graph()' on them may return an error.
Failing to catch these errors before de-referencing later calls can result in an undefined memory access and a SIGSEGV.
One such example of this is 'get_commit_tree_oid()', which expects a parsed object as its input (in this case, the commit-graph code passes '*list').
If '*list' causes a parse error, the subsequent call will fail.
Prevent such an issue by checking the return value of 'parse_commit_no_graph()' to avoid passing an unparsed object to a function which expects a parsed object, thus preventing a segfault.
With Git 2.26 (Q1 2020), the code to compute the commit-graph has been taught to use a more robust way to tell if two object directories refer to the same thing.
See commit a7df60c, commit ad2dd5b, commit 13c2499 (03 Feb 2020), commit 0bd52e2 (04 Feb 2020), and commit 1793280 (30 Jan 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit 53c3be2, 14 Feb 2020)
commit-graph.h: store an odb in 'struct write_commit_graph_context'
Signed-off-by: Taylor Blau
There are lots of places in commit-graph.h where a function either has (or almost has) a full struct object_directory *, accesses ->path, and then throws away the rest of the struct.
This can cause headaches when comparing the locations of object directories across alternates (e.g., in the case of deciding if two commit-graph layers can be merged).
These paths are normalized with normalize_path_copy(), which mitigates some comparison issues, but not all.
Replace usage of char *object_dir with odb->path by storing a struct object_directory* in the write_commit_graph_context structure.
This is an intermediate step towards getting rid of all path normalization in 'commit-graph.c'.
Resolving a user-provided '--object-dir' argument now requires that we compare it to the known alternates for equality.
Prior to this patch, an unknown '--object-dir' argument would silently exit with status zero.
This can clearly lead to unintended behavior, such as verifying commit-graphs that aren't in a repository's own object store (or one of its alternates), or causing a typo to mask a legitimate commit-graph verification failure.
Make this error non-silent by 'die()'-ing when the given '--object-dir' does not match any known alternate object store.
With Git 2.28 (Q3 2020), the commit-graph write --stdin-commits is optimized.
See commit 2f00c35, commit 1f1304d, commit 0ec2d0f, commit 5b6653e, commit 630cd51, commit d335ce8 (13 May 2020), commit fa8953c (18 May 2020), and commit 1fe1084 (05 May 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit dc57a9b, 09 Jun 2020)
commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
Helped-by: Jeff King
Signed-off-by: Taylor Blau
Since 7c5c9b9c57 ("commit-graph: error out on invalid commit oids in 'write --stdin-commits'", 2019-08-05, Git v2.24.0-rc0 -- merge listed in batch #1), the commit-graph builtin dies on receiving non-commit OIDs as input to '--stdin-commits'.
This behavior can be cumbersome to work around in, say, the case of piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if the caller does not want to cull out non-commits themselves. In this situation, it would be ideal if 'git commit-graph write' wrote the graph containing the inputs that did pertain to commits, and silently ignored the remainder of the input.
Some options have been proposed to the effect of '--[no-]check-oids' which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't want to pass '--no-check-oids', suggesting that we should get rid of the behavior of complaining about non-commit inputs altogether.
If callers do wish to retain this behavior, they can easily work around this change by doing the following:
git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
!/commit/ { print "not-a-commit:"$1 }
/commit/ { print $1 }
' |
git commit-graph write --stdin-commits
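To see what the awk stage of that pipeline does without touching a repository, it can be fed canned lines in the same three-column shape as the for-each-ref format above. The OIDs below are made up, and classify_refs is a hypothetical wrapper around the same two awk rules.

```shell
# Hypothetical wrapper around the two awk rules from the workaround.
classify_refs() {
    awk '
        !/commit/ { print "not-a-commit:" $1 }   # neither the ref nor its peeled target is a commit
        /commit/  { print $1 }                   # a commit, or a tag that points to one
    '
}

# Columns: objectname, objecttype, peeled objecttype (empty for non-tags).
printf '%s\n' '1111 commit ' '2222 tag commit' '3333 tag tree' '4444 blob ' |
classify_refs
```

This prints 1111 and 2222 unchanged, and not-a-commit:3333 and not-a-commit:4444 for the other two. The prefixed lines are no longer valid object names, which is presumably what lets 'git commit-graph write --stdin-commits' complain about them again.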
To make it so that valid OIDs that refer to non-existent objects are indeed an error after loosening the error handling, perform an extra lookup to make sure that object indeed exists before sending it to the commit-graph internals.
This is tested with Git 2.28 (Q3 2020).
See commit 94fbd91 (01 Jun 2020), and commit 6334c5f (03 Jun 2020) by Taylor Blau (ttaylorr).
(Merged by Junio C Hamano -- gitster -- in commit abacefe, 18 Jun 2020)
t5318: test that '--stdin-commits' respects '--[no-]progress'
Signed-off-by: Taylor Blau
Acked-by: Derrick Stolee
The following lines were not covered in a recent line-coverage test against Git:
builtin/commit-graph.c
5b6653e5 244) progress = start_delayed_progress(
5b6653e5 268) stop_progress(&progress);
These statements are executed when both '--stdin-commits' and '--progress' are passed. Introduce a trio of tests that exercise various combinations of these options to ensure that these lines are covered.
More importantly, this is exercising a (somewhat) previously-ignored feature of '--stdin-commits', which is that it respects '--progress'.
Prior to 5b6653e523 ("[builtin/commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/builtin/commit-graph.c): dereference tags in builtin", 2020-05-13, Git v2.28.0 -- merge listed in batch #2), dereferencing input from '--stdin-commits' was done inside of commit-graph.c.
Now that an additional progress meter may be generated from outside of commit-graph.c, add a corresponding test to make sure that it also respects '--[no-]progress'.
The other location that generates progress meter output (from d335ce8f24 ("[commit-graph.c](https://github.com/git/git/blob/94fbd9149a2d59b0dca18448ef9d3e0607a7a19d/commit-graph.c): show progress of finding reachable commits", 2020-05-13, Git v2.28.0 -- merge listed in batch #2)) is already covered by any test that passes '--reachable'.
With Git 2.29 (Q4 2020), in_merge_bases_many(), a way to see if a commit is reachable from any commit in a set of commits, was totally broken when the commit-graph feature was in use, which has been corrected.
See commit 8791bf1 (02 Oct 2020) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit c01b041, 05 Oct 2020)
commit-reach: fix in_merge_bases_many bug
Reported-by: Srinidhi Kaushik
Helped-by: Johannes Schindelin
Signed-off-by: Derrick Stolee
Way back in f9b8908b ("[commit.c](https://github.com/git/git/blob/8791bf18414a37205127e184c04cad53a43aeff1/commit.c): use generation numbers for in_merge_bases()", 2018-05-01, Git v2.19.0-rc0 -- merge listed in batch #1), a heuristic was used to short-circuit the in_merge_bases() walk.
This works just fine as long as the caller is checking only two commits, but when there are multiple, there is a possibility that this heuristic is very wrong.
Some code moves since then have changed this method to repo_in_merge_bases_many() inside commit-reach.c. The heuristic computes the minimum generation number of the "reference" list, then compares this number to the generation number of the "commit".
In a recent topic, a test was added that used in_merge_bases_many() to test if a commit was reachable from a number of commits pulled from a reflog. However, this highlighted the problem: if any of the reference commits have a smaller generation number than the given commit, then the walk is skipped _even if there exist some with higher generation number_.
This heuristic is wrong! It must check the MAXIMUM generation number of the reference commits, not the MINIMUM.
The fix itself is to swap min_generation with a max_generation in repo_in_merge_bases_many().
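The corrected check can be illustrated with plain integers. This is a simplified sketch, not the C code from commit-reach.c: can_skip_walk is a hypothetical helper, the generation numbers are invented, and commits missing from the commit-graph are ignored.

```shell
# Hypothetical helper: $1 is the generation number of "commit", the remaining
# arguments are the generation numbers of the reference commits.
can_skip_walk() {
    commit_gen=$1; shift
    max_gen=0
    for gen in "$@"; do
        if [ "$gen" -gt "$max_gen" ]; then max_gen=$gen; fi
    done
    # An ancestor always has a lower generation number than its descendants,
    # so the walk is pointless only if "commit" is above EVERY reference.
    if [ "$commit_gen" -gt "$max_gen" ]; then echo skip; else echo walk; fi
}

can_skip_walk 5 3 9    # prints "walk": the reference at generation 9 may reach the commit
can_skip_walk 10 3 9   # prints "skip": no reference can reach generation 10
```

Comparing against the minimum (3) instead would have printed "skip" for the first call, which is the bug described in the commit message.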
Before Git 2.32 hopefully (Q1 2021), when certain features (e.g. grafts) used in the repository are incompatible with the use of the commit-graph, we used to silently turn commit-graph off; we now tell the user what we are doing.
See commit c85eec7 (11 Feb 2021) by Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit 726b11d, 17 Feb 2021)
That will show what was intended for Git 2.31, but it has been reverted, as it is a bit overzealous in its current form.
commit-graph: when incompatible with graphs, indicate why
Signed-off-by: Johannes Schindelin
Acked-by: Derrick Stolee
When gc.writeCommitGraph = true, it is possible that the commit-graph is still not written: replace objects, grafts and shallow repositories are incompatible with the commit-graph feature.
Under such circumstances, we need to indicate to the user why the commit-graph was not written instead of staying silent about it.
The warnings will be:
repository contains replace objects; skipping commit-graph
repository contains (deprecated) grafts; skipping commit-graph
repository is shallow; skipping commit-graph