This could be caused by a pack file that is too large.
Before Git 2.30 (Q1 2021), the code was not prepared to deal with a pack .idx file larger than 4GB.
See commit 81c4c5c, commit 9bb4542, commit 33bbc59, commit a9bc372, commit f86f769 (13 Nov 2020) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit fcf26ef, 25 Nov 2020)
pack: use size_t to store pack .idx byte offsets
Signed-off-by: Jeff King
We sometimes store the offset into a pack .idx file as an "unsigned long", but the mmap'd size of a pack .idx file can exceed 4GB.
This is sufficient on LP64 systems like Linux, but will be too small on LLP64 systems like Windows, where "unsigned long" is still only 32 bits.
Let's use size_t, which is a better type for an offset into a memory buffer.
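The LP64/LLP64 difference described in that commit message can be sketched in a few lines of C. This is only an illustration, not code from Git: both helper names are hypothetical, and `uint32_t` is used as a stand-in for a 32-bit "unsigned long" so that the truncation shows up on any platform.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not Git code: models keeping an offset in a 32-bit
 * "unsigned long" (the LLP64 case). uint32_t stands in for that type so the
 * result is the same on every platform.
 */
static uint64_t offset_in_32bit_long(uint64_t offset)
{
	uint32_t narrow = (uint32_t)offset; /* only the low 32 bits survive */
	return narrow;
}

/*
 * Hypothetical helper: the same offset kept in a 64-bit type, which is what
 * size_t is on 64-bit Linux and 64-bit Windows alike.
 */
static uint64_t offset_in_64bit_size(uint64_t offset)
{
	return offset;
}
```

For an offset of 4GB plus 16 bytes (0x100000010), the first helper returns 16 while the second returns the offset unchanged. An offset below 4GB comes back intact from both, which is why the problem only appears with very large .idx files.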
That impacts git fsck:
fsck: correctly compute checksums on idx files larger than 4GB
Signed-off-by: Jeff King
When checking the trailing checksum hash of a .idx file, we pass the whole buffer (minus the trailing hash) into a single call to the_hash_algo->update_fn().
But we cast it to an "unsigned int".
This comes from c4001d92be ("Use off_t when we really mean a file offset.", 2007-03-06, Git v1.5.1-rc1 -- merge). That commit started storing the index_size variable as an off_t, but our mozilla-sha1 implementation from the time was limited to a smaller size.
Presumably the cast was a way of annotating that we expected .idx files to be small, and so we didn't need to loop (as we do for arbitrarily-large .pack files). Though as an aside it was still wrong, because the mozilla function actually took a signed int.
These days our hash-update functions are defined to take a size_t, so we can pass the whole buffer in directly. The cast is actually causing a buggy truncation!
While we're here, though, let's drop the confusing off_t variable in the first place. We're getting the size not from the filesystem anyway, but from p->index_size, which is a size_t. In fact, we can make the code a bit more readable by dropping our local variable duplicating p->index_size, and instead have one that stores the size of the actual index data, minus the trailing hash.
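The shape of that fix can be shown with a scaled-down sketch. This is not Git's code, and everything in it is an assumption made for illustration: a toy additive checksum replaces the real hash, the trailer is 4 bytes, and a cast to `uint8_t` plays the role of the "unsigned int" cast so that a 304-byte buffer misbehaves the way a 4GB+ .idx file would. All function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_LEN 4 /* assumption: 4-byte trailer, not a real Git hash size */

/* Toy additive checksum, NOT SHA-1; stands in for the real hash functions. */
static uint32_t toy_hash(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Write the checksum of the index data, big-endian, into the trailing bytes.
 * The caller must pass index_size >= TOY_HASH_LEN. */
static void store_trailer(unsigned char *buf, size_t index_size)
{
	size_t len = index_size - TOY_HASH_LEN;
	uint32_t h = toy_hash(buf, len);
	buf[len]     = (unsigned char)(h >> 24);
	buf[len + 1] = (unsigned char)(h >> 16);
	buf[len + 2] = (unsigned char)(h >> 8);
	buf[len + 3] = (unsigned char)h;
}

/* Read back the big-endian checksum stored in the trailing bytes. */
static uint32_t read_trailer(const unsigned char *buf, size_t index_size)
{
	const unsigned char *t = buf + index_size - TOY_HASH_LEN;
	return ((uint32_t)t[0] << 24) | ((uint32_t)t[1] << 16) |
	       ((uint32_t)t[2] << 8) | (uint32_t)t[3];
}

/* Fixed shape: a single local holds the data length (index size minus the
 * trailing hash) and is passed on as a size_t, with no narrowing cast. */
static int verify_trailer(const unsigned char *buf, size_t index_size)
{
	if (index_size < TOY_HASH_LEN)
		return 0;
	size_t len = index_size - TOY_HASH_LEN;
	return toy_hash(buf, len) == read_trailer(buf, index_size);
}

/* Buggy shape, scaled down: the length is narrowed before hashing, so only
 * part of a large buffer is checksummed. */
static int verify_trailer_narrow(const unsigned char *buf, size_t index_size)
{
	if (index_size < TOY_HASH_LEN)
		return 0;
	size_t len = index_size - TOY_HASH_LEN;
	return toy_hash(buf, (uint8_t)len) == read_trailer(buf, index_size);
}
```

With a 304-byte buffer filled with nonzero bytes and a trailer written by `store_trailer()`, `verify_trailer()` accepts the buffer while `verify_trailer_narrow()` rejects it, because the narrowed length covers only the first 44 of the 300 data bytes. A 44-byte buffer passes both checks, mirroring how the real bug stayed hidden for .idx files under 4GB.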