Between phd's comments about setting modes, and eftshift0's answer, you have some practical approaches for dealing with this already. Here's the theory that backs up these practical answers.
> For some reason git needs an objects folder in the repository on its ssh server where users push or pull. For some reason (?) git creates folders below objects with random names in the range of 01 to ff.

It's actually `00` through `ff`. There's something a bit odd going on in your case; we'll explore this in a bit.
The first thing to realize is that Git doesn't store files. What Git stores, in its main database—we'll come back to this word "main" later—is objects. These objects have hash IDs: names of the form `faefdd61ec7c7f6f3c8c9907891465ac9a2a1475`, for instance. The hash IDs you commonly see—though often abbreviated as, e.g., `faefdd61e`—are those of commit objects, but there are in fact four object types. The first one is of course the commit; the remaining three are tree, blob, and annotated tag.
File contents go into the blob objects. File names get divided into name components, in the familiar directory-and-file-name style from Unix/Linux systems, by slashes; these name components, plus additional information as needed, go into tree objects; and a commit object then refers to a tree object to hold the data—the files—for the commit, in Git's compressed and de-duplicated object-store form. Annotated tag objects exist so that annotated tags can store data as well as a commit hash ID (or any other object hash ID, though it's unusual to have an annotated tag object that points to anything other than a commit object).
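To make this concrete, you can ask Git to show you these object types with `git cat-file`, run inside any repository (the exact output depends on your repository, of course):

```
git cat-file -t HEAD             # prints "commit"
git cat-file -p HEAD             # shows the tree hash, parents, author, and message
git cat-file -t 'HEAD^{tree}'    # prints "tree"
git cat-file -p 'HEAD^{tree}'    # lists the blobs and sub-trees with their name components
```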
Hence, the main database of any Git repository is this object database. Objects themselves can be stored either as loose objects or their opposite: packed objects (not tight objects, although the packing does pack them pretty tightly). Packed objects are stored in a pack file, and the pack files live in the `objects` directory under a subdirectory named `pack`. Your `.git/objects/pack` should contain one or more `*.pack` files, each of which also has a corresponding `*.idx` file. We'll come back to pack files in a bit.
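You can see how many objects of each kind you have, and look at the pack directory directly; for instance (the paths assume a non-bare repository with a `.git` directory):

```
git count-objects -v      # "count" = loose objects, "in-pack" = objects stored in pack files
ls .git/objects/pack      # the *.pack files and their *.idx index files
```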
Loose objects are stored with each object in a stand-alone file-system-level file. The object's name might be `dd1cf41e007a0036e18eef4b0acae505ec52f168`. If this is to be stored as a loose object, rather than a packed one, its file-system-level name will be `dd/1cf41e007a0036e18eef4b0acae505ec52f168`. We simply take the first two characters of the hexadecimal expansion of the hash ID off the front and use them as a directory name, and use the remaining characters as the file name.
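You can watch this happen by writing a loose object by hand in a scratch repository (the `/tmp/scratch` path is just an example): `git hash-object -w` writes a blob and prints its hash ID, and the file shows up under the matching two-character directory:

```
$ git init --quiet /tmp/scratch && cd /tmp/scratch
$ echo hello | git hash-object -w --stdin    # -w: actually write the object
ce013625030ba8dba906f756967f9e9ca394464a
$ ls .git/objects/ce
013625030ba8dba906f756967f9e9ca394464a
```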
The choice of two characters here has to do with the expected "fluffiness" and "fullness" of the loose-objects directories, and the performance (or lack thereof) of the original Linux file systems when using directories with a lot of files in them. If all loose objects were dropped into a single directory, that directory would accumulate about two to six thousand files before Git would "pack" the objects. The choice of how many loose files to leave is complicated and included at least a little guesswork, plus file activity patterns from the early 2000s, so these numbers don't necessarily all make sense today, but that's what Linus Torvalds did at the time, and it remains in place because it works well.1
When users run `git push` (but not `git pull`), their Git calls up some other Git. Their Git reads their Git repository. The server Git reads, and writes, the server's repository. Their Git figures out which commit objects they have that the server lacks, and sends over these commit objects. The two Gits coordinate, and the sending Git can also figure out what other objects are required.
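If you're curious, you can watch this conversation happen by setting Git's trace variables during a push (the remote and branch names here, `origin` and `main`, are just placeholders for yours; the output format varies by Git and protocol version):

```
GIT_TRACE_PACKET=1 git push origin main    # log each protocol packet in the have/want negotiation
GIT_TRACE=1 git push origin main           # log the commands Git runs, including pack-objects
```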
Once the sender has the list of all objects that are required, it will normally gather all of these objects and write out what Git calls a thin pack. A thin pack is a pack that violates one of the normal constraints of a pack file, so now it's time to describe what a pack file is for.
Pack files use delta compression to reduce the need for disk space, and the delta compression works best when the packs are generated from a batch of objects at a time. (This also feeds into the calculation of when to turn a collection of loose objects into a pack.) Note that loose objects are merely zlib-compressed, not delta-compressed, so at the loose-object level Git does not use delta compression. This also means a pack file is often considerably smaller than the set of loose objects it replaces.
For a simple example, suppose the very first commit in a repository has a fairly big file (a few dozen megabytes or whatever: for concreteness, say it's 10 MiB). Subsequent commits either add a little bit to the file, or take a little bit away from it. Git must initially store the new commits with a new loose object that is also about 10 MiB, to store the slightly-different content. So each commit that modifies this big file adds 10 MiB to the repository.
Once we can pack the objects, though, we can pick one of these objects—probably the most recent copy of the file, as it's the one we are most likely to check out—and store that one in full, and then store other versions of the file as instruction sequences: start with the big file, then remove 140 bytes at the end for instance.2 The deltas can use multiple objects via multiple instruction sequences, and can refer to objects that are themselves stored using delta instructions, as long as the graph of objects used in these constructions is not circular. The end result, of course, is that if we have 50 copies of the 10 MiB file, each slightly different, the pack file holds just the 10 MiB file plus about 49 short modifiers.
The objects used to construct the final objects are called delta bases. As we just noted, a delta-compressed object can itself be a delta-base. A chain of deltas is called a delta chain and decompressing such an object involves a bit of recursion. As long as the pack file is well-formed, the recursion is never infinite, so that's fine; and we can use techniques like memoization to make this go reasonably fast, if needed.
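You can inspect the deltas in any existing pack with `git verify-pack`: for deltified objects it reports the chain depth and the delta base, and at the end it summarizes how many objects sit at each chain length:

```
# Verbosely list every object in each pack, including delta depth and delta base.
git verify-pack -v .git/objects/pack/pack-*.idx | less
```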
In any case, the normal constraint on a pack file is that it should contain every object that is needed to reconstruct the final object. A thin pack is one where we allow the sending Git to assume that the receiving Git already has some objects, and use those objects as delta bases without including them in the pack. So a thin pack can be very small indeed: it's ideal for transmission across a network connection.3
The result is that `git push` normally sends a thin pack. The receiving Git should take this thin pack and "fix it" to make a regular pack. No loose objects are created during this process. The fact that you're getting loose objects indicates that your pushes are not using thin packs. This isn't wrong, but you might investigate why this is the case.
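One thing worth checking on the server as part of that investigation (this is a guess about your setup, not something established above): the receiving Git will explode a small incoming pack into loose objects rather than keep it as a pack, based on a configurable object-count threshold. You can see what it is set to with:

```
# Run in the server repository. An empty answer means the next setting (or the built-in default) applies.
git config --get receive.unpackLimit
git config --get transfer.unpackLimit
```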
1These files are all written once, then never touched again, except to be removed after being packed into a pack file. (They don't have to be removed, but that's the normal action.) You can also explode a pack file into individual objects.
Note that all Git objects are completely read-only, because their hash ID names are constructed by hashing the contents of the object file. Each file begins with a header giving the object's type—one of the four object types—and size, and the type-and-size-bytes are included in the hash ID, which fortuitously protected Git from the original SHAttered attack (see How does the newly found SHA-1 collision affect Git?). Still, the hash algorithm will eventually be upgraded to a more resistant one. This transition will be an interesting time, in the same sense that 2020 has been an interesting year.
2The actual encoding is, I think, composed of just two instructions: "take n bytes from offset o of object obj" and "insert literal byte sequence S", but one can imagine any kind of instructions here. They're all more or less equivalent. One can add extra instructions, such as "copy n bytes from offset once" vs "copy n bytes from offset, repeating r times", or require the copy operation to specify the number of copies to make, or whatever, but these are all just small tweaks. A richer instruction set generally offers more compression opportunities, at the cost of more-complex code to find a minimal compression, and a larger encoded-instruction format.
3The operating assumption here is that CPU is cheap and network bandwidth is expensive.
Finishing up
We begin with a `git push`. This sends objects, usually as a thin pack. The receiving Git should store these objects, or this thin pack, somewhere: modern Gits use a quarantine area, and old Gits just dump them right into the object database.
Having sent the objects, the sending Git now sends a sequence of name updates. These affect the name database, which is the other primary database in a Git repository. The names stored in this database are branch names, tag names, remote-tracking names, and any other names that Git finds useful. A push normally sends one or more branch and/or tag name update requests.
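The name database is easy to look at directly; `git for-each-ref` dumps every name and the object it currently points to:

```
# Every branch, tag, and remote-tracking name, with the type and hash ID of the object it names.
git for-each-ref --format='%(refname) %(objecttype) %(objectname)'
```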
The receiving Git is allowed to inspect and verify these requests, using the objects that were received (and maybe quarantined) to vet everything. If the vetting passes—if there is no vetting, it just automatically passes—the receiving Git then inspects the name updates. Branch name updates must either be forced, with the `--force` or `+` flag in the `git push` command, or else be fast-forward operations or new names.
A fast-forward operation leaves the name in a position such that, by following the commit graph backwards, the commit identified by the previous position is reachable from the new position. In other words, the receiving Git might get a request to update branch name `br1`. The new commit identified by the updated name must be a descendant of the commit currently found via the name `br1`.
If all is OK with the name update, and all is permitted via the pre-receive and update hooks (if any) that did the vetting (if any), the receiving Git accepts the update and fixes the thin pack, or otherwise moves the objects out of quarantine. This is when you'd get new `.git/objects/` directories created, if necessary.
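For illustration only, here is a minimal sketch of what such vetting can look like as an update hook on the server (my own example, not something your server necessarily has). The hook gets the ref name, the old hash, and the proposed new hash; a non-zero exit rejects that one update. It uses `git merge-base --is-ancestor` to test the fast-forward condition described above:

```
#!/bin/sh
# .git/hooks/update: called once per ref with <refname> <old-hash> <new-hash>
refname="$1" old="$2" new="$3"

# All-zeros hash (SHA-1 repositories) means "no commit here yet" or "delete this name".
zero="0000000000000000000000000000000000000000"
[ "$old" = "$zero" ] && exit 0    # brand-new name: nothing to fast-forward from
[ "$new" = "$zero" ] && exit 0    # deletion: not our concern in this sketch

# Fast-forward means the old commit is reachable from (an ancestor of) the new one.
if git merge-base --is-ancestor "$old" "$new"; then
    exit 0
fi

echo "rejected: $refname update from $old to $new is not a fast-forward" >&2
exit 1
```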
The Git that's doing the receiving creates these directories with `mkdir` system calls. These use both the `umask` of the Git process doing the `mkdir`, and the permissions supplied to the `mkdir` call. The ownership of the new directories is set by the OS's rules: the group owner might be the group ID of the process, or it might be the group ID of the parent directory. Using the set-group-ID trick is a fairly standard way on Unix and Linux systems to tell the OS to set the group-ID of the new directory based on the group-ID of the containing directory.
If your Git is using pack files—as it generally should be—the main issue would be making sure that the `.git/objects/pack` directory and its contents have the right ownership and permissions. If your Git is using loose objects, figure out why, as well as looking into making sure that new directories here have the right ownership and permissions. These are all controlled by your OS; Git's role here is merely to set its `umask` and pass the right arguments to the `open` and `mkdir` system calls.
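In practice, the usual way to get both of those right on the server (echoing the practical answers mentioned at the top) is to mark the repository as group-shared and fix up what already exists once. Roughly, and again assuming a hypothetical group named `devs`, run in the server repository:

```
# Tell Git to make future files and directories group-writable.
git config core.sharedRepository group

# One-time fix-up of the existing files and directories:
chgrp -R devs .                       # give everything the shared group
chmod -R g+rwX .                      # g+X: add the search/execute bit only where appropriate
find . -type d -exec chmod g+s {} +   # new subdirectories inherit the group
```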