Is it possible to reference Git objects from a commit without adding them to the tree-object filesystem?

Question

I can add a file manually to the git object store using for example

echo "foo bar" | git hash-object -w --stdin

This creates a file in .git/objects containing foo bar (plus a short header) and names it by its hash: SHA-1 by default, or SHA-256 if the repository was initialized with --object-format=sha256.

However, that's a dangling object now. The next time git gc runs after the object is older than the prune expiry (gc.pruneExpire, two weeks by default), the file will be deleted. It also won't be pushed to a remote repository, because it is not reachable from anything pushable (a branch or tag).
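
One way to confirm that the object exists but is unreachable is to ask git fsck, which lists dangling objects. A minimal sketch, run in a throwaway repository (the temporary directory is only there so nothing real is touched):

```shell
# Work in a scratch repository.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write the blob and remember its object ID.
hash=$(echo "foo bar" | git hash-object -w --stdin)

# The object is stored and readable by its ID ...
git cat-file -t "$hash"    # prints the object type: blob
git cat-file -p "$hash"    # prints the content: foo bar

# ... but nothing references it, so fsck reports it as dangling.
git fsck | grep "$hash"
```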

To reference it, I could either add it to a tree object that's referenced by a commit object or reference it in a tag.

However, adding it to a tree adds it to the file system that tree represents (so it must be given a filename), and referencing it with a tag does not make it part of a commit's Merkle tree (so it won't be pushed or pulled automatically with that commit).

I would like to add an object to the object store and have it be part of the Merkle tree of my commit (so that it gets pushed/pulled and is protected from garbage collection) but without giving it a filename (i.e. without it ever appearing in the working tree, outside the .git folder, when a commit that references it is checked out).

Is this possible somehow with vanilla Git? Is it for example possible to add entries to a tree object without specifying a file-name for it? If so, how? (git write-tree takes the files from the staging area so they need to have filenames.)

Or is it maybe possible to reference the object hash from the commit message in a way that tells Git that this object is part of the commit and thus must be pushed/pulled with the commit?

Or can git update-index be used for this?

What is the use case for this? Why do you need a blob without it ever being a file in a tree? — mkrieger1, Feb 05 '21 at 16:00
Even `git notes` doesn't get the behavior you're asking for, and it's a built-in command... so I'm going to go with "no, it's not possible". — Mark Adelsberger, Feb 05 '21 at 16:09
I want to store some repository-wide metadata used with hooks (in particular certificate-chains for long-term-validation of signatures (not pgp) contained in commit messages) and there’s 1. no reason for these files to be revisioned 2. It should not impose some folder structure onto projects using these hooks. — matthias_buehlmann, Feb 05 '21 at 16:13

score 2 · Answer 1 · answered Feb 05 '21 at 16:44

You can create a non-branch reference that does not share any history or name with the other branches and add the file there. Chances are you do need to track changes to the file at some point so this leaves you with the option to do that later.

For example:

git checkout --orphan temporary-branch
git reset --hard
git add the_file
git commit -m 'Add the file'
git update-ref refs/hook-metadata/foo temporary-branch
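# Switch back to the previous branch first; Git refuses to delete the branch that is checked out.
git checkout -
git branch -D temporary-branch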
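
Because refs/hook-metadata/foo is neither a branch nor a tag, a plain git push or a fresh clone will not transfer it; it has to be named in a refspec. A rough sketch of that round trip, assuming a remote called origin (a scratch bare repository stands in for it here). The directory names, the identity settings and the file content are placeholders, and the commit is made on the initial branch only to keep the setup short:

```shell
set -e
base=$(mktemp -d)
git init -q --bare "$base/origin.git"
git init -q "$base/work"
cd "$base/work"
git config user.name "Example"
git config user.email "example@example.com"
git remote add origin "$base/origin.git"

# Stand-in for the steps above: commit the file and point the custom ref at it.
echo "metadata" > the_file
git add the_file
git commit -q -m 'Add the file'
git update-ref refs/hook-metadata/foo HEAD

# Push the custom ref explicitly.
git push -q origin refs/hook-metadata/foo:refs/hook-metadata/foo

# In another repository, fetch it explicitly as well.
git init -q "$base/other"
cd "$base/other"
git remote add origin "$base/origin.git"
git fetch -q origin 'refs/hook-metadata/*:refs/hook-metadata/*'

# Read the file straight from the ref without checking anything out.
git show refs/hook-metadata/foo:the_file    # prints: metadata
```

This still does not tie the ref to any commit on a regular branch; the push and fetch of refs/hook-metadata/* remain separate, explicit steps.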

mkrieger1

But how do I make a commit in my actual branch then depend on that resource so that it gets pushed automatically with the commit? – matthias_buehlmann Feb 05 '21 at 17:05
On second thought, I'm not sure if I understand your requirements. If the actual branch should depend on it, the resource needs to be contained in that branch. Maybe you can edit your question and show an example of how you intend the resource to be used. – mkrieger1 Feb 05 '21 at 19:07

score 1 · Answer 2 · answered Feb 06 '21 at 07:54

Is this possible somehow with vanilla Git? Is it for example possible to add entries to a tree object without specifying a file-name for it?

No. Tree entries are <mode, name, hash> tuples. The mode part must be one of Git's well-defined modes: 100644 for a non-executable file, 100755 for an executable file, 120000 for a symbolic link, 160000 for a gitlink. Mode 040000 represents a tree object and can only appear in another tree object. As you noted, git write-tree turns the index into a tree, so normally you'd put these entries in the index, which is what git add does. The index cannot store a mode 040000 entry; git write-tree creates the subtrees from the slash-separated paths in the index. You can also use git mktree to make trees directly, which is how to add a subtree to a tree by hand.
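
The tuple structure can be seen by building a tree by hand with git mktree and listing it with git ls-tree. A small sketch in a scratch repository; the entry names data.txt and sub are made up for illustration:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

blob=$(echo "foo bar" | git hash-object -w --stdin)

# mktree reads lines in ls-tree format: <mode> SP <type> SP <hash> TAB <name>
inner=$(printf '100644 blob %s\tdata.txt\n' "$blob" | git mktree)

# A subtree is added the same way, with mode 040000 and type tree.
outer=$(printf '040000 tree %s\tsub\n' "$inner" | git mktree)

# Every entry comes back as mode, type, hash and name.
git ls-tree "$inner"
git ls-tree -r "$outer"
```

In both listings the blob only shows up attached to a name (data.txt, then sub/data.txt), which is the point made above.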

The extraction code in Git comes in two parts, sort of: one part reads a commit or tree into Git's index (git read-tree), and one part reads the index and creates usable files from the symlink and blob entries in it (git checkout-index). A checkout or switch operation that moves from the current commit to a new commit simply rolls both into a single convenient (and useful) command. In practice there is a ton of overlap: git read-tree can affect the working tree, and git checkout can read or fill in the index (this functionality is now in git restore, if you prefer the split-up switch and restore commands over lumping everything into git checkout). The overlap amounts to admitting: well, it looked cleanly separable, but it turns out it wasn't.
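
The two steps can be run separately to see where a name turns into a file. A minimal sketch in a scratch repository, with data.txt as a made-up entry name:

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Build a one-entry tree directly in the object store.
blob=$(echo "foo bar" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tdata.txt\n' "$blob" | git mktree)

# Step 1: read the tree into the index. No file is created yet.
git read-tree "$tree"
git ls-files -s            # shows the index entry for data.txt
test ! -e data.txt

# Step 2: create working-tree files from the index.
git checkout-index -a
cat data.txt               # prints: foo bar
```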

... I want to store some repository-wide metadata used with hooks (in particular certificate-chains for long-term-validation of signatures (not pgp) contained in commit messages) and there’s 1. no reason for these files to be revisioned 2. It should not impose some folder structure onto projects using these hooks.

The confounding issues here, per point, are these:

1. Git can't store files without versioning them. Its underlying storage model is "revisions".
2. Git can't really deal with files without also using the host OS to deal with files. You can get arbitrarily close, using git hash-object -w and git cat-file -p and the like, but in the end, you almost certainly have to store that file data in, well, a file. A host-OS file, that is. It's the host that imposes folder structures on files in the first place, so there's no getting around that anyway.

You can try to be flexible about the host-OS-enforced structuring in point 2 so that projects can do things their own way. You can simply store a single commit, under some known ref name, without any history (no parent for this commit, ever) in point 1, so that the versioned files "fall away" automatically. But you'll still be versioning files that came from some sort of directory-and-file-tree structure.
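
The single-commit idea can be sketched with plumbing commands alone, so that neither the index nor the working tree is touched. This is only one possible way to do it: the ref name refs/hook-metadata/certs, the entry name chain.pem, the content and the identity values are all placeholders.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Placeholder identity so commit-tree works in a fresh environment.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Store the data as a blob; it still needs a name inside a tree.
blob=$(echo "certificate chain data" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tchain.pem\n' "$blob" | git mktree)

# A commit created without -p has no parent, so there is no history.
commit=$(git commit-tree "$tree" -m 'Hook metadata')

# Point a non-branch ref at it; this keeps the objects reachable.
git update-ref refs/hook-metadata/certs "$commit"

# Hooks can read the data back without any checkout.
git cat-file -p refs/hook-metadata/certs:chain.pem
```

Replacing the data later means creating a new parentless commit and moving the ref to it; the old commit then becomes unreachable and is eventually pruned.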

To your point 1: I want to store data, not “files”. Git also stores commits, tags and trees as data objects without turning them into revisionable files. I’d like to do the same with some custom data and still have it referenced (and checked out/pulled) with the commit that uses that data, if that’s possible — matthias_buehlmann, Feb 06 '21 at 13:11
To your point 2: that’s exactly what I want: store the data with `git hash-object -w` so that I can retrieve it again using `git cat-file -p hash`. In addition, I want to reference that hash in the commit so that Git does not consider the blob dangling and pushes/pulls it together with the commit that references it — matthias_buehlmann, Feb 06 '21 at 13:16
But in a commit, the only data is a tree; in a tree, the only data is a subtree or a blob, and a blob has a name. That name *becomes* a *file* name. And the OS won't store data either unless you give it a *file name*. — torek, Feb 06 '21 at 15:38
