
Let's say I want to write a small helper that lets me append some metadata to a repository in a way that can propagate to clones via refs. Simple example (a git notes clone prototype that doesn't even attach the notes to any other git object):

hash=$(echo "Just a comment" | git hash-object -w --stdin)
git update-ref refs/comments/just $hash

That is, I create a blob whose ID is stored in $hash and point refs/comments/just at it, so git fsck --unreachable won't complain about it and git gc will never prune the object.
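To make the two commands above concrete, here is a minimal sketch run in a throwaway repository (the temporary directory and the remote name "origin" in the final comment are assumptions for illustration). It only checks that the ref resolves to the blob and that the content round-trips.

```shell
# Throwaway repository so the sketch does not touch real data.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The same two steps as above.
hash=$(echo "Just a comment" | git hash-object -w --stdin)
git update-ref refs/comments/just "$hash"

# The ref resolves to the blob, and the content can be read back.
git cat-file -t refs/comments/just   # object type
git cat-file -p refs/comments/just   # object content

# The blob is reachable from a ref, so it should not be reported as unreachable.
git fsck --unreachable

# Custom namespaces are not fetched by default; a clone would need an explicit
# refspec, for example (remote name "origin" is an assumption):
#   git fetch origin 'refs/comments/*:refs/comments/*'
```

The refspec in the last comment is what lets such refs propagate, since a plain clone only picks up branches and tags.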

But that is of course a very simple example; in reality I'm interested in more complex features. So my question is: what can I "legally" do, and what should I absolutely refrain from?

As an example, several posts on SE were about users having to recover from duplicate tree entries, so one "don't" is "don't create a tree with duplicate entries". Another example is "do make sure your objects are reachable, so git prune won't remove them". What else?

Can I create a custom object type? Use "invalid" filemodes for blobs in trees? Where can I find an overview? Or should I check git-fsck's source manually to see what constitutes errors (and which ones are ignore-able)?

showtime
Tobias Kienzler
  • as an aside, rather than echo to a file, see `git update-ref`. – jthill Feb 09 '17 at 16:19
  • I see someone has voted to close this as "too broad", but I think you've pretty much answered it yourself: consult the `git fsck` source. :-) Note that it currently allows a few "historic mistakes", such as mode `664`, and that it has grown additional tests over time, so you may wish to be very careful if you're doing something that "feels a bit off" but that git-fsck doesn't hate. As @jthill noted, use `git update-ref` to manipulate special refs. – torek Feb 09 '17 at 18:42
  • @jthill I _knew_ there must be a command for that, thanks :) – Tobias Kienzler Feb 09 '17 at 19:45
  • @torek The trouble with consulting the source means I can't be certain this won't change in the future. Maybe there is a more API-ish document about this? I'm certainly not the first one wanting to do this, right? From what I read e.g. [tag:gerrit] can also use refs for review, though I guess that really just sticks to blobs... – Tobias Kienzler Feb 09 '17 at 19:47
  • @TobiasKienzler there are some API docs, but nothing quite on that level. You definitely can't add new object types (Git wouldn't know what to do with them; if they're "leaf" types like blobs, one could argue that Git should not interpret them, but then you might as well just use "blob" anyway) and fsck will limit what you can do with trees (plus they're darn difficult to manipulate) so in the end you're back to `git notes` equivalents. – torek Feb 09 '17 at 20:13
  • @torek I was afraid you'd say that... But you're right, it's hard to imagine anything but blobs and trees necessary (and of course there's also the commit type, which is kind of a super-tree with "anonymous" parents, a blob-ish message and _one_ tree linked). I sometimes have an idea and think "that would require some other object type" but after a while it actually turns out the existing type would do just fine... – Tobias Kienzler Feb 10 '17 at 08:29
  • Since I see you're still here, what specific difference is there between "notes that doesn't even attach the notes to any other git object" and just an ordinary sideband branch (with its own root)? Make your "mynotes" branch, check it out into a `git worktree add`ed tree wherever you want, and have your way with it. – jthill May 19 '23 at 00:26
  • @jthil tbh I don't remember my exact idea since it's been six years ago, but your suggestion sounds like it makes sense, thanks! – Tobias Kienzler May 20 '23 at 08:22

1 Answer


dos and don'ts of custom objects and refs?

Dos:

  • Back Up Your Repo: Before making significant alterations to a repository's internal structure, create a backup. I have previously recommended git bundle create /tmp/foo-all --all for this.

  • Use a Distinct Namespace: If you're introducing custom refs, try to use a distinct namespace (like refs/comments/ in your example) to avoid any collisions with Git's conventional ref names.

  • Ensure Object Reachability: your custom objects should always be reachable from some ref, to avoid accidental pruning by git gc or the more recent git maintenance.

  • Test in a Separate Repo: Before applying your customizations to a primary or production repository, test in a separate or cloned repository to confirm your assumptions and ensure that there are no unexpected consequences.

  • Adhere to Object Types: Stick with the four primary object types (blob, tree, commit, and tag) for maximum compatibility. If you're trying to store custom data, it usually makes sense to store it as a blob and then reference it from a tag or commit.
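As a rough illustration of the "dos" above, the following sketch backs up a scratch repository with a bundle, then stores custom data as a blob wrapped in a tree and a commit under a separate namespace. The ref name refs/comments/review, the file name comment.txt, and the identity values are made up for the example.

```shell
# Scratch repository with one commit, so there is something to back up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "initial commit"

# Back up every ref before experimenting, then check the bundle.
bundle="$(mktemp -d)/backup.bundle"
git bundle create "$bundle" --all
git bundle verify "$bundle"

# Store the custom data as a blob, wrap it in a tree and a commit (standard
# object types only), and make it reachable from a ref in its own namespace.
blob=$(echo "review: looks good" | git hash-object -w --stdin)
tree=$(printf '100644 blob %s\tcomment.txt\n' "$blob" | git mktree)
commit=$(git commit-tree "$tree" -m "Add review comment")
git update-ref refs/comments/review "$commit"

# Confirm the namespace content and that nothing is left unreachable.
git for-each-ref refs/comments/
git fsck --unreachable
```

Wrapping the blob in a commit is optional; it mainly buys you history and a message for each metadata change.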

Don'ts:

  • Avoid Duplicate Tree Entries: As you noted, tree objects should not contain duplicate entries. This can lead to unexpected behavior.

  • Don't Use Invalid Filemodes: While it might be tempting to use custom file modes for blobs in trees, it's likely to cause problems. Stick with the recognized modes: 040000 for a subdirectory (tree), 100644 for a regular file (blob), 100755 for an executable file, 120000 for a symbolic link, and 160000 for a gitlink (submodule entry).

  • Avoid Creating Custom Object Types: Git recognizes four primary object types (blob, tree, commit, and tag). Introducing custom object types would likely break Git's internal mechanisms and tools that expect only these four.

  • Don't Modify Existing Objects: The integrity of Git relies on the immutability of objects. Once an object is created, it should never be changed. If changes are needed, create a new object and update the references accordingly.

  • Avoid Inconsistencies with the Object Hash: The object hash (SHA-1 by default, SHA-256 in repositories created with that object format) is integral for object identification and verification in Git. Any custom operation that might produce an inconsistency between the content of an object and its hash is a big no-no.
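A small sketch of a few of these "don'ts" in a scratch repository: objects are content-addressed, so identical content maps to the same ID and a "change" is really a new object, and plain git hash-object refuses a type it does not know. The type name "note" and the file names are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Same bytes, same object ID; different bytes, different object.
a=$(echo "v1" | git hash-object -w --stdin)
b=$(echo "v1" | git hash-object -w --stdin)
c=$(echo "v2" | git hash-object -w --stdin)

# Without special flags, hash-object rejects an unknown object type
# ("note" is a made-up type name).
if echo "data" | git hash-object -t note --stdin 2>/dev/null; then
    custom_type=accepted
else
    custom_type=rejected
fi

# A tree that sticks to recognized modes and unique entry names.
tree=$(printf '100644 blob %s\tnotes.txt\n100755 blob %s\trun.sh\n' "$a" "$c" | git mktree)
git ls-tree "$tree"

# fsck is the tool that reports malformed objects; --strict enables extra checks.
git fsck --strict
```

Running git fsck --strict on anything you build by hand is a cheap way to notice when you have strayed outside what Git currently accepts.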


jthill asked in the comments:

what specific difference is there between "notes that doesn't even attach the notes to any other git object" and just an ordinary sideband branch (with its own root)?

  • Git notes are a way to append arbitrary metadata to objects without modifying the objects themselves. Typically, this means adding notes to commits. Notes are stored in their own refs, typically under refs/notes/, but they "attach" to another object (like a commit) by referring to that object's hash.

  • A sideband branch is just a regular branch, but perhaps used for a purpose different from the primary branches. It has its own commit history and tree. It's stored under refs/heads/ just like any other branch.
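For comparison, a minimal sketch of git notes in a scratch repository: the note attaches to a commit by its hash, lives under refs/notes/, and leaves the commit itself untouched. The note texts and the "comments" notes ref are just examples.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "initial commit"

head_before=$(git rev-parse HEAD)

# Attach a note to HEAD; it is stored under the default refs/notes/commits.
git notes add -m "Reviewed, no issues" HEAD
git notes show HEAD

# A separate notes namespace: this ends up in refs/notes/comments.
git notes --ref=comments add -m "Needs a follow-up" HEAD

# The annotated commit keeps its ID; only the notes refs are new.
head_after=$(git rev-parse HEAD)
git for-each-ref refs/notes/
```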

Hence jthill's recommendation: create a "mynotes" branch and check it out with git worktree (which I have presented in another answer).
That gives you a separate working tree for this metadata, completely isolated from your main work.
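One possible way to set that up, sketched in a scratch repository: the branch gets its own root commit via plumbing, then is checked out into a second working tree. The branch name "mynotes" comes from the comment; the paths, file name, and commit messages are assumptions.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "example@example.invalid"
git commit -q --allow-empty -m "main work"

# Create a parentless root commit on the empty tree and point a branch at it,
# so the metadata history shares nothing with the main history.
empty_tree=$(git mktree </dev/null)
root=$(git commit-tree "$empty_tree" -m "Start metadata branch")
git update-ref refs/heads/mynotes "$root"

# Check the branch out into its own working tree, outside the main one.
wt="$(mktemp -d)/mynotes"
git worktree add "$wt" mynotes

# Work on the metadata there like on any other branch.
echo "Just a comment" > "$wt/comment.txt"
git -C "$wt" add comment.txt
git -C "$wt" commit -q -m "Add a comment"

git log --oneline mynotes
```

Because mynotes is an ordinary branch under refs/heads/, it is cloned and pushed like any other branch, with no custom refspec needed.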

VonC