
I'm running git name-rev $(git rev-list master) on my master branch. In the output, I saw the following tag/version format:

<SHA-1 Hash> tag/version-0.8.7
<SHA-1 Hash> tag/version-0.8.7~1
<SHA-1 Hash> tag/version-0.8.7~2
<SHA-1 Hash> tag/version-0.8.7~2^0
<SHA-1 Hash> tag/version-0.7.6~10^2~2

I'm confused by the ~ and ^, even with the documentation (links below):

<rev>~[<n>], e.g. HEAD~, master~3 A suffix ~ to a revision parameter means the first parent of that commit object. A suffix ~<n> to a revision parameter means the commit object that is the <n>th generation ancestor of the named commit object, following only the first parents. I.e. <rev>~3 is equivalent to <rev>^^^ which is equivalent to <rev>^1^1^1.

So does tag/version-0.8.7~2 means two committed changes before tag/version-0.8.7? What does "the <n>th generation ancestor" mean?

<rev>^[<n>], e.g. HEAD^, v1.5.1^0 A suffix ^ to a revision parameter means the first parent of that commit object. ^<n> means the <n>th parent (i.e. <rev>^ is equivalent to <rev>^1). As a special rule, <rev>^0 means the commit itself and is used when <rev> is the object name of a tag object that refers to a commit object.

Does tag/version-0.8.7~2^0 mean two committed changes itself (because of the ^0)?

Hope I can get a clarification on that. Also is there a way to not show these ~ and ^ when making a tag? Thank you

https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt-emltrevgtltngtemegemHEADv1510em

https://git-scm.com/docs/git-rev-parse#Documentation/git-rev-parse.txt-emltrevgtltngtemegemHEADmaster3em

torek
DavidKanes

1 Answer

Warning: this answer is quite long, and there's no TL;DR at the top.

I'm not completely sure what you're asking here, especially in this part:

Also is there a way to not show these ~ and ^ when making a tag?

When you make a tag, you choose the name yourself. You cannot embed the ~ and ^ characters in the name you choose—see rule 4 in the check-ref-format documentation—so they will never be part of any tag.
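A quick way to confirm this is to ask Git itself. The sketch below assumes only that git is on your PATH; it needs no repository, and the names tested are just the ones from the question. git check-ref-format exits with status 0 for an acceptable reference name and non-zero otherwise.

```shell
# A plain tag name is accepted.
if git check-ref-format "refs/tags/version-0.8.7"; then
    plain=valid
else
    plain=invalid
fi

# The same name with a ~ in it is rejected.
if git check-ref-format "refs/tags/version-0.8.7~2"; then
    tilde=valid
else
    tilde=invalid
fi

# So is the same name with a ^ in it.
if git check-ref-format "refs/tags/version-0.8.7^0"; then
    caret=valid
else
    caret=invalid
fi

echo "plain=$plain tilde=$tilde caret=$caret"
```

Because Git refuses to create such names, any ~ or ^ you see in name-rev output is a suffix added to a name, never part of the name.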

The git name-rev command adds ~ and ^ characters to existing, valid names when borrowing them, but only when that's necessary. To understand what's going on here, you need to understand a little bit of graph theory, and that Git's commits form a Directed Acyclic Graph or DAG.

... does tag/version-0.8.7~2 means two committed changes before tag/version-0.8.7?

Sort of. It's a good idea to ditch the phrase "two committed changes", though. Git's commits are not changes. A commit represents a complete snapshot of all files.

Each commit, in a Git repository, is a real, actual thing: something solid, as it were, that you can take out and examine. Think of them as marbles in a bag, or bricks in a building, or whatever other analogy you like. If you use the "bricks in a building" analogy, then it's true that the building (the whole collection) is not a brick, and just having a brick won't get you the building—but at the same time, each brick is a real thing. It's not an abstract concept (though the concept itself certainly exists; you can imagine a "perfect brick", if you like, with no flaws or imperfections, but you'll never find an actual brick with no flaws: see Plato's Theory of Forms). If you take a real brick out of the building, it's something solid, and you can whack someone in the head with it. (Ow! Quit that! ) With that in mind, let's note that any given repository is largely made up of the commits that are in that repository, in much the same way that the Puni Distillery in Glorenza, Italy is made up of those specific bricks, in their specific arrangement. (I've never tried Puni whisky, but this looks like an interesting place to visit!) You can't just pile any old bricks (or commits) up randomly; you need these ones, in this specific arrangement.

Going back to commits in a Git repository, let's consider what makes a commit a commit. Each commit in the repository is unique, and each one has a unique number, that lets Git find that commit. So we know that every commit is numbered. These commit numbers are hash IDs, the big ugly strings that look like 3cf59784d42c4152a0b3de7bb7a75d0071e5f878, for instance. What you might not know—not all Git tutorials make this clear—is that every commit holds a full snapshot of every file.

To make this work well, Git stores the files in a special, read-only, compressed, Git-only, frozen-for-all-time, and de-duplicated format. So if you make a bunch of commits in a row, and most of those commits mostly re-use the same files as previous commits, most of those commits take almost no space because the files are literally re-used. Only files that are unique to this particular commit have to store a new copy. (Note that this technique kind of falls apart if you store, say, 5 GiB DVD images. Every slightly-different DVD image requires another 5 GiB. If you're storing text files, and you have tens of thousands of those files and change one, you're changing just one small file—something that's maybe a few kiB or MiB at most—and re-using all the others while storing one new 5 kiB or even 5 MiB file is easy, compared to storing one new 5 GiB file.)
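To see the de-duplication in action, here is a small sketch that assumes git is installed and builds a throwaway repository in a temporary directory. The file names, commit messages, and the "Example" identity are made up for illustration. It makes two commits that share one unchanged file and compares the internal object IDs Git uses for each file.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo "stays the same" > stable.txt
echo "version 1" > changing.txt
git add stable.txt changing.txt
git commit -q -m "first snapshot"
first=$(git rev-parse HEAD)

echo "version 2" > changing.txt
git add changing.txt
git commit -q -m "second snapshot"
second=$(git rev-parse HEAD)

# <commit>:<path> names the stored file object inside that commit's snapshot.
stable_first=$(git rev-parse "$first:stable.txt")
stable_second=$(git rev-parse "$second:stable.txt")
changing_first=$(git rev-parse "$first:changing.txt")
changing_second=$(git rev-parse "$second:changing.txt")

echo "stable.txt:   $stable_first / $stable_second"
echo "changing.txt: $changing_first / $changing_second"
```

Both snapshots contain stable.txt, but they refer to the very same stored object, while changing.txt gets a new object in the second commit.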

Besides the files—which are usually the main data of a commit—each commit also stores some metadata, or information about the commit itself. This includes the name of the person who made the commit. It includes a date-and-time-stamp, showing when they made the commit. It includes their commit message. But it also includes one key item that Git wants for Git: Git stores, in the commit, the commit number—the hash ID—of some set of previous commits used to build up the repository-so-far. Most commits only need one previous-commit hash ID, and that's what we'll look at first.

Review: commits are snapshots with metadata

The thing to take away from the above is this:

  • Each commit is numbered, using a hash ID. This is literally how Git finds commits, by the numbers.

  • Each commit has two parts: it's a snapshot of all of the files Git knew about at the time you (or whoever) made the commit, plus some metadata about the commit, including who made it, plus information about where the commit goes in the history.

The history information, which is part of each commit, is simply a link—a pointer, if you will—to some set of previous commits.
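You can look at this link directly. The following is a minimal sketch, assuming git is installed; it uses a throwaway repository with two empty commits, and the identity and commit messages are placeholders. git cat-file -p prints the raw contents of a commit, including the hash ID of its parent.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"
b=$(git rev-parse HEAD)

# Print commit B: a tree line (the snapshot), a parent line (the link),
# author and committer lines (metadata), then the commit message.
git cat-file -p "$b"

# Extract just the hash stored on the "parent" line.
recorded_parent=$(git cat-file -p "$b" | sed -n 's/^parent //p')
echo "B records parent: $recorded_parent"
```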

These commits-and-links make up a Directed Graph

Mathematically, a graph is just a collection of vertices (sometimes called nodes) and edges: connections from one vertex to another. If the edges have a directional arrow, mathematicians and computer scientists tend to call them arcs instead of edges. We'll just start by drawing them using arrows.

In Git, the simplest kind of graph is a straight line. We have some series of commits, starting with the very first one we ever made in some repository, and ending with the most recent one. Each one has a unique—and random-looking—hash ID, but to keep ourselves sane, we'll draw these commits using sequential uppercase letters, like this:

A <-B <-C

Commit C is our most recent commit. It has an arrow coming out of it (an edge/arc) that points to commit B. Git stores this edge by storing the actual hash ID of commit B inside commit C; that's why our arrow comes out of C.

Commit B is just like C: it has an arrow coming out of it, pointing to earlier commit A.

Commit A is a little bit special, because it's our first commit ever. Being first, it can't point backwards to any earlier commit—so it just doesn't. That, it turns out, is how Git knows to stop going backwards.
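As a rough check of that claim, the sketch below (assuming git is installed, with a throwaway repository and placeholder identity) builds the A, B, C chain from empty commits and then inspects A. The option --max-parents=0 asks git rev-list for commits that have no parent at all.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"
git commit -q --allow-empty -m "C"

# The only parentless commit reachable from master should be A.
root=$(git rev-list --max-parents=0 master)

# Commit A's raw contents contain no "parent" line.
parent_lines=$(git cat-file -p "$a" | grep -c '^parent ' || true)

# Asking for A's parent fails, which is how walking backwards stops.
if git rev-parse --verify -q "${a}^" >/dev/null; then
    has_parent=yes
else
    has_parent=no
fi

echo "root=$root parent_lines=$parent_lines has_parent=$has_parent"
```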

For Git to use these three commits, we have to tell Git the real hash ID of the last commit, C. We could write this hash ID down on paper, or a whiteboard, or something. But that seems silly. We have a computer: why not have the computer store the hash ID in a file?

Branch names

This is where branch names come in. It's also where tag names come in, but tag names have a wrinkle or two that branch names don't, so let's start with branch names.

A branch name, in Git, just holds the actual hash ID of the last or latest commit that we want to say is "on the branch". In a way, that's kind of like a commit: a commit holds the hash ID of a previous commit, and a branch name holds the hash ID of a commit. So both of these things are said to point to a commit. We can draw that now, like this:

A <-B <-C   <--master

The name master will hold the real hash ID of the last commit C. So Git can use the name to find C. Then, having found C, Git can find the snapshot and the metadata that go with commit C. It can get at all the files; it can show us who made the commit; and it can use commit C to step back one hop, to commit B, because C points to B.

If we want to add a new commit to this collection, we start by having Git extract commit C somewhere. We need a work area, because the files inside C are frozen, and we can't even read them with most of the computer programs; only Git can read its own formats. So we have Git copy the files out to a work area. We then work on the usable copies of the file—the copies that aren't in Git—and eventually tell Git to make a new commit, i.e., to make a snapshot of the updated files. (There's a lot more to this than this simple description suggests, but that's the heart of how we make a new commit.)

The new commit requires metadata as well as the snapshot, of course. The metadata will use your name and email address as the author, and "now" as when you make the commit. Git will collect a commit message from you too. Then Git will write all this out as a new commit. The new commit will get a new, unique, random-looking hash ID, but we'll just call the new commit D here. New commit D will point back to commit C, because that's the one you used to make D, and then Git will write D's actual hash ID into the name master, because that's the branch name you used while making D:

A <-B <-C <-D   <--master

We're now back to the same setup as before, except that now, there's one more commit. The name master finds D; D finds C; C finds B; and B finds A.
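Here is a small sketch of that sequence, assuming git is installed. The repository is a throwaway one, the commits are empty, and the identity is a placeholder. It records which commit master names before and after adding D, and reads the parent hash stored inside D using the %P format of git log.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done
c=$(git rev-parse master)                 # master currently names commit C

git commit -q --allow-empty -m "D"
d=$(git rev-parse master)                 # master now names the new commit D
d_parent=$(git log -1 --format=%P "$d")   # the parent hash stored inside D

echo "master before: $c"
echo "master after:  $d"
echo "parent of D:   $d_parent"
```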

Note that we call these backwards-pointing arrows, from commit to commit, the parent links. Commit C is the parent of commit D. There's one other thing to know about this: none of these can ever be changed, once made. It's not just the files inside each commit that are frozen for all time: all parts of every commit are frozen for all time.

Using more than one branch name

With just one master branch in a repository, we can't really get branch-y. To make branches that look branch-y, we need to have more than one branch name. As soon as we do this, we need to update our drawings a bit, so that we can tell which name we're actually using.

Side note: at this point I'm going to get lazy and start drawing the connections from commits, backwards to their parent, as lines, because it's hard to draw arrows that point left-and-up or left-and-down. Just keep in mind that the connections are always backwards. Git works backwards, because it must do so: a child commit knows who its parent is, but a parent never knows who its future children might be. After the child is born, it can become a parent, but it's also frozen, so it can't remember its future children.

We'll start with our four commits:

A--B--C--D   <-- master (HEAD)

We'll attach the special name HEAD to one of the branch names. Then, to create a new branch, we'll pick one of the existing four commits and make a new name that points to that commit. Let's pick D, because that's the easiest commit to find, because it's the one we're already using, because master picks commit D now:

A--B--C--D   <-- develop, master (HEAD)

We get this by running git branch develop. Since we didn't pick a particular commit, git branch makes the new name point to the same commit as the one we're using right now.

If we now run git checkout develop (or, since Git 2.23, git switch develop), Git moves from using the name master to using the name develop. Both names select commit D, so nothing else happens right now: the files we have out are those for commit D, and we aren't changing commits, so Git doesn't change any of the checked-out files. But now HEAD is attached to the name develop:

A--B--C--D   <-- develop (HEAD), master
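The attachment of HEAD can be inspected with git symbolic-ref. The following sketch assumes git is installed and uses a throwaway repository with empty commits and a placeholder identity. It uses git checkout rather than git switch so that it also works with Git versions older than 2.23.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

for msg in A B C D; do
    git commit -q --allow-empty -m "$msg"
done

git branch develop                        # new name, same commit D
head_before=$(git symbolic-ref HEAD)      # which name HEAD is attached to
git checkout -q develop
head_after=$(git symbolic-ref HEAD)

master_commit=$(git rev-parse master)
develop_commit=$(git rev-parse develop)

echo "HEAD before: $head_before"
echo "HEAD after:  $head_after"
echo "master=$master_commit develop=$develop_commit"
```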

If we make a new commit now, and call that new commit E, here's how we can draw that:

A--B--C--D   <-- master
          \
           E   <-- develop (HEAD)

The files we have out now, and the branch we're on, are those from / for commit E. The name develop selects this commit. E has D as its parent, so commits A-B-C-D are on both branches. But branch master ends at commit D.

If we now run git checkout master, Git replaces our working files with those from commit D, and moves HEAD to be attached to master instead of develop, like this:

A--B--C--D   <-- master (HEAD)
          \
           E   <-- develop

Note that files that we changed and committed in E are still there, in E, but now that Git has taken out the files from D into our working area, what we see are the files from D, rather than the files from E.1

We can now make more changes—presumably, different changes than the ones we made for E—and mark the updated files for Git and run git commit and get a new commit, which we'll call F, on branch master, like this:

           F   <-- master (HEAD)
          /
A--B--C--D
          \
           E   <-- develop

I drew this particular graph this particular way to emphasize the idea that commits A-B-C-D are (still) on both branches, while commits E and F are only on develop and master respectively. But we could draw this as:

A--B--C--D--F   <-- master (HEAD)
          \
           E   <-- develop

or:

           F   <-- master (HEAD)
          /
A--B--C--D--E   <-- develop

if we wanted. These drawings represent the same graph.

The graph itself just has to have the right set of vertices (commits) connected by the right set of arcs (child-to-parent arrows). We could draw the graph vertically, with newer commits towards the top, as Git does. We could draw it vertically with newer commits towards the bottom (as some people do sometimes). If you haven't tried this before, it's a good idea to run git log --all --decorate --oneline --graph and take a look at how Git tries to draw the graph. Seeing the graph helps a whole lot, in terms of understanding the ~ and ^ notation.
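To try this on something small, the sketch below rebuilds the A through F example in a throwaway repository (assuming git is installed; the commits are empty and the identity is a placeholder). It then asks Git where the two branches meet, counts the commits unique to each, and draws the graph.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

for msg in A B C D; do
    git commit -q --allow-empty -m "$msg"
done
d=$(git rev-parse master)

git branch develop
git checkout -q develop
git commit -q --allow-empty -m "E"        # E is only on develop
git checkout -q master
git commit -q --allow-empty -m "F"        # F is only on master

fork=$(git merge-base master develop)     # where the two histories meet
only_on_develop=$(git rev-list --count master..develop)
only_on_master=$(git rev-list --count develop..master)

git log --all --decorate --oneline --graph
echo "fork=$fork only_on_develop=$only_on_develop only_on_master=$only_on_master"
```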


1 There are some special cases where, if we haven't actually committed something, Git will carry the uncommitted files around for us. This doesn't always work! Sometimes Git will say no: I can't change commits because of uncommitted work. The rules here are complicated. See "Checkout another branch when there are uncommitted changes on the current branch" if you want to learn where these rules come from, but note that your best bet is usually just to avoid switching branches with uncommitted work!


Once we have branches, we can have merges

Suppose we have a commit graph that goes like this:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

The latest commit on branch br1 is commit J, and the latest commit on branch br2 is commit L. (We haven't picked one of these to check out yet.) Each branch has some commits that are unique to each branch, and then, back in the past—two commits back from the end of each branch, in fact—both branches have a common history. The history from J to I to H, and the history from L to K to H, meet at commit H.

Using this common starting (ending?) point, Git can use the snapshot in commit H to figure out what work someone did on br1, to arrive at the snapshot in commit J. In the same way, Git can figure out what work someone else did on br2, to arrive at the snapshot in commit L. We can then ask Git to combine these two sets of work. By applying the combined work to the snapshot in commit H, we'll keep all the changes that led to J, and add all the changes that led to L. Or, we'll keep all the changes that led to L, and add all the changes that led to J. Either way, we'll get the same final result.2 So we'll run:

git checkout br1 && git merge br2

or:

git checkout br2 && git merge br1

and pick one side as "ours" and the other side as "theirs" and have Git do the merge. If all goes well—if there are none of the conflicts mentioned in the footnote here—Git will make a new merge commit on its own; we'll call this commit M:

          I--J
         /    \
...--G--H      M
         \    /
          K--L

The special thing about commit M is that it points back, not just to J, and not just to L, but rather to both J and L. This is how Git knows the history, and that commit M is a merge. Other than these two backwards links, commit M is otherwise just an ordinary snapshot, like any other commit.

Of course, Git now has to update one (and only one) of the two branch names so that it points to new merge commit M. The name to update is, of course, whichever name we picked to have HEAD attached to. So if we ran git checkout br1 and then git merge br2, we get:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

2 Well, we'll get the same final result unless something done on one side of the merge (H-to-J) conflicts with something done on the other (H-to-L) side. In this case, someone has to figure out how to resolve the conflict. If they use a flag like -X ours or -X theirs to prefer one side or the other, it will start to matter which "side" they are standing on when they do the merge.


Merges have first and second parents

Remember, the snapshot in M will be the same, regardless of which branch name we have Git update here. But Git will also remember which branch name we used to do the merge by making sure that the two backwards links are stored in the right order.

Given that we were on br1 when we started, and that br1 now points to M, the first parent of commit M will be commit J:

          I--J
         /    \₁
...--G--H      M   <-- br1 (HEAD)
         \    /²
          K--L   <-- br2

If we switch the merge around—if we git checkout br2 and then run git merge br1—the order of the two parent links will go the other way around. Of course, the name that gets updated changes too:

          I--J   <-- br1
         /    \₂
...--G--H      M   <-- br2 (HEAD)
         \    /¹
          K--L

Note how, if you follow the first parent link backwards, you wind up at the commit that the name found just before the merge. In the first example, where the updated name is br1, one step back along the first-parent link gets you to commit J, which is the commit we had out when we ran git merge. In the second example, where the updated name is br2, one step back along the first-parent link gets you to commit L: again, the one we had out when we ran git merge.
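The parent order is easy to verify. This sketch assumes git is installed and rebuilds the G through M graph in a throwaway repository with empty commits and a placeholder identity, merging br2 into br1. It then reads the parent list of M and resolves the first and second parents by number.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
h=$(git rev-parse HEAD)

git checkout -q -b br1
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
j=$(git rev-parse HEAD)

git checkout -q -b br2 "$h"
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"
l=$(git rev-parse HEAD)

git checkout -q br1
git merge -q --no-ff -m "M" br2            # we are on br1, so J is "ours"

parents=$(git log -1 --format=%P br1)      # all parent hashes of M, in order
first=$(git rev-parse 'br1^1')
second=$(git rev-parse 'br1^2')

echo "parents of M: $parents"
echo "first=$first second=$second"
```

Swapping the roles (checking out br2 and merging br1) should swap the two hashes in the parent list.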

Git supports three-or-more-parent merges

While we won't go into any detail here, Git can actually have a merge commit that has three or even more parents. Git calls these things octopus merges. They don't do anything you couldn't do by doing repeated two-parent merges, and in fact, there are things you can do with two-parent merges that Git won't do with three-or-more-parent merges. So they're not strictly necessary. Most parts of Git also don't distinguish anything other than the first parent: git log has a --first-parent flag that means ignore the other side of each merge, and if you use this on an octopus merge, it ignores all but the same-branch-name "side" of the multi-way merge.

Still, this does factor into the ^-suffix notation. It's not sufficient to just say "first parent" and "second parent", as there could be "third parent", "fourth parent", and so on. In the Linux Git repository, there's one crazy merge with 66 parents. Linus Torvalds once called it a Cthulhu Merge.
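For a small-scale version, the sketch below makes a three-parent octopus merge in a throwaway repository. It assumes git is installed; the branch names topic1 and topic2, the file names, and the identity are all made up. Each line of work touches a different file so that the merge cannot conflict.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

echo base > base.txt; git add base.txt; git commit -q -m "base"
git branch topic1
git branch topic2

echo main > main.txt; git add main.txt; git commit -q -m "work on master"
git checkout -q topic1
echo one > one.txt; git add one.txt; git commit -q -m "work on topic1"
git checkout -q topic2
echo two > two.txt; git add two.txt; git commit -q -m "work on topic2"

git checkout -q master
git merge -q -m "octopus merge" topic1 topic2

# Count the parent hashes recorded in the merge commit.
set -- $(git log -1 --format=%P master)
parent_count=$#

third_parent=$(git rev-parse 'master^3')   # ^3 picks the third parent
topic2_tip=$(git rev-parse topic2)

# --first-parent follows only the first link out of each merge.
first_parent_commits=$(git rev-list --count --first-parent master)
all_commits=$(git rev-list --count master)

echo "parents=$parent_count first-parent=$first_parent_commits all=$all_commits"
```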

Tags

Tag names, in Git, are very much like branch names: a tag name holds the hash ID of one specific commit. But there's a mechanism issue here, and rather than hiding the mechanism to make everything Just Work the way you might want it to, Git exposes the mechanism, and makes you—the programmer—deal with it.

Specifically, tags come in two varieties: there is a lightweight tag, which normally points straight to a commit,3 and there is an annotated tag, which—by definition—points to an annotated tag object. This annotated tag object has its own hash ID, much like a commit has a hash ID. This allows Git to store some extra information in the tag, inside this annotated-tag-object thing. Then the annotated tag object points to the commit (though see footnote 3 again).

What this all ends up meaning is that for annotated tags, if we want the commit that the tag is supposed to mean, we have to tell Git: Don't just tell me about the tag. Tell me about what the tag tags! The general syntax for this is tag-name^{}, i.e., the tag name, followed by the hat or caret character, followed by empty braces. This means follow the tag to its destination (aka peel the tag as in footnote 3). Without this suffix, if we have a tag like v1.2 that is an annotated tag, and we run:

git rev-parse v1.2

we get the hash ID of the annotated tag object, rather than what the v1.2 tag tags. Here's an example straight out of the Git repository for Git itself:

$ git rev-parse v1.2.0
041ed88c7369c3e45077502b74664d8101f99ab3
$ git cat-file -p v1.2.0
object bd9ca0baff88107e26915cdaaf9821dc70a187e3
type commit
tag v1.2.0
tagger ... [snipped]
$ git rev-parse v1.2.0^{}
bd9ca0baff88107e26915cdaaf9821dc70a187e3
$ git cat-file -p v1.2.0^{}
tree 933b7a642528ce47ef2f538a007c0e48cc448e1f
parent 4bbdfab766782dc06ac496730e3a578bd35d67c5
author Junio C Hamano ... [snipped]

Because tags can point to any of Git's internal objects, if you want to require that the tag point to a commit, you can tell Git that it must do so:

$ git rev-parse v1.2.0^{commit}
bd9ca0baff88107e26915cdaaf9821dc70a187e3

Note how we got the commit's hash ID, not the annotated tag's, here. We could tell Git that we want the hash ID of a blob, and we'd get an error:

$ git rev-parse v1.2.0^{blob}
error: v1.2.0^{blob}: expected blob type, but the object dereferences to tree type
... [snipped]

There's an oddity here: we got a complaint about finding a tree type, not a commit type. That's a whole separate discussion, about what Git calls a "commit-ish" or a "tree-ish", and is not really relevant to the original question, so we'll leave that aside at this point.

In any case, anything^0 is literally just a synonym for the slightly longer anything^{commit}. You can use these interchangeably, as far as Git is concerned. Either suffix will force Git to resolve an annotated tag object to a commit object hash ID, and if Git can't do that for some reason, Git will produce an error, complaining about "expected commit type", similar to our blob error above.
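If you don't have the Git repository for Git handy, you can reproduce the same behavior in a throwaway repository. This sketch assumes git is installed; the tag names v1.2 and v1.2-light, the tag message, and the identity are placeholders. It creates one annotated tag and one lightweight tag on the same commit and compares what each spelling resolves to.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "tagged commit"
commit=$(git rev-parse HEAD)

git tag -a -m "release 1.2" v1.2           # annotated: creates a tag object
git tag v1.2-light                         # lightweight: just a name

tag_object=$(git rev-parse v1.2)           # hash of the annotated tag object
tag_type=$(git cat-file -t v1.2)           # its object type
peeled=$(git rev-parse 'v1.2^{}')          # follow the tag to its target
via_zero=$(git rev-parse 'v1.2^0')
via_commit=$(git rev-parse 'v1.2^{commit}')
light=$(git rev-parse v1.2-light)

echo "tag object: $tag_object ($tag_type)"
echo "commit:     $commit"
echo "peeled=$peeled ^0=$via_zero ^{commit}=$via_commit light=$light"
```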


3 Tags, lightweight or annotated, are technically allowed to point to any of Git's internal objects. There are four types of objects: blobs, trees, commits, and annotated tags. A tag-name that points to an annotated tag object is, by definition, an annotated tag; any tag-name that points to any of the other three types of object is, by definition, a lightweight tag. The annotated tag object then points to any type of object, possibly even some other annotated tag object. Eventually, however, all annotated tag object chains are required to point to one of the other three kinds of objects. This allows Git to chase down the final target object of the annotated tag, and that object, whatever its type is, is the "target type" of the annotated tag to which the tag name points. (Whew!) There is, as far as I know, no use whatsoever for this complicated tag-to-tag-to-tag-to-tag-to-commit (or whatever) sequence, but it's easy to create, and Git allows it. The process of following an annotated tag to its ultimate object, whatever that is, is called peeling the tag, and the final target object type is what we care about. It's almost always a commit.


With all that in mind, we can now explain the suffix notations

Any name in Git will be resolved to a hash ID if and as needed. If you use the git rev-parse command to do this, you can supply an exact spelling. Some other commands do some guessing, and some allow you to type in exact spellings, passing them along to git rev-parse or the built-in equivalent. In these cases:

  • A branch name like master or develop, which is short for the full name refs/heads/master or refs/heads/develop, always points to a commit hash ID, and can always be resolved to a commit hash ID.

  • A tag name like v1.2, which is short for the full name refs/tags/v1.2, points to some kind of object. If it's a lightweight tag, it points directly to whatever kind of object you tagged, and the name will resolve to that hash ID. But if it's an annotated tag, it will, in general, resolve to the hash ID of the underlying annotated tag object. (Some commands will automatically peel the tag for you; others won't.)

  • So, if you do use a tag name, and you want to be absolutely sure that Git finds a commit at the end of it, you can add ^{commit} or ^0 to the name. There's no need to do this if you are using a branch name because a branch name can't point to anything but a commit.

There are other cases where there's no need for this either. It's mostly annotated tags that introduce this kind of need. That's what the ^0 suffix is all about: forcing Git to peel a tag, and go to its commit (and error out if it's not a commit).
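The short-name-to-full-name mapping and the object types involved can be checked as follows. This is a sketch under the usual assumptions: git is installed, the repository is a throwaway one, and the tag name v1.2 and the identity are placeholders.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "only commit"
git tag -a -m "release 1.2" v1.2

# Expand the short names to the full reference names.
branch_full=$(git rev-parse --symbolic-full-name master)
tag_full=$(git rev-parse --symbolic-full-name v1.2)

# Ask what kind of object each spelling resolves to.
branch_type=$(git cat-file -t master)
tag_type=$(git cat-file -t v1.2)
peeled_type=$(git cat-file -t 'v1.2^0')

echo "master -> $branch_full ($branch_type)"
echo "v1.2   -> $tag_full ($tag_type, peeled: $peeled_type)"
```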

The other two suffix forms you asked about are ~number and ^number. These two forms work with the arcs (backwards-pointing arrows) coming out of commits. If some name locates some commit, that commit exists within the commit graph: those drawings like:

          I--J
         /    \₁
...--G--H      M   <-- br1 (HEAD)
         \    /²
          K--L   <-- br2

Here, the name br1 selects commit M. If we want commit J, we can follow one step back, along the first parent of M, to get to commit J, using br1^1. If we want commit L, we can follow one step back, along the second parent of M, with br1^2. (Or we could just use the name br2, but that doesn't have any of these fancy suffixes.)

The hat suffix with a number exists specifically for working with merge commits. Since M is a merge commit, we can use the number to pick which parent to follow. If you leave the number out, the hat suffix defaults to going to the first parent, so br1^ means the same as br1^1, which means go one step back along the first parent which gets you commit J.

If you want to go more than one step back, you can add more hat suffix characters: br1^^ goes back one step twice. First we land at J, then we land at I. So this finds commit I. If we want commit K, we can use br1^2^: the ^2 steps back to L, and then the second ^ steps back one to K.

Suppose we want to go four steps back, to G. Then br1^^^^ will do the trick: we start at M, use the first ^ (and the implied 1) to get to J, use another ^ to get to I, use the third ^ to get to H, and use the fourth ^ to get to G. This requires typing in a carefully counted set of hat characters, though. It's nicer to be able to write br1~4. The ~4 suffix means step back four times, just like ^^^^.

With the ~ suffix, we can't put in any parent number at any point. The going-back operation only goes back first-parents. As long as we don't have any merge commits in the way, or want to go back along the first-parent line, this is just fine! We only need to use the ^ suffix, with a number, when we hit a merge commit and want to go back along something other than the first-parent line.
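These equivalences can be checked mechanically. The sketch below assumes git is installed and rebuilds the graph from the drawing (with one extra commit F before G) in a throwaway repository, using empty commits and a placeholder identity. It then resolves several suffix spellings starting from br1.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "F"
git commit -q --allow-empty -m "G"; g=$(git rev-parse HEAD)
git commit -q --allow-empty -m "H"; h=$(git rev-parse HEAD)

git checkout -q -b br1
git commit -q --allow-empty -m "I"; i=$(git rev-parse HEAD)
git commit -q --allow-empty -m "J"

git checkout -q -b br2 "$h"
git commit -q --allow-empty -m "K"; k=$(git rev-parse HEAD)
git commit -q --allow-empty -m "L"

git checkout -q br1
git merge -q --no-ff -m "M" br2

two_hats=$(git rev-parse 'br1^^')           # M -> J -> I
four_hats=$(git rev-parse 'br1^^^^')        # M -> J -> I -> H -> G
tilde_four=$(git rev-parse 'br1~4')         # same walk, shorter spelling
second_then_first=$(git rev-parse 'br1^2^') # M -> L -> K
second_tilde_two=$(git rev-parse 'br1^2~2') # M -> L -> K -> H

echo "br1^^=$two_hats br1~4=$tilde_four br1^2^=$second_then_first"
```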

tag/version-0.8.7~2

This will use tag/version-0.8.7 to find some object. Since there's a ~2 suffix on the end, this object will need to be a commit object; if it's an annotated tag object, Git will "peel" the tag and make sure there is in fact a commit there. Then, from that point, it will go back two first-parent links.

tag/version-0.8.7~2^0

This does the same thing, but after going back two first-parent links, requires (via the ^0 suffix) that the hash ID found be that of a commit. Since commits are only allowed to point back to previous commits, the ^0 suffix here is unnecessary. It's not harmful but it adds nothing that the ~2 did not already enforce.

tag/version-0.7.6~10^2~2

This starts from whatever tag/version-0.7.6 identifies. Then it has to be a commit, because of the ~10 suffix, so it goes through the same process as we just saw. Then the ~10 suffix steps back ten first-parent links. Now there's a ^2 suffix, so we need to hit a merge commit at this point, and we step back one second-parent link. (This is, as always for backward links from commits, another commit.) Then we step back two more first-parent links, and that finishes off all the suffix handling and we have our final commit hash ID.
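To connect this back to git name-rev, here is a sketch that assumes git is installed and uses a throwaway repository with a merge in it. The tag name version-1.0, the branch names, and the identity are hypothetical. It names every commit with git name-rev, then feeds each name back through git rev-parse to confirm that the name, suffixes and all, identifies the same commit. The exact names printed depend on which references name-rev prefers, so the sketch checks only the round trip.

```shell
set -eu
repo=$(mktemp -d); cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"; h=$(git rev-parse HEAD)
git checkout -q -b br1
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git checkout -q -b br2 "$h"
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"
git checkout -q br1
git merge -q --no-ff -m "M" br2
git tag -a -m "release" version-1.0         # annotated tag on merge commit M

mismatches=0
for c in $(git rev-list br1); do
    name=$(git name-rev --name-only "$c")   # a name, possibly with suffixes
    back=$(git rev-parse "$name")           # resolve that name again
    printf '%s %s\n' "$c" "$name"
    [ "$back" = "$c" ] || mismatches=$((mismatches + 1))
done

# A trailing ^0 after a ~ step adds nothing: both spellings find one commit.
without_zero=$(git rev-parse 'version-1.0~2')
with_zero=$(git rev-parse 'version-1.0~2^0')
echo "mismatches=$mismatches"
```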

torek
  • Hi Torek, I really appreciate you taking the time and effort to provide such a thorough and in-depth explanation. Absolutely love the graph you draw. It makes the ideal so much more clear. This really helps me in the long run! Especially thank you for letting me know about the graph theory involved in git. And I sure love to visit Puni Distillery in Italy once the covid dies out (Looks like Minecraft building in real life) – DavidKanes Dec 18 '20 at 19:49