Tracking revision numbers with git

Question

Subversion has a revision id that is incremented after each commit. We included it in the version number of each release, which has the format X.Y.Z, where X is the major version, Y is the minor version and Z is the revision number.

In our issue tracker we would just reference subversion revision numbers (or reference the issue number in the commit message) and it was easy to determine whether a particular version already contained the fix or not.

Now, with git, commits are identified by a hash. Since a hash cannot be used as a revision number, we use the commit count instead to generate the version number during the build; it gives us the same kind of number.

The problem is that when a user reports a bug, the report normally includes the version number, and it is really hard to look up whether the bug has been fixed in a more recent version or is still unresolved, because with git all we see is a commit hash.

One solution would be to maintain a translation table that lists each commit hash and maps it to a revision number but this makes life much harder.

Can you recommend any best practices for this problem?

score 1 · Answer 1 · answered Nov 18 '16 at 11:46

I handle this in a very simple way, using git describe. It conveniently packages three important pieces of information:

The hash
The latest tag
The number of commits since the latest tag, in case we are on an untagged commit.

Furthermore, in most projects I have a standard way of tagging releases: vXXX.YYY.ZZZ. I use the output of git describe everywhere I need an exact reference to a commit. For example, one of my projects is at:

v1.1.9-19-g3024adf

I usually run a pre-compilation script that injects this into some compiler symbols so that it ends up in the binary. Having a standard way of naming my tags ensures I get an upper bound on the length of the git describe output, which is important for me because I need to squeeze it into whatever protocol my embedded systems use.
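As a rough illustration of what such a pre-compilation step could look like, here is a minimal sketch that splits a git describe string into its three parts and renders a C preprocessor line. The function names and the BUILD_VERSION symbol are made up for this example, and the parser assumes the default `<tag>-<count>-g<hash>` layout (no --dirty or --long options). A tag whose own name ends in something like -2-gabc would confuse it.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str                   # nearest tag, e.g. "v1.1.9"
    commits_since: int         # commits made after that tag
    short_hash: Optional[str]  # abbreviated hash, None when exactly on a tag


# Default layout: "<tag>-<count>-g<abbreviated hash>".
_DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Described:
    """Split git describe output into tag, commit distance and hash."""
    text = text.strip()
    match = _DESCRIBE.match(text)
    if match is None:
        # On a tagged commit, git describe prints only the tag name.
        return Described(text, 0, None)
    return Described(match["tag"], int(match["count"]), match["hash"])


def to_define(text: str, symbol: str = "BUILD_VERSION") -> str:
    """Render a C preprocessor line (the symbol name is hypothetical)."""
    return f'#define {symbol} "{text.strip()}"'


print(parse_describe("v1.1.9-19-g3024adf"))
print(to_define("v1.1.9-19-g3024adf"))
```

In a real build script the input string would come from running git describe in the working tree; it is hard-coded here so the sketch runs without a repository.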

score 0 · Answer 2 · answered Nov 18 '16 at 11:27


Don't use the commit count. Simply include the first few characters of the hash in lieu of the old version number. You don't need to include the whole string; the first five or six characters will be enough.

Version numbers don't make sense in a distributed context because the history is eminently not linear. What is commit 10 for you might be an entirely different commit on someone else's clone.

answered Nov 18 '16 at 11:27

s.m.


The release is linear. We go from 1.0.42 to 1.0.43. We need to be able to easily tell whether a fix that a developer made in his remote branch was already merged and released in 1.0.43 or not. – b0ti Nov 18 '16 at 12:57
@b0ti and how is this "fix that a developer made" identified? Do you have the hash? It's not very clear. If you are looking for a way to check whether a given commit made it to your release branch, you can do it like [this](http://stackoverflow.com/questions/8475448/find-merge-commit-which-include-a-specific-commit). – s.m. Nov 18 '16 at 13:24
The developer adds a comment "Fixed in " to the ticket in the issue tracker (or this gets automatically added from the commit message referencing the ticket). – b0ti Nov 18 '16 at 17:54

Marcus Müller · Answer 3 · 2016-11-18T11:34:25.403


So, there's the conceptual problem that git emphasizes merging different branches (SVN makes that possible too, but it's a lot more manual work).

So let's assume

     /--> B1 --> B2 --> … --> B18-\
A -->                              +--> D
     \--> C1 --> C2 --------------/

What version number should D have? Is it version(A) + 19 (upper path) or version(A) + 3 (lower path)? Or do you count the merge as revision (+1 count)?

So, even in SVN times, your monotonically increasing revision number was basically just a convention, and if you could tell from that number whether a fix was included, you probably didn't really work on branches other than trunk.

That single-branch scheme makes little sense for modern team development, or for a system that lets you build features without disturbing the bug fixes in another branch. Since declaring one branch the "versioned" branch is only a convention, it's usual to simply have a "master" branch (which is the default branch in git), into which all feature branches are merged as soon as they work, and from which new feature branches are forked off whenever someone feels like working on a new feature. Then you'd just git tag commits on your master branch whenever something significant happens, a new release for example. Typical tag names look like release_001_002_001. Yes, it's manual compared to the automatic revision counting in SVN, but unlike that, it's actually useful for your code management: looking up whether a certain bugfix commit hash happened before or after another commit hash is simply a question of git log.
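To make the "is this fix already in that release?" lookup concrete, here is a small sketch on a toy history. The commit names and the graph are invented for illustration; the tag name is borrowed from the example above. On a real repository the same question is answered by git itself, for instance with git merge-base --is-ancestor or git tag --contains.

```python
from typing import Dict, List, Set

# Toy history, commit -> parents. All names here are made up for illustration.
PARENTS: Dict[str, List[str]] = {
    "base": [],
    "fix": ["base"],            # bug fix made on a feature branch
    "work": ["base"],           # unrelated commit on master
    "merge": ["work", "fix"],   # feature branch merged into master
    "later": ["fix"],           # more branch work, not merged yet
}

# Tags are just names pointing at commits.
TAGS: Dict[str, str] = {"release_001_002_001": "merge"}


def history_of(commit: str) -> Set[str]:
    """Return the commit plus everything reachable through its parents."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def release_contains(tag: str, commit: str) -> bool:
    """True if the tagged release already includes the given commit."""
    return commit in history_of(TAGS[tag])


print(release_contains("release_001_002_001", "fix"))    # True
print(release_contains("release_001_002_001", "later"))  # False
```

The point of the sketch is that the answer depends only on ancestry, not on any counter: a fix is in a release exactly when its commit is reachable from the release tag.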

You can actually just count the commits between A and D. Then, version(D) would be version(A) + 18 + 2 + 1. That's relatively doable; you'd run

git log A..D --pretty=oneline | wc -l

Again, I doubt the usefulness of that.
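For what it's worth, the 18 + 2 + 1 figure can be checked by rebuilding the diagram as a parent map and counting what A..D selects, namely the commits reachable from D but not from A. This is only a sketch of the range semantics on the example graph; the helper names are invented. On a real repository, git rev-list --count A..D gives the same kind of number.

```python
from typing import Dict, List, Set


def build_history() -> Dict[str, List[str]]:
    """Rebuild the example graph: A, B1..B18, C1..C2 and the merge D."""
    parents: Dict[str, List[str]] = {"A": []}
    previous = "A"
    for i in range(1, 19):          # upper path: B1 .. B18
        name = f"B{i}"
        parents[name] = [previous]
        previous = name
    parents["C1"] = ["A"]           # lower path: C1, C2
    parents["C2"] = ["C1"]
    parents["D"] = ["B18", "C2"]    # merge commit with two parents
    return parents


def reachable(start: str, parents: Dict[str, List[str]]) -> Set[str]:
    """All commits reachable from start, including start itself."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def count_range(exclude: str, include: str, parents: Dict[str, List[str]]) -> int:
    """Count commits in 'exclude..include', mirroring the A..D notation."""
    return len(reachable(include, parents) - reachable(exclude, parents))


history = build_history()
print(count_range("A", "D", history))  # 21 = 18 + 2 + 1
```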

edited Nov 18 '16 at 11:34

answered Nov 18 '16 at 11:28

Marcus Müller


We are already using the commit count `wc -l`. Tags don't help because that's applied when a release is done and not when a developer fixes an issue. I do understand the concept behind git's distributed branching but the question was about something different. – b0ti Nov 18 '16 at 12:47
well, then you don't fully understand the answer, or you didn't think this through: the way version control with git works, there cannot be linear numbers, because there is no linear history. So you're chasing something that is impossible. – Marcus Müller Nov 18 '16 at 13:37
and I heartily contradict your "a customer calls and it's hard to look up whether the issue has been fixed". It's not. In both cases, it's looking into your revision history. – Marcus Müller Nov 18 '16 at 13:38
and you need to know the revision id (SVN) or the commit hash (git) of the bugfix, either way, so it's simply a matter of looking up whether that is part of the history. – Marcus Müller Nov 18 '16 at 13:39
PLUS: you either need to stop giving customers non-release versions directly from source control, or you need to deal with the fact that they will have strange versions of software. You can't eat the cake AND have it. – Marcus Müller Nov 18 '16 at 13:40

@MarcusMüller is correct, you cannot use a linear count to describe a nonlinear process. SVN is able to do this because it specifically *linearizes the process*, by assigning a unique sequential number to each new commit. This is possible because commits are *only* stored on the central server. There is a single source of truth. Now, there *is* a way to do something similar with Git, but there is no space in the comment, so I must provide a separate answer. :-) – torek Nov 18 '16 at 21:16

score 0 · Answer 4 · edited May 23 '17 at 10:30

As I said in a comment, the problem here boils down to linearizing. If you want a simple incrementing count to specify some particular commit, you must have a single source point that makes this simple incrementing count.

In SVN, there is an obvious place to do this: all commits are stored on a master central server. In order to make a new commit, you call up the central server and say: make a new commit. This either succeeds—and can get a simple, incrementing number—or it fails and there is no commit.

In Git, there is no designated central server. Each developer makes his or her own commits. Commits are exchanged between peers. The globally unique identifier for any given commit is its hash: Git guarantees that no two commits ever have the same hash.¹

The lack of a single central counting point destroys the usefulness of making your own simple revision count, as different repositories can and will have the same number of commits without containing the same set of commits. I may have 17 commits, of which 2 are different from your 17 commits, so that if we combine our two repositories, we both wind up with 19 commits. (If I combine yours with mine, I get 19 commits: the 15 we already shared, my own two, and the two new ones I get from you. Meanwhile you still have 17: you must still pick up the two commits I have that you lack.)
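The arithmetic is easy to check if each repository is treated as a plain set of commit ids. The ids below are placeholders, not real hashes.

```python
# 15 commits that both repositories already share (placeholder ids).
shared = {f"s{i:02d}" for i in range(15)}

mine = shared | {"m1", "m2"}    # my repository: 17 commits
yours = shared | {"y1", "y2"}   # your repository: 17 commits

print(len(mine), len(yours))    # 17 17
print(mine == yours)            # False: same count, different contents
print(len(mine | yours))        # 19 once the two are combined
```

Both repositories report "17", yet that number names two different states, which is exactly why a bare count cannot identify a commit.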

You can, however, use your idea: simply designate a central counting point:

"One solution would be to maintain a translation table that lists each commit hash and maps it to a revision number but this makes life much harder."

It's not that much harder if you already have a central server. For instance, if any release build is done on the "release-build" system, and the release-build system has a Git repository, you simply designate its repository as the central counting point.

It maintains the table. The count could be the number of commits in its repository.² But that's more than we need: The count can simply be the number of entries in the table; there is no need to count non-built releases. In any case, the translation from "count" to "hash", or vice versa, is done by looking up or adding the appropriate entry into the table.
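A minimal in-memory sketch of such a table might look like the following. The class and method names are invented, the hashes are placeholders, and a real release-build system would need to persist the table somewhere durable; this only shows the lookup in both directions and a counter that never goes backwards.

```python
import threading
from typing import Dict


class RevisionTable:
    """Maps built commit hashes to ascending revision numbers (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_hash: Dict[str, int] = {}
        self._by_number: Dict[int, str] = {}
        self._next = 1  # never reset, so numbers only ever go up

    def register(self, commit_hash: str) -> int:
        """Return the number for a hash, assigning the next one if it is new."""
        with self._lock:  # makes fetch-and-increment atomic
            existing = self._by_hash.get(commit_hash)
            if existing is not None:
                return existing
            number = self._next
            self._next += 1
            self._by_hash[commit_hash] = number
            self._by_number[number] = commit_hash
            return number

    def hash_for(self, number: int) -> str:
        return self._by_number[number]

    def number_for(self, commit_hash: str) -> int:
        return self._by_hash[commit_hash]


table = RevisionTable()
print(table.register("3024adf"))   # 1
print(table.register("9fceb02"))   # 2
print(table.register("3024adf"))   # 1 again: already in the table
print(table.hash_for(2))           # 9fceb02
```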

The value of this simplified count is dubious at best. Look at real software releases, which are usually tagged with a "dotted version": Git version 2.8.4, Git version 2.9.0, Git version 2.10.1; Python 2.7.12, Python 3.4.5, and so on. How does 7.3.12 compare to 7.4.0? Is it strictly "less than", or not? With Git, when you build releases, you can tag them with dotted versions like this. The tag can be distributed using Git's built-in mechanisms, and everyone can look up v7.3.12 locally and find the commit. If you do not have the tag, you probably do not have the version: you must git fetch, perhaps with --tags, from someone who does.
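If one assumes purely numeric, dot-separated components, dotted versions can be ordered by comparing them as tuples of integers rather than as strings. This is only a sketch under that assumption; the helper name is invented, and the ordering of the names says nothing about whether one release is an ancestor of the other in the commit graph.

```python
from typing import Tuple


def parse_version(name: str) -> Tuple[int, ...]:
    """Turn 'v7.3.12' or '7.3.12' into (7, 3, 12)."""
    return tuple(int(part) for part in name.lstrip("v").split("."))


print(parse_version("7.3.12") < parse_version("7.4.0"))   # True
print("2.9.0" < "2.10.1")                                 # False: string comparison misleads
print(parse_version("2.9.0") < parse_version("2.10.1"))   # True

tags = ["v2.10.1", "v2.8.4", "v2.9.0"]
print(sorted(tags, key=parse_version))  # ['v2.8.4', 'v2.9.0', 'v2.10.1']
```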

The tags are, in effect, a distributed version of this central mapping table. Instead of counting the tags, though, we simply use their names, which have the form vX or vX.Y or whatever.

These tags can be extended with git describe, which lets you say "this many commits distant from this fixed tag, plus a unique verifier/locator in case distributed builds make the relative count break." See Sébastien Dawans' answer.

¹This "guarantee" is kept via a simple mechanism: if two commits do have the same hash, Git simply refuses to believe that the second one exists. It won't accept it, it won't store it into the repository, and the existing hash "wins". The chance of this happening for any given pair of objects is vanishingly small: one in 2^N, where N is the number of bits in the hash. Since Git uses SHA-1, which is 160 bits, that's 2^-160.

Due to the so-called birthday paradox or birthday problem, the probability rises rapidly with the number of objects. However, we start from such a small base that we can have trillions of objects, perhaps as many as 1.7 quadrillion or so, before the chance even rises to the same level as the chance of undetected storage-media corruption. (The names here use the "short scale"; see https://en.wikipedia.org/wiki/Quadrillion.)
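The figures above can be sanity-checked with the usual birthday approximation, in which the collision probability is roughly the number of object pairs divided by the number of possible hashes. This is a back-of-the-envelope sketch that is only meaningful while the result is tiny; the function name is invented.

```python
def collision_probability(objects: int, bits: int = 160) -> float:
    """Birthday approximation: n(n-1)/2 pairs, each colliding with chance 2^-bits."""
    pairs = objects * (objects - 1) // 2
    return pairs / 2**bits


# A single pair of objects: exactly 2^-160.
print(collision_probability(2))

# About 1.7 quadrillion objects (short scale): on the order of 1e-18.
print(collision_probability(1_700_000_000_000_000))
```

Going from one pair to 1.7 quadrillion objects raises the probability by some thirty orders of magnitude, yet it still ends up around one in a billion billion.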

²If you do use this approach (counting the number of commits in its repository), you must make sure you never drop any commits, or the count would go down and hence not act like an ascending function. This is one reason a count of table entries might be better; or you could use a separate counter that you never reset, with an atomic fetch-and-increment when choosing the next number.
