270

I would like to include the current git hash in the output of a Python script (as the version number of the code that generated that output).

How can I access the current git hash in my Python script?

the
Victor

12 Answers

309

No need to hack around getting data from the git command yourself. GitPython is a very nice way to do this and a lot of other git stuff. It even has "best effort" support for Windows.

After pip install gitpython you can do

import git
repo = git.Repo(search_parent_directories=True)
sha = repo.head.object.hexsha

Something to consider when using this library: the following is taken from gitpython.readthedocs.io.

Leakage of System Resources

GitPython is not suited for long-running processes (like daemons) as it tends to leak system resources. It was written in a time where destructors (as implemented in the __del__ method) still ran deterministically.

In case you still want to use it in such a context, you will want to search the codebase for __del__ implementations and call these yourself when you see fit.

Another way to assure proper cleanup of resources is to factor out GitPython into a separate process which can be dropped periodically.
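One way to limit the resource concern quoted above is to keep the repository open only for as long as it takes to read the hash. The following is a minimal sketch, not an official GitPython recipe: the function name `read_head_sha` and the `None` fallback are made up for illustration, and it assumes a GitPython release in which `Repo` can be used as a context manager.

```python
def read_head_sha(path="."):
    """Return the HEAD commit hash of the repo containing `path`, or None."""
    try:
        import git  # GitPython; may be missing, or may fail to locate the git binary
    except ImportError:
        return None
    try:
        # Assumption: Repo supports the `with` statement, so its resources
        # are released as soon as the block exits.
        with git.Repo(path, search_parent_directories=True) as repo:
            return repo.head.object.hexsha
    except (git.exc.InvalidGitRepositoryError, git.exc.NoSuchPathError, ValueError):
        # Not inside a repository, a bad path, or a repository with no commits yet.
        return None
```

Calling `read_head_sha()` from a script then yields either a 40-character hash or `None`, which the caller can print as "unknown".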

Lioness100
the
  • 13
    @crishoj Not sure how you can call it portable when this happens: `ImportError: No module named gitpython`. You cannot rely on the end user having `gitpython` installed, and requiring them to install it before your code works makes it not portable. Unless you are going to include automatic installation protocols, at which point it is no longer a clean solution. – user5359531 May 26 '17 at 17:13
  • 68
    @user5359531 I beg to differ. GitPython provides a pure Python implementation, abstracting away platform-specific details, and it is installable using standard package tools (`pip` / `requirements.txt`) on all platforms. What's not "clean"? – crishoj May 27 '17 at 03:23
  • 6
    `pip` is not available on all systems. For that matter, neither is the external internet access needed by `pip` to install said packages. – user5359531 May 27 '17 at 21:14
  • 30
    This is the normal way to do things in Python. If the OP needs those requirements, then they would have said so. We are not mind-readers, we can't predict every eventuality in each question. That way lies madness. – OldTinfoil Jan 31 '18 at 11:16
  • 35
    @user5359531, I am unclear why `import numpy as np` can be assumed throughout the whole of stackoverflow but installing gitpython is beyond 'clean' and 'portable'. I think this is by far the best solution, because it does not reinvent the wheel, hides away the ugly implementation and does not go around hacking the answer of git from subprocess. – Jblasco Apr 10 '18 at 13:36
  • 1
    If its not in the standard library, its not 'portable'. Numpy is no exception. `subprocess` is a standard method for interacting with CLI programs from within Python. Installing 3rd party libraries as a crux to solve every simple problem in Python is not a great practice and causes issues the moment you need to run your code on any other system. If you want to hide the 'ugly implementation', then use a function. If the code is never going to be run by anyone or anywhere else, then of course use whatever solution you like. – user5359531 Apr 10 '18 at 19:44
  • 9
    @user5359531 While I agree in general that you shouldn't throw a shiny new library at every small problem, your definition of "portability" seems to neglect modern scenarios where developers have full control over all environments said applications run in. In 2018 we have Docker containers, virtual environments, and machine images (e.g. AMIs) with `pip` or the ability to easily install `pip`. In these modern scenarios, a `pip` solution is just as portable as a "standard library" solution. – Ryan Aug 01 '18 at 23:29
  • 1
    @Ryan thanks but I am a developer and have no control over my development environment, Docker is banned on company hardware due to security concerns, need to design & run programs on ancient servers, devs lacks admin rights to server, etc. etc... The year might be 2018 now but plenty of systems out there havent been updated since 2012 or earlier and not all devs have these luxuries you describe. Virtualenv also has compatibility issues between different Python versions. – user5359531 Aug 03 '18 at 17:35
  • 5
    > _If its not in the standard library, its not 'portable'._ I'm sorry, but that does not make any sense at all. Using a language implementation (package) that abstracts the programmer from the platform (s)he is using is by far way more portable than calling subprocesses that relies in the underlying platform and in the existence of such exact commands, which can be different in Mac, Linux, Windows and BSDs. By definition, using abstraction interfaces is the very meaning of portable, while calling subcommands from your programs is absolutely not. – José L. Patiño Mar 26 '19 at 16:11
  • 2
    Not being able to import libraries is the pathological case, not the common case. The common case in the vast majority of programming is using libraries - particularly to avoid clunky interaction with external programs. If you can't you should use subprocess, but failing that or another compelling reason this is the best-practice solution: use a battle-hardened library built to handle the use case in question. – Nathaniel Ford Nov 04 '19 at 17:57
  • I agree with both opinions. I went with subprocess because GitPython needs Python > 3.4 and I'm still using Python 2.7. Maybe I will use GitPython later... – HamzDiou Sep 03 '20 at 10:36
  • I tested subprocess locally, and when I deployed it, it crashed the dev machine! I understand the matter better now and will go back to raven, which makes this simple on Python 2.7: import raven -> raven.fetch_git_sha(BASE_DIR) – HamzDiou Sep 07 '20 at 09:23
178

This post contains the command, Greg's answer contains the subprocess command.

import subprocess

def get_git_revision_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()

def get_git_revision_short_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()

when running

print(get_git_revision_hash())
print(get_git_revision_short_hash())

you get output:

fd1cd173fc834f62fa7db3034efc5b8e0f3b43fe
fd1cd17
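The two functions above raise an exception when git is missing or the script is run outside a repository. The following is a hedged sketch that combines suggestions from the comments on this answer (`cwd=` and `universal_newlines=True`) with a fallback value. The name `get_git_revision`, its parameters and the `"unknown"` fallback are assumptions for illustration.

```python
import subprocess

def get_git_revision(repo_dir=".", short=False, fallback="unknown"):
    """Return the HEAD hash for the repo at `repo_dir`, or `fallback` on failure."""
    cmd = ["git", "rev-parse", "HEAD"]
    if short:
        cmd.insert(2, "--short")
    try:
        out = subprocess.check_output(
            cmd,
            cwd=repo_dir,               # run git inside the repo, not the caller's cwd
            stderr=subprocess.DEVNULL,  # keep git's "fatal: ..." text off the console
            universal_newlines=True,    # return str instead of bytes
        )
    except (OSError, subprocess.CalledProcessError):
        # OSError: git is not installed, or repo_dir does not exist.
        # CalledProcessError: repo_dir is not inside a git repository.
        return fallback
    return out.strip()
```

For a script that lives inside the repository, `get_git_revision(os.path.dirname(os.path.abspath(__file__)))` would make the result independent of the directory the script is started from.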
Charlie Parker
Yuji 'Tomita' Tomita
  • 41
    Add a strip() to the result to get this without line breaks :) – grasshopper Sep 02 '14 at 05:30
  • How would you run this for a git repo at a particular path? – pkamb Feb 26 '15 at 01:00
  • 2
    @pkamb Use os.chdir to cd to the path of the git repo you are interested in working with – Zac Crites Jul 27 '15 at 15:52
  • Wouldn't that give the wrong answer if the currently checked out revision is not the branch head? – max Feb 01 '16 at 06:18
  • 1
    and `subprocess.check_output(['git', 'rev-parse', '--abbrev-ref', 'HEAD'])` for the branch name – Ryan Allen May 23 '18 at 20:49
  • 13
    Add a `.decode('ascii').strip()` to decode the binary string (and remove the line break). – pfm Nov 09 '18 at 09:14
  • Or add [`universal_newlines=True`](https://docs.python.org/3/library/subprocess.html#frequently-used-arguments) to get a string. – z0r Apr 26 '20 at 11:44
  • This avoids the resource leaks mentioned in gitpython and this is pretty clean 2-line def to get the hash. I like it. – Wayne Workman Feb 20 '21 at 14:48
  • 3
    If your code is ran from another directory, you might want to add `cwd=os.path.dirname(os.path.realpath(__file__))` as a parameter for `check_output` – Kipr Oct 05 '21 at 07:16
123

The git describe command is a good way of creating a human-presentable "version number" of the code. From the examples in the documentation:

With something like git.git current tree, I get:

[torvalds@g5 git]$ git describe parent
v1.0.4-14-g2414721

i.e. the current head of my "parent" branch is based on v1.0.4, but since it has a few commits on top of that, describe has added the number of additional commits ("14") and an abbreviated object name for the commit itself ("2414721") at the end.

From within Python, you can do something like the following:

import subprocess
label = subprocess.check_output(["git", "describe"]).strip()
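If the pieces of the label are needed separately, the tag-commits-hash format described above can be split apart again. This is only a sketch: `parse_describe` is a hypothetical helper, it expects a `str` (under Python 3 the bytes returned by `check_output` would need `.decode()` first), and it does not handle extra suffixes such as the one added by `--dirty`.

```python
import re

# Matches the "<tag>-<commits since tag>-g<abbreviated hash>" form of git describe.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<distance>\d+)-g(?P<sha>[0-9a-f]+)$")

def parse_describe(label):
    """Split a git describe label into (tag, distance, short_hash)."""
    label = label.strip()
    match = _DESCRIBE_RE.match(label)
    if match is None:
        # No "-N-g<hash>" suffix: assume HEAD sits exactly on the tag.
        return label, 0, None
    return match.group("tag"), int(match.group("distance")), match.group("sha")

print(parse_describe("v1.0.4-14-g2414721"))  # ('v1.0.4', 14, '2414721')
```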
Community
Greg Hewgill
  • 7
    This has the drawback that the version printing code will be broken if the code is ever run without the git repo present. For example, in production. :) – JosefAssad Feb 20 '13 at 21:18
  • 7
    @JosefAssad: If you need a version identifier in production, then your deployment procedure should run the above code and the result should be "baked in" to the code deployed to production. – Greg Hewgill Feb 20 '13 at 21:20
  • That's only 1 way to accomplish that; there's other ways. git attributes can be used to inject version information upon checkout. Even though git attributes are not transferred to clones, as long as they're defined in the master copy and the ops take their code from master, this would be a simpler solution. – JosefAssad Feb 20 '13 at 21:22
  • 22
    Note that git describe will fail if there are not tags present: `fatal: No names found, cannot describe anything.` – kynan Sep 26 '14 at 10:57
  • 59
    `git describe --always` will fallback to the last commit if no tags are found – Leonardo Mar 06 '15 at 16:38
  • Does this work if the script is somewhere in my $PATH variable - but I'm running it from somewhere else in the filesystem?? – Christian Herenz Jan 22 '16 at 18:17
  • To get a format like above: `--` I had to use `git describe --long --tags` – djangonaut Feb 18 '16 at 20:12
  • 1
    didn't work: `>>> label = subprocess.check_output(["git", "describe"]) fatal: No names found, cannot describe anything. Traceback (most recent call last): File "", line 1, in File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 573, in check_output raise CalledProcessError(retcode, cmd, output=output) subprocess.CalledProcessError: Command '['git', 'describe']' returned non-zero exit status 128` – Charlie Parker Jul 01 '16 at 20:31
  • 5
    @CharlieParker: `git describe` normally requires at least one tag. If you don't have any tags, use the `--always` option. See the [git describe documentation](https://git-scm.com/docs/git-describe) for more information. – Greg Hewgill Jul 01 '16 at 20:56
  • `fatal: Not a valid object name parent` – Dims Oct 08 '18 at 20:07
  • @Dims: The use of the branch name `parent` is an example provided in the documentation. You would use your own branch name there. – Greg Hewgill Oct 08 '18 at 23:49
  • @GregHewgill then it should be specified `` or emphasised in text – Dims Dec 20 '18 at 18:56
  • Doesn't work if there's no tag, as opposed to Yuji 'Tomita' Tomita's answer below, which exactly provides what the question asks for. – Syed Priom May 03 '19 at 15:58
  • If you always want the hash, `git describe --always` is no good because it returns the annotated tag if one exists – RobinL Oct 14 '20 at 13:38
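The "baked in" approach from the comments above could look roughly like this. It is a sketch under stated assumptions, not a description of anyone's actual deployment setup: the function names and the idea of a plain-text version file are hypothetical. The first function would run at build or deploy time where git is available, the second in production where it may not be.

```python
import subprocess
from pathlib import Path

def bake_version(repo_dir, target):
    """Build/deploy time: write the git describe label into the file `target`."""
    label = subprocess.check_output(
        ["git", "describe", "--always"], cwd=repo_dir, universal_newlines=True
    ).strip()
    Path(target).write_text(label + "\n")
    return label

def read_baked_version(target, fallback="unknown"):
    """Production: read the label back without needing git or the repo."""
    try:
        return Path(target).read_text().strip() or fallback
    except OSError:
        # The file was never written or is not readable.
        return fallback
```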
18

Here's a more complete version of Greg's answer:

import subprocess
print(subprocess.check_output(["git", "describe", "--always"]).strip().decode())

Or, if the script is being called from outside the repo:

import subprocess, os
print(subprocess.check_output(["git", "describe", "--always"], cwd=os.path.dirname(os.path.abspath(__file__))).strip().decode())

Or, if the script is being called from outside the repo and you like pathlib:

import subprocess
from pathlib import Path
print(subprocess.check_output(["git", "describe", "--always"], cwd=Path(__file__).resolve().parent).strip().decode())
mathandy
  • 5
    Instead of using `os.chdir`, the `cwd=` arg can be used in `check_output` to temporary changes the working directory before executing. – Marc Sep 19 '19 at 15:22
  • Thank you for including the case where the script is called from outside the repo. That just bit me. – John Dec 02 '21 at 20:43
15

numpy has a nice looking multi-platform routine in its setup.py:

import os
import subprocess

# Return the git revision as a string
def git_version():
    def _minimal_ext_cmd(cmd):
        # construct minimal environment
        env = {}
        for k in ['SYSTEMROOT', 'PATH']:
            v = os.environ.get(k)
            if v is not None:
                env[k] = v
        # LANGUAGE is used on win32
        env['LANGUAGE'] = 'C'
        env['LANG'] = 'C'
        env['LC_ALL'] = 'C'
        out = subprocess.Popen(cmd, stdout = subprocess.PIPE, env=env).communicate()[0]
        return out

    try:
        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
        GIT_REVISION = out.strip().decode('ascii')
    except OSError:
        GIT_REVISION = "Unknown"

    return GIT_REVISION
ryanjdillon
  • Yuji's [answer](https://stackoverflow.com/a/21901260/1265192) provides a similar solution in only one line of code that produces the same result. Can you explain why `numpy` found it necessary to "construct a minimal environment"? (assuming they had good reason to) – MD004 Jul 18 '18 at 20:39
  • I just noticed this in their repo, and decided to add it to this question for folks interested. I don't develop in Windows, so I haven't tested this, but I had assumed that setting up the `env` dict was necessary for cross-platform functionality. Yuji's answer does not, but perhaps that works on both UNIX and Windows. – ryanjdillon Aug 05 '18 at 20:48
  • 1
    Looking at the git blame, they did this as a bug fix for SVN 11 years ago: https://github.com/numpy/numpy/commit/44d92ec449e7397bdefa49eb901e4bb89eafaa70 It's possible the bug fix is no longer necessary for git. – gparent Jan 22 '20 at 02:47
  • @MD004 @ryanjdillon They set the locale so that `.decode('ascii')` works - otherwise the encoding is unknown. – z0r Apr 26 '20 at 11:40
  • Is there any way to import this function and use it? I tried: ```from numpy.setup import git_version``` and it didn't work – jlansey Oct 02 '20 at 18:22
  • 1
    Being a function declared in `setup.py` , it is not part of the `numpy` package, so it isn't possible to import it from `numpy`. To use it, you would need to add this method to your own code somewhere. – ryanjdillon Oct 04 '20 at 10:10
15

If subprocess isn't portable and you don't want to install a package to do something this simple, you can also do this:

import pathlib

def get_git_revision(base_path):
    git_dir = pathlib.Path(base_path) / '.git'
    with (git_dir / 'HEAD').open('r') as head:
        ref = head.readline().split(' ')[-1].strip()

    with (git_dir / ref).open('r') as git_hash:
        return git_hash.readline().strip()

I've only tested this on my repos but it seems to work pretty consistently.
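Two cases the function above does not cover are a detached HEAD (the HEAD file then holds the hash itself) and refs that exist only in `.git/packed-refs` after a `git gc`. The following is a sketch of how they might be handled. `read_head_commit` is a hypothetical name, and it assumes `.git` is a regular directory (not the file used by worktrees and submodules).

```python
from pathlib import Path

def read_head_commit(repo_root):
    """Return the commit hash HEAD points to, reading only files under .git."""
    git_dir = Path(repo_root) / ".git"
    head = (git_dir / "HEAD").read_text().strip()

    if not head.startswith("ref: "):
        # Detached HEAD: the file holds the commit hash itself.
        return head

    ref = head[len("ref: "):]  # e.g. "refs/heads/master"
    ref_file = git_dir / ref
    if ref_file.exists():
        return ref_file.read_text().strip()

    # After `git gc` the ref may live only in .git/packed-refs,
    # stored as one "<hash> <ref name>" pair per line.
    packed = git_dir / "packed-refs"
    if packed.exists():
        for line in packed.read_text().splitlines():
            if line and line[0] not in "#^":
                sha, _, name = line.partition(" ")
                if name == ref:
                    return sha
    return None  # ref not found anywhere
```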

kagronick
7

This is an improvement on Yuji 'Tomita' Tomita's answer.

import subprocess

def get_git_revision_hash():
    full_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
    full_hash = str(full_hash, "utf-8").strip()
    return full_hash

def get_git_revision_short_hash():
    short_hash = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
    short_hash = str(short_hash, "utf-8").strip()
    return short_hash

print(get_git_revision_hash())
print(get_git_revision_short_hash())
Wayne Workman
4

If you want a bit more data than the hash, you can use git-log:

import subprocess

def get_git_hash():
    # check_output returns bytes under Python 3, so decode before stripping
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%H']).decode().strip()

def get_git_short_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h']).decode().strip()

def get_git_short_hash_and_commit_date():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h-%ad', '--date=short']).decode().strip()

For the full list of formatting options, check out git log --help.

Ohad Cohen
3

I ran across this problem and solved it by implementing this function. https://gist.github.com/NaelsonDouglas/9bc3bfa26deec7827cb87816cad88d59

from pathlib import Path

def get_commit(repo_path):
    git_folder = Path(repo_path,'.git')
    head_name = Path(git_folder, 'HEAD').read_text().split('\n')[0].split(' ')[-1]
    head_ref = Path(git_folder,head_name)
    commit = head_ref.read_text().replace('\n','')
    return commit


r = get_commit('PATH OF YOUR CLONED REPOSITORY')
print(r)
2

If you don't have Git available for some reason, but you have the git repo (.git folder is found), you can fetch the commit hash from .git/refs/heads/[branch].

For example, I've used the following quick-and-dirty Python snippet, run at the repository root, to get the commit id:

git_head = '.git\\HEAD'

# Open .git\HEAD file:
with open(git_head, 'r') as git_head_file:
    # Contains e.g. "ref: refs/heads/master" if on "master"
    git_head_data = str(git_head_file.read())

# Open the correct file in .git\refs\heads\[branch]
git_head_ref = '.git\\%s' % git_head_data.split(' ')[1].replace('/', '\\').strip()

# Get the commit hash ([:7] used to get "--short")
with open(git_head_ref, 'r') as git_head_ref_file:
    commit_id = git_head_ref_file.read().strip()[:7]
Paolo
am9417
  • This worked for me though I had to change the '\\' to '/'. Must be a Windows thing? – chrislondon Aug 20 '20 at 15:02
  • @Reishin I think you meant "environment-specific-coding". I think so because that would suffer less risk of being flagged for inappropriate language. (Which by the way I did not - for being too slow....) – Yunnosch Nov 09 '20 at 14:52
2

I had a problem similar to the OP's, but in my case I'm delivering the source code to my client as a zip file and, although I know they will have Python installed, I cannot assume they will have git. Since the OP didn't specify the operating system or whether git is installed, I think I can contribute here.

To get only the hash of the commit, Naelson Douglas's answer was perfect, but to have the tag name, I'm using the dulwich python package. It's a simplified git client in python.

After installing the package with pip install dulwich --global-option="--pure" one can do:

from dulwich import porcelain

def get_git_revision(base_path):
    return porcelain.describe(base_path)

r = get_git_revision("PATH OF YOUR REPOSITORY's ROOT FOLDER")
print(r)

I've just run this code in one repository here and it showed the output v0.1.2-1-gfb41223, similar to what is returned by git describe, meaning that I'm 1 commit after the tag v0.1.2 and the 7-digit hash of the commit is fb41223.

It has some limitations: currently it doesn't have an option to show if a repository is dirty and it always shows a 7-digit hash, but there's no need to have git installed, so one can choose the trade-off.

Edit: in case of errors in the command pip install due to the option --pure (the issue is explained here), pick one of the two possible solutions:

  1. Install Dulwich package's dependencies first: pip install urllib3 certifi && pip install dulwich --global-option="--pure"
  2. Install without the option pure: pip install dulwich. This will install some platform dependent files in your system, but it will improve the package's performance.
-1

If you are like me:

  • Multiplatform, so subprocess may crash one day
  • Using Python 2.7, so GitPython is not available
  • Don't want to use Numpy just for that
  • Already using Sentry (the old deprecated client: raven)

Then (this will not work in an interactive shell, because the current file path is not available there; replace BASE_DIR with your current file path):

import os
import raven

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
print(raven.fetch_git_sha(BASE_DIR))

That's it.

I was looking for another solution because I wanted to migrate to sentry_sdk and leave raven but maybe some of you want to continue using raven for a while.

Here is the discussion that led me to this Stack Overflow question.

So using the code of raven without raven is also possible (see discussion):

from __future__ import absolute_import

import os.path

__all__ = ('fetch_git_sha',)


def fetch_git_sha(path, head=None):
    """
    >>> fetch_git_sha(os.path.dirname(__file__))
    """
    if not head:
        head_path = os.path.join(path, '.git', 'HEAD')

        with open(head_path, 'r') as fp:
            head = fp.read().strip()

        if head.startswith('ref: '):
            head = head[5:]
            revision_file = os.path.join(
                path, '.git', *head.split('/')
            )
        else:
            return head
    else:
        revision_file = os.path.join(path, '.git', 'refs', 'heads', head)

    if not os.path.exists(revision_file):
        # Check the .git/packed-refs file, since a `git gc` may have run
        # https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery
        packed_file = os.path.join(path, '.git', 'packed-refs')
        if os.path.exists(packed_file):
            with open(packed_file) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line and line[:1] not in ('#', '^'):
                        try:
                            revision, ref = line.split(' ', 1)
                        except ValueError:
                            continue
                        if ref == head:
                            return revision

    with open(revision_file) as fh:
        return fh.read().strip()

I named this file versioning.py and I import "fetch_git_sha" where I need it, passing the file path as an argument.

Hope it will help some of you ;)

HamzDiou