3

I'm trying to determine whether a git commit is needed in a git repository. I came up with this code, which works fine for me:

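import pygit2

def is_dirty(repo):
    # repo.status() maps each file path to an integer of status flags
    status = repo.status()
    for filepath, flags in status.items():
        if flags != pygit2.GIT_STATUS_CURRENT:
            # 16384 (1 << 14) is the flag for ignored files; skip those
            if flags != 16384:
                return True
    return False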

But this is extremely inefficient: repo.status() takes forever, at least compared to running git status on the command line.

So my question is: is there a more efficient way to know whether the repository is clean?

PS: I'm using python3. With python2 I used the git module, which has an is_dirty function.

Laurent Claessens

3 Answers

4

Five years later, here is what I do now:

pip3 install GitPython

and then

import git
repo = git.Repo("path/to/my/repo")
if repo.is_dirty(untracked_files=True):
    do_work()
Laurent Claessens
1

Repository.status() performed well in a quick test.

Typical usage:

import pygit2
from pathlib import Path
from typing import Dict, Union
repo_path: Union[Path, str] = Path("/path/to/your/repo")
repo = pygit2.Repository(pygit2.discover_repository(repo_path))
status: Dict[str, int] = repo.status()
print(status)
# {} (nothing to commit, working tree clean)
# {"myfile" : 256} (myfile is modified but not added)
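
The integers in that dictionary are bit flags, so a value can in principle combine several states. As a rough sketch of how to read them, the helper below decodes a code into flag names. Note that `StatusFlag` and `describe` are hypothetical names written for this illustration and are not part of pygit2; the numeric values are assumed to match libgit2's `git_status_t` enum.

```python
from enum import IntFlag
from typing import List


class StatusFlag(IntFlag):
    """Status bits, assumed to mirror libgit2's git_status_t (not pygit2 API)."""
    INDEX_NEW = 1 << 0
    INDEX_MODIFIED = 1 << 1
    INDEX_DELETED = 1 << 2
    INDEX_RENAMED = 1 << 3
    INDEX_TYPECHANGE = 1 << 4
    WT_NEW = 1 << 7
    WT_MODIFIED = 1 << 8
    WT_DELETED = 1 << 9
    WT_TYPECHANGE = 1 << 10
    WT_RENAMED = 1 << 11
    WT_UNREADABLE = 1 << 12
    IGNORED = 1 << 14
    CONFLICTED = 1 << 15


def describe(code: int) -> List[str]:
    """Return the names of all flags set in a status code."""
    if code == 0:
        # 0 means the file is unchanged (GIT_STATUS_CURRENT)
        return ["CURRENT"]
    return [flag.name for flag in StatusFlag if code & flag.value]


print(describe(256))  # ['WT_MODIFIED']
```

This is why 256 shows up for a file that is modified in the working tree but not yet added to the index.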

My version of the function, which drops the entries whose status code is 16384. As a status flag, that value is GIT_STATUS_IGNORED (ignored files); GIT_FILEMODE_TREE merely shares the same numeric value.

def get_repo_status(repo: pygit2.Repository) -> Dict[str, int]:
    # 16384 == 1 << 14, the status flag for ignored files
    ignored_status_code: int = pygit2.GIT_STATUS_IGNORED
    original_status_dict: Dict[str, int] = repo.status()
    # transfer every non-ignored entry to a new dictionary
    status_dict: Dict[str, int] = {}
    for filename, code in original_status_dict.items():
        if code != ignored_status_code:
            status_dict[filename] = code
    return status_dict
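
If only a yes/no answer is needed, the same filtering can stop at the first relevant entry. The sketch below works on the dictionary returned by repo.status(), so it can be tried without a repository. `is_dirty_from_status` is a hypothetical helper name, and the constant is assumed to equal pygit2.GIT_STATUS_IGNORED (1 << 14).

```python
from typing import Dict

# Assumed to equal pygit2.GIT_STATUS_IGNORED (1 << 14 == 16384)
IGNORED = 1 << 14


def is_dirty_from_status(status: Dict[str, int]) -> bool:
    """Return True if any entry has a status bit set other than IGNORED."""
    # any() short-circuits, so the scan stops at the first dirty file
    return any(code & ~IGNORED for code in status.values())


print(is_dirty_from_status({}))                   # False
print(is_dirty_from_status({"build.log": 16384}))  # False
print(is_dirty_from_status({"myfile": 256}))       # True
```

Masking the bit instead of comparing for equality means an entry carrying the ignored flag together with another flag would still count as dirty, which matches what the equality test does in that case.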

Performance:

%timeit repo.status()
# 2.23 ms per loop
%timeit get_repo_status(repo)
# 2.28 ms per loop
%timeit subprocess.run(["git", "status"]) # yes, I know, this is not really comparable..
# 11.3 ms per loop
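
%timeit is an IPython magic, so the lines above only work in IPython or Jupyter. In a plain script, a comparable measurement can be sketched with the standard timeit module. `ms_per_call` is a hypothetical helper name, and the lambda below is only a stand-in workload; in practice one would pass repo.status instead.

```python
import timeit
from typing import Callable


def ms_per_call(func: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Best-of-`repeat` average time per call, in milliseconds."""
    # Each repeat runs `func` `number` times and returns the total seconds
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1000


# Stand-in workload; replace with repo.status to time the real call
elapsed = ms_per_call(lambda: sum(range(1000)))
print(f"{elapsed:.4f} ms per loop")
```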
Mark Teese
0

A really dirty solution? From Python, execute a shell script that runs git status.

sh:
VAR=$(git status)
echo "$VAR" > "$file_path"  # dump var to file

python:
# meanwhile, your Python script waits for the file to be created
import os
import time
while True:
    if os.path.exists(file_path):
        with open(file_path) as f:
            status = f.read()
        break
    time.sleep(0.1)  # avoid a busy loop while waiting

stupid and simple, and probably quite obvious
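
A slightly less dirty variant of the same idea skips the intermediate file and reads the output directly. This is only a sketch: it assumes the git executable is on the PATH, and `porcelain_says_dirty` and `is_dirty_cli` are hypothetical helper names. With `git status --porcelain`, a clean working tree produces no output at all, and ignored files are not listed unless `--ignored` is passed.

```python
import subprocess
from pathlib import Path
from typing import Union


def porcelain_says_dirty(output: str) -> bool:
    """Any non-blank line in `git status --porcelain` output means dirty."""
    return bool(output.strip())


def is_dirty_cli(repo_path: Union[Path, str]) -> bool:
    """Run git in repo_path and report whether there is anything to commit."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,          # run inside the repository
        capture_output=True,    # collect stdout instead of printing it
        text=True,              # decode bytes to str
        check=True,             # raise if git exits with an error
    )
    return porcelain_says_dirty(result.stdout)


print(porcelain_says_dirty(""))              # False
print(porcelain_says_dirty(" M myfile\n"))   # True
```

Calling is_dirty_cli("path/to/my/repo") then gives the yes/no answer without any polling loop.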

quitemew
  • I can also do `commands.getoutput("git status")`. But I'll only do something like that when I'm really ... well, you know ... – Laurent Claessens Nov 15 '15 at 04:04
  • tres bien, want another terrible/beautiful idea? have a daemon running in the background, generating a csv or plain text file with the status of each of your repos, you can then get the status of any repo at 'zero' cost...well.. at least your 'get_status(repo_name)' will be zero cost.... – quitemew Nov 15 '15 at 04:22