28

I want to get only the diff of a changed file from a git repo. Right now I am using GitPython to get the commit objects and the files changed, but I want to do a dependency analysis on only the parts of each file that changed. Is there any way to get the git diff from GitPython? Or will I have to compare the files by reading them line by line?

Ryan M
user1816561

9 Answers

23

If you want to access the contents of the diff, try this:

import git

repo = git.Repo(repo_root.as_posix())  # repo_root is assumed to be a pathlib.Path
commit_dev = repo.commit("dev")
commit_origin_dev = repo.commit("origin/dev")
diff_index = commit_origin_dev.diff(commit_dev)

# 'M' selects modified files; a_blob is the old version, b_blob the new one
for diff_item in diff_index.iter_change_type('M'):
    print("A blob:\n{}".format(diff_item.a_blob.data_stream.read().decode('utf-8')))
    print("B blob:\n{}".format(diff_item.b_blob.data_stream.read().decode('utf-8')))

This will print the full contents of each modified file, before (A) and after (B) the change.

Aaron N. Brock
D. A.
  • 4
    Excellent. This is the way to do it with the GitPython API, versus delegating directly to the Git CLI like [Cairo's answer](https://stackoverflow.com/a/23320050/241211) does. – Michael Dec 11 '18 at 21:31
  • What about using this code with different branches? I don't want to be tied to one branch (dev in your case) when getting the diff. – rRr Apr 25 '23 at 21:55
  • you are awesoooooooome I was searching for this solution for 1 month!!! to make it more readable I shrink it to " diff = repo.commit(head1).diff(head2) " – Reflection Jun 01 '23 at 08:38
19

You can run the git "diff" command through GitPython. You just need to pass the "tree" object of the commit or branch whose diffs you want to see, for example:

from git import Repo

repo = Repo('/git/repository')
t = repo.head.commit.tree
print(repo.git.diff(t))  # repo.git.diff returns the diff text as a string

This will print the diff between that tree and the working tree for all files at once, so if you want each file's diff separately you must iterate over them.
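One way to iterate over the individual files is to split the text returned by `repo.git.diff(...)` at each `diff --git` header. The sketch below uses only the standard library. The helper name `split_diff_by_file` is hypothetical, and it assumes the default `a/` and `b/` path prefixes and paths without unusual quoting.

```python
import re
from typing import Dict


def split_diff_by_file(diff_text: str) -> Dict[str, str]:
    """Map each file path to its own section of a multi-file `git diff` output.

    Hypothetical helper; assumes the default "a/" and "b/" path prefixes.
    """
    sections = {}
    # Each file's section starts with a line like: diff --git a/<path> b/<path>
    for chunk in re.split(r"(?m)^(?=diff --git )", diff_text):
        header = re.match(r"diff --git a/(.+?) b/(.+)", chunk)
        if header:
            sections[header.group(2)] = chunk
    return sections
```

For example, `split_diff_by_file(repo.git.diff(t))` would give one entry per changed file.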

To compare against the previous commit on the current branch, it's:

repo.git.diff('HEAD~1')

Hope this helps.

Cairo
  • How can I find out the `diff` between the `tree` and `head~1`. They have similarities, but the latter has more diff entries. Seems that the latter includes the diff to the last commit. – Timo Jul 17 '21 at 19:26
6

Git does not store the diffs, as you have noticed. Given two blobs (before and after a change), you can use Python's difflib module to compare the data.
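As a rough illustration of the difflib approach, the sketch below compares two versions of a file's text and returns a unified diff. The function name `blob_diff` and its `path` parameter are invented for this example. The two strings are assumed to be the decoded contents of the "before" and "after" blobs.

```python
import difflib


def blob_diff(before: str, after: str, path: str = "file") -> str:
    """Return a unified diff between two versions of a file's text.

    Hypothetical helper: `before` and `after` are assumed to be decoded
    blob contents, and `path` is used only for the diff header labels.
    """
    diff_lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff_lines)
```

If the two versions are identical, the result is an empty string.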

Max Alibaev
Greg Hewgill
  • The repo I am working on only has one master branch. If I am trying to get two blobs, how do i get a second one to compare the before change? – user1816561 Nov 19 '13 at 16:15
  • Sorry, just another question. So beyond trying to get both the a blob and b blob and understanding what those are, will those blobs actually give me the content of the file changed? – user1816561 Nov 19 '13 at 17:00
  • @user1816561 if you're referring two blobs as in the diff of your working tree vs. the most recent commit, you should use `repo.head.commit.diff(None)` – Addison Klinke Feb 24 '21 at 21:44
  • This is the `python3` version of the [link](https://docs.python.org/3/library/difflib.html) – Timo Jul 17 '21 at 18:54
3

I'd suggest using PyDriller instead (it uses GitPython internally). It is much easier to use:

from pydriller import Repository

for commit in Repository("path_to_repo").traverse_commits():
    for modified_file in commit.modified_files: # here you have the list of modified files
        print(modified_file.diff)
        # etc...

You can also analyze a single commit by doing:

for commit in Repository("path_to_repo", single="123213").traverse_commits():
    print(commit.hash)
Davide Spadini
3

If you're looking to recreate something close to what a standard git diff would show, try:

# cloned_repo = git.Repo.clone_from(
#     url=ssh_url,
#     to_path=repo_dir,
#     env={"GIT_SSH_COMMAND": "ssh -i " + SSH_KEY},
# )
repo_diff = ""  # accumulates the patch text for every changed file
# index.diff(None) compares the index against the working tree
for diff_item in cloned_repo.index.diff(None, create_patch=True):
    repo_diff += (
        f"--- a/{diff_item.a_blob.name}\n+++ b/{diff_item.b_blob.name}\n"
        f"{diff_item.diff.decode('utf-8')}\n\n"
    )
Adriaan
ZaxR
1

If you want to run git diff on a single file between two commits, this is the way to do it:

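import git

repo = git.Repo()
path_to_a_file = "diff_this_file_across_commits.txt"

# commits that touched the file, newest first
commits_touching_path = list(repo.iter_commits(paths=path_to_a_file))

# diff of the file between the two most recent of those commits
print(repo.git.diff(commits_touching_path[0], commits_touching_path[1], path_to_a_file))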

This will show you the differences between the two most recent commits that touched the file you specify.
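Since the question is about analysing only the changed parts of a file, the hunk headers in that diff text (lines such as `@@ -97,6 +97,25 @@`) can be parsed to find which line ranges changed. This is a minimal standard-library sketch with a made-up helper name, `changed_line_ranges`. Note that each hunk includes surrounding context lines by default, so the ranges are slightly wider than the exact edits.

```python
import re

# Matches unified-diff hunk headers such as "@@ -97,6 +97,25 @@"
HUNK_HEADER = re.compile(
    r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE
)


def changed_line_ranges(diff_text):
    """Return (start, length) of each hunk in the new version of the file.

    Hypothetical helper; ranges include the context lines of each hunk.
    """
    ranges = []
    for match in HUNK_HEADER.finditer(diff_text):
        start = int(match.group(3))
        # A missing length in the header means the hunk spans one line
        length = int(match.group(4)) if match.group(4) is not None else 1
        ranges.append((start, length))
    return ranges
```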

Adriaan
Nikola Đuza
0
repo.git.diff("main", "HEAD~5")
Adriaan
  • 1
    Welcome to Stack Overflow! Please read [ask] and [edit] your question to contain an explanation as to why this code would actually solve the problem at hand. Always remember that you're not only solving the problem, but are also educating the OP and any future readers of this post. – Adriaan May 24 '22 at 06:40
  • result: @@ -97,6 +97,25 @@ + + + + - org.codehaus.mojo - findbugs-maven-plugin - 3.0.5 - - Low - Medium – 土豆先生 May 24 '22 at 11:08
  • If you want to update, [edit], don't comment. – General Grievance May 24 '22 at 12:13
0

PyDriller +1

pip install pydriller

But use the new API, which is a breaking change from older versions:
from pydriller import Repository

for commit in Repository('https://github.com/ishepard/pydriller').traverse_commits():
    print(commit.hash)
    print(commit.msg)
    print(commit.author.name)

    for file in commit.modified_files:
        print(file.filename, ' has changed')
K. Symbol
-2

Here is how you do it:

import git
repo = git.Repo("path/of/repo/")

# the below gives us all commits
repo.commits()

# take the first and last commit

a_commit = repo.commits()[0]
b_commit = repo.commits()[1]

# now get the diff
repo.diff(a_commit,b_commit)
Adriaan
Ciasto piekarz
  • 7
    This code does not work. `AttributeError: 'Repo' object has no attribute 'diff'` and `AttributeError: 'Repo' object has no attribute 'commits'` – firelynx Aug 16 '18 at 10:40