... how to get just the changed lines
This question is incomplete. Suppose I tell you that there are some people, including Alice, Bob, Carol, and so on. Now I tell you that Bob is different. Different from who or what?
In a pre-receive hook, you must read lines from your standard input. Each line has the form:
old-hash new-hash reference-name
What do these mean? (That's an exercise for you to answer before you go on to the next sections, though the answer is embedded in the last section below.)
Obtaining a diff requires that you select two items
A commit is a snapshot of files—complete copies of every file that was frozen into that commit. There are no differences involved; there are just complete files.
You, however, want differences. To get a difference for some file file.ext, you must pick some other version of file.ext and compare the two. What is the correct "other version"?
For some commits, you are in luck: there's a very clear correct "other version" of file.ext, which is the copy of file.ext in that commit's parent commit. In fact, this repeats for every file in the commit: we would like to compare that commit's version of that file to the parent's version of that file, to see what changed.
There's a handy script-able ("plumbing") command for this, which is git diff-tree: given the hash ID of an ordinary non-merge commit, git diff-tree compares the commit's parent to the commit. Add -p or --patch to get a textual difference (this automatically implies the -r option). Consider using -U0 to drop context lines. You will, of course, still need to parse the output lines, to detect hunk headers and the added/deleted markers.
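As a rough illustration of that parsing step, here is a minimal sketch that walks -p -U0 output and yields each added or deleted line with its line number. The function name changed_lines and the tuple layout are made up for this example. It assumes ordinary two-way patches (not combined diffs) and plain, unquoted path names.

```python
import re

# Hunk header of a unified diff:
#   @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def _strip_prefix(path, prefix):
    """Drop the a/ or b/ prefix that git puts on path names."""
    return path[len(prefix):] if path.startswith(prefix) else path


def changed_lines(patch_text):
    """Yield (path, marker, line_number, text) for each changed line.

    marker is '+' (line_number counts in the new file) or
    '-' (line_number counts in the old file).
    """
    path = None
    in_hunk = False
    old_no = new_no = 0
    for line in patch_text.split("\n"):
        if line.startswith("diff --git "):
            # A new file section starts; its header lines come next.
            in_hunk = False
            path = None
        elif not in_hunk and line.startswith("--- "):
            if line[4:] != "/dev/null":
                path = _strip_prefix(line[4:], "a/")
        elif not in_hunk and line.startswith("+++ "):
            if line[4:] != "/dev/null":
                path = _strip_prefix(line[4:], "b/")
        else:
            match = HUNK_RE.match(line)
            if match:
                old_no, new_no = int(match.group(1)), int(match.group(3))
                in_hunk = True
            elif in_hunk and line.startswith("+"):
                yield (path, "+", new_no, line[1:])
                new_no += 1
            elif in_hunk and line.startswith("-"):
                yield (path, "-", old_no, line[1:])
                old_no += 1
            elif in_hunk and line.startswith(" "):
                # Context line (absent with -U0, handled for safety).
                old_no += 1
                new_no += 1
```

Tracking whether we are inside a hunk matters: a deleted line whose text begins with "-- " shows up as "--- ..." in the patch, and would otherwise be mistaken for a file header.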
A simple git diff-tree <hash> does not, however, work for two cases of commits:
A root commit has no parent. Fortunately, the empty tree comes to the rescue: git diff-tree -p $(git hash-object -t tree /dev/null) $hash does the trick.
A merge commit has two or more parents. Here git diff-tree produces a combined diff by default. If that's OK, you can ignore this case. If not, you might consider using -m to split the merge and get multiple diffs, one against each parent (the default), or --first-parent -m to get a single diff against the first parent only.
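The three cases (root, ordinary, merge) can be folded into one small helper that picks the git diff-tree arguments from the parent count. This is only a sketch: the names empty_tree_hash and diff_tree_command, and the first_parent_only switch, are invented for illustration, and the flags are just the ones discussed above.

```python
import subprocess


def empty_tree_hash():
    """Ask Git for the empty tree's hash ID.

    Same computation as: git hash-object -t tree /dev/null
    """
    out = subprocess.check_output(
        ["git", "hash-object", "-t", "tree", "/dev/null"], text=True
    )
    return out.strip()


def diff_tree_command(commit, parent_count, empty_tree, first_parent_only=False):
    """Build the argument list for diffing one commit.

    parent_count is the number of parents the commit has;
    empty_tree is the hash ID returned by empty_tree_hash().
    """
    base = ["git", "diff-tree", "-p", "-U0"]
    if parent_count == 0:
        # Root commit: compare the empty tree to the commit.
        return base + [empty_tree, commit]
    if parent_count == 1:
        # Ordinary commit: diff-tree compares parent to commit by itself.
        return base + [commit]
    # Merge commit: split it into per-parent diffs with -m,
    # optionally limited to the first parent.
    if first_parent_only:
        return base + ["--first-parent", "-m", commit]
    return base + ["-m", commit]
```

The resulting list can be handed to subprocess.check_output(..., text=True) to obtain the patch text. Computing the empty tree hash once, up front, avoids spawning an extra process per root commit.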
That gets you the diff for one commit, so now we move on to the last part.
Now it's time to deal with the hook's stdin input lines
As you read each line, it's your job to:
Check the old and new hashes for the special all-zero-digits null hash. In Python, there are multiple ways to express this; one is:
def is_null(hash):
    return all(i == '0' for i in hash)
If the old hash is null, the reference is being created at the new hash. If the new hash is null, the reference used to have the given old hash, and is being deleted. Otherwise—neither hash is null—the reference is being updated: it had the old hash, and will have the new hash.
Figure out what to do, if anything, with the change to the particular reference. Is deletion allowed? Is creation allowed? Does it matter if this is a branch name (starts with refs/heads/) vs a tag name (starts with refs/tags/) vs something else entirely?
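The two steps above might be sketched as follows. The function names and the returned labels ("create", "delete", "update", "branch", "tag", "other") are invented for this example. What to actually do with each combination is a policy decision that remains yours.

```python
def is_null(hash_id):
    """True if hash_id is the special all-zero-digits null hash."""
    return all(ch == "0" for ch in hash_id)


def ref_kind(ref_name):
    """Classify a reference name by its namespace prefix."""
    if ref_name.startswith("refs/heads/"):
        return "branch"
    if ref_name.startswith("refs/tags/"):
        return "tag"
    return "other"


def classify_update(line):
    """Parse one stdin line: old-hash new-hash reference-name.

    Returns (action, kind, old, new, ref_name).
    """
    old, new, ref_name = line.rstrip("\n").split(" ", 2)
    if is_null(old):
        action = "create"
    elif is_null(new):
        action = "delete"
    else:
        action = "update"
    return action, ref_kind(ref_name), old, new, ref_name
```

In the hook itself, you would call classify_update on each line read from sys.stdin and exit with a nonzero status to reject the push.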
Creations are especially difficult. The newly introduced name makes the given object reachable by that name. If the object is a tag or commit, that makes additional objects reachable by that name as well. Some or all of these objects may be new. Some or all of these objects may already exist. The classic case is when someone creates a new branch name: it may point to an existing commit, already on some other branch, or it may point to a new commit, the new tip of the new branch, which may have many additional new commits before joining up with some existing branch(es).
Updates are the most common, and usually the simplest to handle. You know that the existing reference name made the old object reachable, and the proposed update is to make the new object reachable. If the reference is a branch name, both objects are in fact commit objects, and it is easy to find which commits, if any, are newly reachable from the proposed new hash, and which commits, if any, are being removed from reachability via the proposed new hash:
git rev-list $old..$new
produces the set of hash IDs that are newly reachable, and:
git rev-list $new..$old
produces the set that are no longer reachable. (Use git rev-list --left-right $old...$new, with three dots, to get both sets of hash IDs at once, with distinguishing markers. You can also use $new...$old: the symmetric difference is the same either way, except of course that the left and right sides are reversed.)
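For the three-dot form, each output line carries a marker: < for commits reachable only from the left side and > for commits reachable only from the right side. A minimal sketch of splitting that output, assuming the $old...$new order (so that the left side is the old hash), could look like this. The function names are hypothetical.

```python
import subprocess


def split_left_right(rev_list_output):
    """Split `git rev-list --left-right old...new` output.

    Returns (no_longer_reachable, newly_reachable): commits marked '<'
    are reachable only from old, commits marked '>' only from new.
    """
    no_longer_reachable = []
    newly_reachable = []
    for line in rev_list_output.split("\n"):
        if line.startswith("<"):
            no_longer_reachable.append(line[1:].strip())
        elif line.startswith(">"):
            newly_reachable.append(line[1:].strip())
    return no_longer_reachable, newly_reachable


def reachability_change(old, new):
    """Run git and return both sets for an update from old to new."""
    out = subprocess.check_output(
        ["git", "rev-list", "--left-right", old + "..." + new], text=True
    )
    return split_left_right(out)
```

A non-empty first list means the update is not a fast-forward, which many hooks treat as a forced push.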
Assuming you have handled creation somehow, if your goal is to examine newly-reachable commits (whether or not they are new to the repository overall), you can simply walk through all the new commits, testing each one to see if it is a root commit, an ordinary (single-parent) commit, or a merge commit. (Hint: add --parents to the git rev-list command to get the parent IDs included, so that you can easily tell how many parents each commit has. Also, consider the graph structure of the commit graph fragment you are walking: $old..$new may include merges, which may make many commits reachable that may or may not be new to the repository.)
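With --parents, each output line holds the commit's hash ID followed by its parents' hash IDs, separated by spaces. Sorting commits into the three kinds is then just a matter of counting fields. A minimal sketch, with an invented function name and invented labels:

```python
def classify_commits(rev_list_output):
    """Parse `git rev-list --parents old..new` output.

    Yields (commit, parents, kind), where kind is "root", "ordinary",
    or "merge" according to the number of parents.
    """
    for line in rev_list_output.split("\n"):
        fields = line.split()
        if not fields:
            continue  # skip blank lines, such as the trailing one
        commit, parents = fields[0], fields[1:]
        if not parents:
            kind = "root"
        elif len(parents) == 1:
            kind = "ordinary"
        else:
            kind = "merge"
        yield commit, parents, kind
```

The parent count is exactly what you need to choose how to invoke git diff-tree for each commit.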
You now have all the commit hashes, and their parent counts. You also know how to use git diff-tree to compare each commit against its parent(s) or against the empty tree as needed. So now you are ready to write your fancy pre-receive hook.