
Consider the following sequence of commands, which produces the git log in question:

git checkout master
git checkout -b 123-test-branch-1
git commit --allow-empty -m "#123 b1 c1"
git commit --allow-empty -m "#123 b1 c2"
git push -u origin 123-test-branch-1
git checkout master
git checkout -b 456-test-branch-2
git commit --allow-empty -m "#456 b2 c1"
git commit --allow-empty -m "#456 b2 c2"
git push -u origin 456-test-branch-2
git checkout 123-test-branch-1
git merge --no-edit 456-test-branch-2
git commit --allow-empty -m "#123 b1 c3"
git push

In practice, the update hook in my remote git repository validates the formats of branch names and commit messages. Both must contain an issue number: for example, in 123-test-branch-1 and #123 b1 c1 the issue number is 123. When a branch is pushed, the hook extracts the issue number from the branch name and from each commit message and compares them. If they are not equal, the hook exits with an error.

This works well when I push a branch that has only its "own" commits. But in the git log example above, the pushed branch 123-test-branch-1 also contains commits merged from 456-test-branch-2, so the hook compares all commits from both branches against the pushed branch 123-test-branch-1 and exits with an error, because the commits from 456-test-branch-2 carry issue number 456 where 123 is expected.

To get the commits, I use git log --pretty=%s ${oldRef}..${newRef}, where oldRef and newRef are arguments of the update hook.
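For concreteness, the validation described above could look roughly like the sketch below. The function names are hypothetical, and the exact formats (a branch name that starts with digits followed by a hyphen, a subject that starts with `#`, digits and a space) are assumptions read off the examples in this question, not the real hook.

```shell
# Hypothetical helpers; the patterns are assumptions based on the examples above.

# issue_of_branch "123-test-branch-1" prints "123" (nothing if the name does not match).
issue_of_branch() {
    printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\)-.*/\1/p'
}

# issue_of_subject "#123 b1 c1" prints "123" (nothing if the subject does not match).
issue_of_subject() {
    printf '%s\n' "$1" | sed -n 's/^#\([0-9][0-9]*\) .*/\1/p'
}

# check_subjects BRANCH: reads commit subjects on stdin, one per line, and
# returns non-zero if any subject's issue number differs from the branch's.
check_subjects() {
    expected=$(issue_of_branch "$1")
    if [ -z "$expected" ]; then
        echo "branch name has no issue number: $1" >&2
        return 1
    fi
    status=0
    while IFS= read -r subject; do
        actual=$(issue_of_subject "$subject")
        if [ "$actual" != "$expected" ]; then
            echo "issue mismatch in '$subject' (expected #$expected)" >&2
            status=1
        fi
    done
    return $status
}
```

In the update hook this would be fed from the command above, for example `git log --pretty=%s "$oldRef..$newRef" | check_subjects "${refname#refs/heads/}"`, where `refname` stands for the hook's first argument. With the merged history from the example, the `#456` subjects reach `check_subjects` and make it fail, which is exactly the problem being asked about.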

So, my question is how to solve this problem: somehow group commits per branch, or keep only the commits that belong to the branch being pushed (but if 456-test-branch-2 is a local branch that was never pushed and never validated, the hook may then skip invalid commits), or something else?

Alex
  • Maybe this helps: http://stackoverflow.com/a/7131735/575643 – Idemax Mar 30 '17 at 08:11
  • I guess this is even more helpful: http://stackoverflow.com/a/2707110/575643 – Idemax Mar 30 '17 at 08:12
  • You should also consider the case where, instead of three separate `git push` commands, the user waits and does one `git push origin 123-test-branch-1 456-test-branch-2` at the end. – torek Mar 30 '17 at 08:37
  • @torek, in this case, the ``update`` hook will be run twice, once for each pushed branch, and for ``123-test-branch-1`` the git log will be the same as with three ``git push`` commands, right? If so, the problem is the same: commits merged from ``456-test-branch-2`` will be processed. – Alex Mar 30 '17 at 11:24
  • Correct, the update hook runs once per update. That's why I suggest, at least for merge traversals, using `--first-parent`. You might also want to restrict precisely *who* can do merges and/or when. For something quite fancy (though still flawed), see my sample pre-receive hook [here](http://web.torek.net/torek/git/pre-receive.sh.txt). – torek Mar 30 '17 at 18:27

1 Answer


The update hook does not get enough information: it cannot get a "global view" of the incoming hash IDs. A pre- or post-receive hook does,[1] and therefore does get enough information, at least for some purposes.
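As a reminder of what that "global view" looks like: a pre-receive (or post-receive) hook is run once per push and reads one line per proposed update on standard input, in the form `<old-hash> <new-hash> <refname>`. A minimal sketch of the input loop, with a hypothetical function name and restricted to branch updates as an illustration, might be:

```shell
# list_branch_updates: reads "<old> <new> <refname>" lines on stdin, as a
# pre-receive hook receives them, and prints "<branch> <old> <new>" for each
# update under refs/heads/. Other refs (tags, notes, ...) are skipped here;
# that restriction is a choice made for this sketch.
list_branch_updates() {
    while read -r old new ref; do
        case $ref in
            refs/heads/*)
                printf '%s %s %s\n' "${ref#refs/heads/}" "$old" "$new"
                ;;
        esac
    done
}
```

Because every proposed update arrives in the same invocation, the hook can collect all pushed branch names and new tips before deciding how to validate any individual commit.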

The biggest problem lies with new branch creation. Suppose, for instance, an update is delivering the names refs/heads/a and refs/heads/b, where both names are new (their old hashes are the null hash), and refs/heads/a points to commit N2 and refs/heads/b points to commit N3 in this graph fragment (where A and B stand for refs/heads/a and refs/heads/b):

                 N2   <-- A
                /
...--O--O--O--N1
                \
                 N3   <-- B

where all the O commits are "old" (as in, were reachable from existing branch or tag names before) and the N commits are "new", as in were never reachable before, and are therefore listed by:

git rev-list refs/heads/a refs/heads/b --not \
    $(git for-each-ref --format '%(refname)' |
        grep -Ev '^(refs/heads/a|refs/heads/b)$')

It's clear that these three N commits are "new", but to which branch should you assign N1?

There is no single right answer to this. Commit N1 is on both branches, after all.
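To see the ambiguity concretely, a hook could ask, for each new commit, which of the proposed new tips contain it. The sketch below is only an illustration: `containing_tips` is a hypothetical helper, and the `NAME=HASH` argument convention is an assumption of this example. It relies on `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of (or the same as) the second.

```shell
# containing_tips COMMIT NAME=HASH...
# Prints the NAME of every proposed tip HASH from which COMMIT is reachable.
# Hypothetical helper; in a pre-receive hook the NAME=HASH pairs would be
# built from the "<old> <new> <refname>" lines read on stdin.
containing_tips() {
    commit=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        tip=${pair#*=}
        if git merge-base --is-ancestor "$commit" "$tip"; then
            printf '%s\n' "$name"
        fi
    done
}
```

For the graph above, `containing_tips N1 a=N2 b=N3` (with the real hash IDs substituted) would print both `a` and `b`, while N2 and N3 each map to a single name. Any rule for picking one branch for N1 is a policy decision, not something Git records.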

In any case, if you are more concerned with merge commits—as in, e.g.:

...O1--O2--N1--N2   <-- A
              /
...-O3--O4--N3    <-- B

you may want to use --first-parent traversals. Here we can believe, based on these two branch-name updates (A moves from O2 to N2, B moves from O4 to N3), that the first parent of N2 is N1 (it's possible, but difficult, to make this happen the other way around), so following --first-parent will "assign" commit N1 to A and not to B, while N3, which A reaches only through the merge's second parent, is left to B. Again, if you are doing this from an update hook, rather than a pre- or post-receive hook, that may be the best you can do, since you do not get the information that both A and B are proposed to be updated.
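Applied to the update hook from the question, a first-parent traversal might be sketched as follows. The function name is hypothetical, and two choices here are assumptions of this example rather than anything Git requires: merge commits are filtered out of the output with `--no-merges` (their default subjects carry no issue number), and a newly created branch is handled by excluding everything reachable from the refs that already exist.

```shell
# The all-zero "null" hash that Git passes as the old value for a new ref
# (40 hex digits, i.e. a SHA-1 repository is assumed).
ZERO=0000000000000000000000000000000000000000

# own_subjects OLD NEW: prints the subjects of the non-merge commits found by
# following only first parents from NEW, stopping at anything already known.
own_subjects() {
    old=$1
    new=$2
    if [ "$old" = "$ZERO" ]; then
        # New branch: there is no old tip, so exclude what existing refs reach.
        git log --first-parent --no-merges --pretty=%s "$new" --not --all
    else
        git log --first-parent --no-merges --pretty=%s "$old..$new"
    fi
}
```

For the history in the question, with the old tip at "#123 b1 c2" and the new tip at "#123 b1 c3", this lists only "#123 b1 c3": the merge commit is filtered out and the "#456" commits are never visited. As noted above, that also means commits from a merged branch that was never pushed on its own go unvalidated, so this is a trade-off rather than a full solution.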


[1] A post-receive hook is run after dropping all the locks, so it races against other operations that may update reference names. A pre-receive hook gets all the proposed updates and therefore there is a big lock around reference name updates, so it's clearly safer, in some sense, to do this work there.

The drawback is that the pre-receive hook runs while holding a big lock, so anything "slow" it does prohibits parallelism.

torek
  • What can you say about ``git reflog show --all | grep ``? This command can show the branch in which a commit first appeared. This approach has a limitation: by default only the last 90 days are available, due to git gc, but for most cases this is an acceptable solution. So, I can find the branches for all commits intercepted by the hook (``update`` or ``pre-receive``), and run validation for each branch. – Alex Mar 30 '17 at 11:21
  • The reflog technique has three flaws: (1) it only works for 90 days or other reflog expiration, as you note; (2) It only works if reflogs are *enabled* (they are optional); and (3) most important for a hook, it only works *on the machine on which the branch is created*, which is not the server receiving the push (on server bare repositories reflogs are disabled by default as well). – torek Mar 30 '17 at 18:25
  • After a few experiments with ``git reflog`` on the remote repo I can state that the reflog-based solution does not work. Can you explain in more detail how I can find branches using your approach with ``git rev-list``? I need to find the branches for all commits received in the hook, and know which commit comes from which branch, for proper validation. – Alex Apr 03 '17 at 06:27
  • 1
    The fact is that you *cannot* solve this in general; the best you can do is handle specific cases that you decide are then "good enough". See http://stackoverflow.com/a/3162929/1256452 for more. – torek Apr 03 '17 at 06:45
  • Can we also abbreviate this with: `git rev-list refs/heads/a refs/heads/b --not --exclude="refs/heads/a" --exclude="refs/heads/b" --all` Does anybody know if that works? According to the doc it should ... – Gabriel Jun 25 '20 at 15:49
  • @Gabriel: I think that probably works, but I think `--exclude` isn't in some very old versions of Git. – torek Jun 25 '20 at 15:59