Is git able to take the commit message directly from source file?

Question

I'm looking for a way to extract the git commit message directly from the committed source file without invoking the editor or similar.

Our department has just started working with git, and for legacy reasons the changes made are written at the top of the source file:

#!/usr/local/bin/php
<?php
//
// @(#)insert_AX_serialno.php   Arrow/ECS/EMEA/EA/nb    1.6     2018-03-14, 13:41:20 CET
//
// V 1.6:       Now also check the Warehouse Code of an item before inserting the serial
// 2018-03-07   number. The Warehouse Code must not be equal to any of three values (see below).
//
// V 1.5:       Now also check the Storage Dimension of an item before inserting the serial
// 2018-03-07   number. The Storage Dimension must be equal to the constant "PHYSICAL".
//
// V 1.4:       introduced an "Environment" variable which determines the target of the GetAXPO...
// 2018-02-21   functions (DEV, UAT or PROD). The variable can either be set explicitly or gets
//              its value from the $_ENV array.
//
// V 1.3:       stop processing if a line does not have the necessary Approval Status and
// 2018-02-20   PO Status
//
// V 1.2:       Every insert requires now a RECID; either for the line or for the header.
// 2017-12-20   So we're selecting the RECID from the AX table if it's not provided as

Now I would like to take the commit message directly from the source code instead of typing it again. In this example, the commit message should read "V 1.6 - 2018-03-07 Now also check the Warehouse Code of an item before inserting the serial number. The Warehouse Code must not be equal to any of three values (see below)."

I'm new to git, and all I could excerpt from the githooks man page was that I can prepare the message with a hook, but not replace it.

My idea is that I can commit a file with git commit <filename> and git fetches the relevant message from the source file ...

My questions are:
1) Does a hook know which file(s) is/are being committed? If yes, is it a parameter to the hook or an environment variable?
2) Can a hook prepare a message file out of the source file and make git use that file instead of opening the editor (of course without using the "-m" parameter)?

Possible duplicate of [Preparing a git commit messaging before committing?](https://stackoverflow.com/questions/20438293/preparing-a-git-commit-messaging-before-committing) — Xavier Guihot, Mar 15 '18 at 08:57
Git commits can encompass multiple files and are not necessarily unique per file. As @XavierGuihot mentioned, it is possible to prepare a commit message ahead of time if this is what you're looking for. However, if you just want a faster way to write commits, the -m flag allows you to type the commit message in the terminal (e.g. git commit -m "commit message"). — Ryan Stonebraker, Mar 15 '18 at 09:01
If it is important to maintain the timestamps of the commits, [you can make a git commit in the past](https://stackoverflow.com/questions/3895453/how-do-i-make-a-git-commit-in-the-past) — Ryan Stonebraker, Mar 15 '18 at 09:04
No, the question mentioned above also deals with manually editing a separate message file, but I want to avoid a _separate_ message since the relevant message is already part of the source file. — Bernhard Niessl, Mar 15 '18 at 09:05
Yes, they are part of my source files. And the question is simply whether it's possible to use them as git messages. — Bernhard Niessl, Mar 15 '18 at 09:52

score 1 · Answer 1 · answered Mar 15 '18 at 16:27

all I could excerpt from the githooks man page was that I can prepare the message with a hook, but not replace it.

You can prepare the message in any way including replacing it completely.

1) Does a hook know which file(s) is/are being committed?

No, but you can query git itself. The files for a new commit are in the index. List the files that differ from the current commit with the command git diff --cached --name-only.

2) Can a hook prepare a message file out of the source file

Not out of the box, but you can write your own script for that.
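For the header layout shown in the question, that script can be a small awk program. Here is a minimal sketch, assuming POSIX sh and awk, and assuming every entry starts with a "// V x.y:" line whose next line begins with a YYYY-MM-DD date, as in the example; the function name extract_msg is hypothetical. It prints only the newest (topmost) entry, joined into one line in the format asked for in the question.

```shell
#!/bin/sh
# Sketch: print the newest "// V x.y:" entry of a header comment as one line.
# Assumed layout (from the question): "// V 1.6:   text", then
# "// 2018-03-07   more text", optional "//   more text" lines, then a bare
# "//" or the next "// V" entry. extract_msg is a hypothetical name.
extract_msg() {
    awk '
    # The first version line gives "V 1.6" and the text after the colon.
    !found && /^\/\/[ \t]+V[ \t]+[0-9][0-9.]*:/ {
        line = $0
        sub(/^\/\/[ \t]+/, "", line)
        ver = line
        sub(/:.*/, "", ver)
        text = line
        sub(/^[^:]*:[ \t]*/, "", text)
        found = 1
        next
    }
    found {
        # Stop at a bare "//", a non-comment line, or the next version entry.
        if ($0 !~ /^\/\/[ \t]+[^ \t]/ || $0 ~ /^\/\/[ \t]+V[ \t]+[0-9]/) exit
        line = $0
        sub(/^\/\/[ \t]+/, "", line)
        # The first continuation line starts with the YYYY-MM-DD date.
        if (date == "" && line ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
            date = substr(line, 1, 10)
            line = substr(line, 11)
            sub(/^[ \t]+/, "", line)
        }
        text = text " " line
    }
    END {
        if (found) printf "%s - %s %s\n", ver, date, text
    }' "$1"
}
```

For example, extract_msg insert_AX_serialno.php > message.txt followed by git commit -F message.txt insert_AX_serialno.php would commit the file with that text. Headers laid out differently from the example would need different patterns.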

and make git use that file instead of opening the editor (of course without using the "-m" parameter)?

Not by itself. After git runs the prepare-commit-msg hook, its next step is normally to open the editor.

You can prevent git from opening the editor with the explicit option git commit --no-edit. Alternatively, you can prepare a file with the commit message before committing, skip the prepare-commit-msg hook entirely, and call git commit -F message.txt.
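Both routes can be tried out in a throw-away repository. This sketch assumes git and a POSIX shell; the repository, the file a.txt and the messages are made up. The hook ignores whatever git put into the message file and overwrites it, which shows that it may replace the message entirely; --no-edit then commits that text without starting an editor.

```shell
#!/bin/sh
# Sketch in a throw-away repository; all names and messages are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
mkdir -p .git/hooks

# 1) prepare-commit-msg gets the message file as $1 and may overwrite it.
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
printf '%s\n' "Message written by the hook" > "$1"
EOF
chmod +x .git/hooks/prepare-commit-msg

echo one > a.txt
git add a.txt
git commit -q --no-edit          # no editor: the hook's text is used as-is

# 2) No hook at all: write the message file first, then commit with -F.
rm .git/hooks/prepare-commit-msg
echo two >> a.txt
git add a.txt
printf '%s\n' "Message from message.txt" > message.txt
git commit -q -F message.txt

git log --format=%s              # newest first
```

In practice the hook would compute the text (for instance from the staged files) instead of writing a fixed string.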

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

phd wrote an answer while I was AFK and you should look at it too, but I wanted to finish this one, so:

I'm new to git, and all I could excerpt from the githooks man page was that I can prepare the message with a hook, but not replace it.

That's not the case—a prepare-commit-msg hook can do anything it likes with the message file, including replace its content entirely. However, you're probably conflating the message file, which is typically just .git/COMMIT_EDITMSG, with what git log shows later, which is not .git/COMMIT_EDITMSG.

To understand what's going on (and therefore what you need to do), you need to understand what Git actually puts in a commit and thus how commits work.

First, each commit you make contains, at least logically,¹ a complete, independent snapshot, separate from every other commit. That is, there is some source code tree-of-files-and-directories found by starting from some top level directory and enumerating the files and directories within it.² Git commits all the files, including the ones in sub-directories.³

Hence, if you have a Git repository, you can run:

git log

to see various commits, and then select one by hash ID (cut and paste with mouse for instance) and run:

git ls-tree -r <hash-id>

and you will see that that particular commit contains every file, not just files that differ from the previous commit.

Nonetheless, git show <hash-id> will show you what changed in that commit, as if the commit stored only the changes. The commit doesn't store changes—it stores everything whole and intact—and yet git show shows changes. The way git show achieves this is by comparing the commit to its predecessor commit.

The predecessor of a commit is the commit's parent. The commit is thus the child of that parent. For each file, if the file in the parent commit matches the file in the child commit, git show says nothing about the file. If the file does not match, git show produces a set of instructions for changing the parent version to make it become the child version. Git produces this difference listing at the time of the git show operation, which means that you can pass various flags to git show to change how it computes and presents the difference.
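A small experiment shows both sides. This is a sketch assuming git and a POSIX shell; the repository and file names are invented. The second commit changes only a.txt, yet git ls-tree -r still lists every file in it, while git show, like git diff against the parent, reports just the one changed file.

```shell
#!/bin/sh
# Sketch in a throw-away repository; file names are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

mkdir src
echo one > a.txt
echo two > src/b.txt
git add a.txt src/b.txt
git commit -q -m "first"

echo "one, changed" > a.txt          # touch only a.txt
git commit -q -a -m "second"

# The snapshot: every file in the second commit, changed or not.
git ls-tree -r --name-only HEAD

# The "changes": computed now, by comparing the commit with its parent.
git show --format= --name-status HEAD
git diff --name-status HEAD~1 HEAD   # the same comparison, spelled out
```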

Let's take a look at an actual, raw commit object from the Git repository for Git, just to make this concrete:

$ git rev-parse HEAD
e3a80781f5932f5fea12a49eb06f3ade4ed8945c
$ git cat-file -p e3a80781f5932f5fea12a49eb06f3ade4ed8945c | sed 's/@/ /'
tree 8e229ef2136e53a530ef74802f83d3b29a225439
parent 66023bbd78fe93c4704b3df754f9f7dc619ebaad
author Junio C Hamano <gitster pobox.com> 1519245935 -0800
committer Junio C Hamano <gitster pobox.com> 1519245935 -0800

Fourth batch for 2.17

The log message for this commit is that last line. It's in the commit object, the one with hash ID e3a80781f5932f5fea12a49eb06f3ade4ed8945c. If I run git show on that commit, Git will tell me about Documentation/RelNotes/2.17.0.txt, but in fact, the files in the commit are those in tree 8e229ef2136e53a530ef74802f83d3b29a225439. If I run git ls-tree -r 8e229ef2136e53a530ef74802f83d3b29a225439, it produces 3222 lines of output:

$ git ls-tree -r 8e229ef2136e53a530ef74802f83d3b29a225439 | wc
    3222   12900  259436

so there are over three thousand files in the commit. 3221 of those files are 100% identical to the versions in the parent, which is 66023bbd78fe93c4704b3df754f9f7dc619ebaad, which also has 3222 files in it.

Anyway, the critical bits here are:

Commits are Git objects: one of four types. The complete set adds tree, blob (file-data only: the file's name, if there is one, is in a tree object instead), and annotated-tag. The last one is irrelevant here.
Each commit has some set of parent commits (usually just one).
Each commit saves a tree. That tree lists the file names and their blob hash IDs. You can experiment with git ls-tree (and read its documentation) to see how they work but at this level the details are irrelevant.
Each commit also has its associated but user-supplied metadata: author and committer (name, email, and timestamp), and the log message copied from the message file that your hook can edit.
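The object types can be inspected directly with git cat-file. This sketch assumes git and a POSIX shell and uses an invented one-file repository: -t prints an object's type, and -p on the tree shows that the file name lives there, while the blob holds only the file's data.

```shell
#!/bin/sh
# Sketch in a throw-away repository with one invented file.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

echo hello > greeting.txt
git add greeting.txt
git commit -q -m "add greeting"

commit=$(git rev-parse HEAD)
tree=$(git rev-parse 'HEAD^{tree}')
blob=$(git rev-parse 'HEAD:greeting.txt')

git cat-file -t "$commit"   # commit
git cat-file -t "$tree"     # tree
git cat-file -t "$blob"     # blob
git cat-file -p "$tree"     # mode, type, blob ID and the name greeting.txt
git cat-file -p "$blob"     # just the data: hello
```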

Making a commit is therefore a process that involves building the tree object to use as a snapshot, and then adding the metadata so as to make a new commit. The new commit gets a new, unique hash ID. (The tree ID is not necessarily unique: if you make a new commit that has the exact same tree as some previous commit, which is a sensible thing to do sometimes, you wind up re-using the old tree.)
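Tree re-use is easy to observe with an empty commit, which records new metadata but exactly the same snapshot. A sketch, assuming git and a POSIX shell and an invented throw-away repository:

```shell
#!/bin/sh
# Sketch in a throw-away repository; names are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

echo hello > hello.txt
git add hello.txt
git commit -q -m "first"
# New metadata (message, parent), but exactly the same snapshot.
git commit -q --allow-empty -m "second, same files"

first_tree=$(git rev-parse 'HEAD~1^{tree}')
second_tree=$(git rev-parse 'HEAD^{tree}')
echo "trees:   $first_tree $second_tree"
echo "commits: $(git rev-parse HEAD~1) $(git rev-parse HEAD)"
```

The two tree IDs printed are identical, while the commit IDs differ because the second commit has a parent and a different message.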

¹Eventually, Git does get around to doing the same kind of delta-compression as other version control systems. But this happens long after the commit has made a complete independent snapshot.

²This is an approximation. See the next section for more detail.

³Git does not save any of the directories: it commits only files. The existence of some directory is implied by having a file within it. Git will re-create the directory later if needed, when checking out the commit and discovering that it must do so in order to put a file there.

How Git makes commits, or, what goes in a tree object

You mention specifically that you are running git commit filename:

My idea is that I can commit a file with git commit <filename> and git fetches the relevant message from the source file ...

Git doesn't build the tree from arguments passed to git commit.

Instead, Git has a single thing⁴ that it calls an index, a staging area, and a cache, depending on who is doing the calling and what aspect of the index they wish to emphasize. This index is the source for the tree object.

What this means is that the index initially contains all the files from the current commit. When you run git add path, Git copies the file from path in the work-tree into the index, overwriting the one that was there before.

To make a tree for a commit, Git typically just invokes git write-tree, which simply packages up the index contents as a tree. If this tree is the same as some existing tree, you re-use the old tree; if it's new, it's new; either way it's the tree, made from whatever is in the index.

Once the tree is written, Git can combine it with the current commit's hash ID to get the tree and parent lines for the commit object. Git adds your identity and the current time as author and committer, your log message as the log message, and writes out the new commit. Last, Git writes the new commit's ID into the current branch name, so that the new commit is the new tip of the branch.
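The same sequence can be replayed by hand with plumbing commands. This is a simplified sketch of the steps described above (no hooks, no message editing), assuming git and a POSIX shell and an invented throw-away repository: git write-tree turns the index into a tree, git commit-tree adds the parent line and the metadata, and git update-ref moves the current branch. It also checks first that, right after a commit, the index still produces that commit's tree.

```shell
#!/bin/sh
# Sketch in a throw-away repository; names and messages are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

echo v1 > file.txt
git add file.txt
git commit -q -m "first"

# Right after a commit the index matches it, so write-tree gives its tree.
fresh=$(git write-tree)
head_tree=$(git rev-parse 'HEAD^{tree}')
echo "index tree: $fresh"
echo "HEAD tree:  $head_tree"

# Stage a change, then do by hand roughly what git commit does.
echo v2 > file.txt
git add file.txt
tree=$(git write-tree)                 # index -> tree object
parent=$(git rev-parse HEAD)           # current commit -> parent line
commit=$(echo "Made with plumbing" | git commit-tree "$tree" -p "$parent")
git update-ref HEAD "$commit"          # HEAD is symbolic: moves the branch
git log --oneline
```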

When you use git commit path, however, things change here. Now the details depend on whether you run git commit --only path or git commit --include path. Git is still going to build the tree from an index, though.

⁴In fact, there's one index per work-tree. By default, though, there's only one work-tree. But there are also temporary indices, as we'll see in a moment.

git commit path and temporary indices

When you run git commit path, Git must build a temporary index, separate and apart from the normal index. It starts by copying something. What it copies depends on --only vs --include.

With --only, Git creates the temporary index by reading the contents of the current commit, i.e., the HEAD commit, rather than by reading the contents of the normal index. With --include, Git creates the temporary index by reading the contents of the normal index.

In the temporary index, Git then replaces any entry for the given path with one made from the version of the file in the work-tree. If the path isn't in the temporary index, Git adds it as a new file. Either way this path is now in the temporary index.

Git now makes a new commit while using the temporary index instead of the regular index. The new commit goes into the repository as usual, updating the current branch name so that the branch's tip commit is the new commit. The new commit's parent is the old tip commit as usual. But now that the commit is done, Git has a bit of a dilemma.

The index—the index, the normal one—is normally supposed to match the current commit, at the start of the "work on the work-tree" cycle. The temporary index does match the new commit, because the new commit was made using the temporary index. But the temporary index is almost certainly different in some way from the index. The next action therefore depends once again on --include vs --only:

If you used --include, the temporary index started from the normal index. The temporary index matches the new commit. So the temporary index becomes the real index.

This action mirrors normal commits: Git uses a temporary lock file, named .git/index.lock, to make sure that nothing changes while doing all the commit work. For a normal commit without path arguments, the temporary lock file and the real index have the same content except for certain time stamps, so Git just renames the lock file to the index file path name, and it's all done. So this handles both the no-path-arguments case and the --include with path arguments case.
If you used --only, Git updates the normal index with the entries it copied into the temporary index, leaving the rest of the normal index's entries alone. That way, the files you specifically committed are in the current (normal) index in the same form as they have in the current commit. All other files in the current (normal) index are as they were before you ran git commit: they still match, or don't match, the HEAD commit (whose other entries, for files not given on the command line, all match the parent commit), and they still match, or don't match, the files in the work-tree, none of which were changed by all of this.
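The difference between the two modes can be reproduced in a throw-away repository. In this sketch (git and a POSIX shell assumed; names invented), b.txt is staged first and a.txt is changed only in the work-tree, and git diff-tree lists the files each new commit changed.

```shell
#!/bin/sh
# Sketch in a throw-away repository; file names are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m base

echo a2 > a.txt
echo b2 > b.txt
git add b.txt                      # b.txt staged; a.txt changed only in the work-tree

# --only (the default for "git commit <path>"): HEAD plus a.txt from the work-tree.
git commit -q --only -m "a only" a.txt
only_files=$(git diff-tree -r --no-commit-id --name-only HEAD)
still_staged=$(git diff --cached --name-only)

# --include: the normal index (with b.txt staged) plus a.txt from the work-tree.
echo a3 > a.txt
git commit -q --include -m "a and b" a.txt
include_files=$(git diff-tree -r --no-commit-id --name-only HEAD | tr '\n' ' ')

printf 'only: %s | staged afterwards: %s | include: %s\n' \
    "$only_files" "$still_staged" "$include_files"
```

The --only commit should contain just a.txt, with b.txt still staged afterwards, while the --include commit contains both files and leaves nothing staged.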

What all this means for your prepare-commit-msg hook

As with everything in Git, you must dynamically discover what changed.

You should not look at the work-tree at all. You may have been invoked via git commit (with no path name arguments) in which case the index being used will be the normal index. You may have been invoked via git commit --include or git commit --only, in which case the index being used will be a temporary index.

To find out which file(s) are different between the index—whichever index is the one being used—and the HEAD commit, use one of the difference engines that Git provides.

In general, in any code you write that's meant for users other than just yourself, you should use what Git calls plumbing commands. In this case the command needed is git diff-index. See also the question "Which are the plumbing and porcelain commands?"

Using git diff-index --cached -r HEAD will compare the current commit to whatever is in whichever index file is the current one, as determined by $GIT_INDEX_FILE and any alternate work-tree situations due to git worktree add. Conveniently, there's nothing you need to do here to adjust for this. (Leave out --cached and git diff-index also compares against the work-tree files, which is not what a commit hook wants.) But if the user invoked git commit --amend, you really should compare against the current commit's parent(s). There is no good way to find out if this is the case.⁵
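To check which index a hook actually sees, the hook can simply record it. This sketch assumes git and a POSIX shell, and the repository and file names are invented: with b.txt staged in the normal index, git commit a.txt (that is, --only) makes git hand the hook a temporary index, so the recorded git diff-index --cached output lists a.txt alone.

```shell
#!/bin/sh
# Sketch in a throw-away repository; file names are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m base
mkdir -p .git/hooks

# Hook that records what it sees in whichever index git hands it.
cat > .git/hooks/prepare-commit-msg <<'EOF'
#!/bin/sh
git diff-index --cached -r --name-status HEAD > "$(git rev-parse --git-dir)/hook-saw.txt"
EOF
chmod +x .git/hooks/prepare-commit-msg

echo a2 > a.txt
echo b2 > b.txt
git add b.txt                      # staged in the normal index
git commit -q -m "only a" a.txt    # --only: temporary index = HEAD + a.txt

cat .git/hook-saw.txt              # the staged b.txt is not part of this commit
```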

The output from git diff-index defaults to stuff that looks like this:

:100644 100644 f5debcd2b4f05c50d5e70efc95d10d95ca6372cd e736da45f71a37b46d5d46056b74070f0f3d488a M      wt-status.c

You can trim off most of the non-interesting bits here using --name-status, which produces instead:

$ git diff-index --cached -r --name-status HEAD
M       wt-status.c

Note that the separator after the status letter is a tab, but if you write a shell loop of the form:

git diff-index --cached -r --name-status HEAD | while read status path; do ...

you are probably OK in general. To make this really robust, test with funny path names including white space and glob characters. Scripts in bash or other clever languages can use the -z flag to encode things more sanely. See the documentation for more details.

Note that files may be Added or Deleted here, not just Modified. Using git diff-index will insulate you from checking for Renamed; using git diff won't, because that reads the user's configuration, which may set diff.renames. You should also be prepared to handle Type-change in case someone replaced a symbolic link with a file, or vice versa.
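Here is a sketch of the -z variant in bash (read -d '' is a bash feature, so this is not POSIX sh); the repository and file names, including the deliberately awkward ones, are invented for the demonstration. Each NUL-terminated status is read, then its path, and a case statement handles the A, D, M and T letters mentioned above.

```shell
#!/usr/bin/env bash
# Sketch in a throw-away repository; needs bash for read -d ''.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

echo old > "old name.txt"
echo keep > keep.txt
git add -A
git commit -q -m base

git rm -q "old name.txt"                 # D
echo new > "new *file*.txt"              # A, with a space and glob characters
echo changed > keep.txt                  # M
git add -A

list_staged() {
    # -z: a NUL after the status and after the path, no quoting or munging.
    git diff-index --cached -r -z --name-status HEAD |
    while IFS= read -r -d '' status && IFS= read -r -d '' path; do
        case $status in
            A) echo "added:    $path" ;;
            D) echo "deleted:  $path" ;;
            M) echo "modified: $path" ;;
            T) echo "typechg:  $path" ;;   # e.g. a symlink replaced by a file
            *) echo "other ($status): $path" ;;
        esac
    done
}

list_staged
```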

Once you have a list of modified files, or interleaved with obtaining the list if you like (but this is more complex—you'll want to keep and use the :<mode> stuff for robust line-by-line decoding), you can inspect the actual diff. For instance:

$ git diff-index --cached -p HEAD -- wt-status.c
diff --git a/wt-status.c b/wt-status.c
index f5debcd2b..e736da45f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1,3 +1,4 @@
+
 #include "cache.h"
 #include "wt-status.h"
 #include "object.h"

shows that I simply added a blank line at the top of the file here. (You need --cached to make Git look at the blob content from the index, rather than looking at the work-tree file. You need --cached with the initial -r --name-status variant as well: without it, git diff-index also looks at the work-tree, so files that were modified but never staged show up in the list too. This is an annoying feature of git diff-index.)
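The difference between the two forms is easy to reproduce. In this sketch (git and a POSIX shell assumed; repository and file names invented), notes.txt has one change staged and a second one only in the work-tree, and other.txt is changed but never staged. Only the --cached commands restrict themselves to what is in the index; git cat-file -p :notes.txt prints the staged blob itself, which can be handier than parsing a patch when all you want is the file's header.

```shell
#!/bin/sh
# Sketch in a throw-away repository; file names are invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com

printf 'line 1\n' > notes.txt
echo x > other.txt
git add notes.txt other.txt
git commit -q -m base

printf 'line 0\nline 1\n' > notes.txt
git add notes.txt                              # staged: a new first line
printf 'line 0\nline 1\nline 2\n' > notes.txt  # work-tree only: one more line
echo "xx, not staged" > other.txt              # changed, never staged

git diff-index --cached -p HEAD -- notes.txt   # HEAD vs index: +line 0
git diff-index -p HEAD -- notes.txt            # HEAD vs work-tree: +line 0, +line 2
git cat-file -p :notes.txt                     # the staged blob itself
git diff-index --cached -r --name-status HEAD  # notes.txt only
git diff-index -r --name-status HEAD           # notes.txt and other.txt
```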

After gathering all the git diff-index output and parsing it to discover your log message text, you will be ready to write a new commit log message to the log message file.
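Putting the pieces together, a prepare-commit-msg hook along these lines could produce the message the question asks for. It is a sketch under several assumptions: POSIX sh and awk, the "// V x.y:" header layout from the question, only added or modified files (--diff-filter=AM) are examined, paths contain no newlines, and the message is left alone when one was supplied by -m, -F, a merge, an amend and so on. It reads each file's staged blob with git cat-file -p ":path" rather than the work-tree file. The demonstration around it uses an invented repository.

```shell
#!/bin/sh
# Sketch: install a hypothetical prepare-commit-msg hook in a throw-away repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email demo@example.com
mkdir -p .git/hooks

cat > .git/hooks/prepare-commit-msg <<'HOOK'
#!/bin/sh
# Assumes the "// V x.y:" header layout from the question.
msgfile=$1
# Only fill in the message when none was given (-m, -F, merge, amend, ...).
case $2 in
    ""|template) ;;
    *) exit 0 ;;
esac

# Compare the index in use with HEAD, or with the empty tree on a first commit.
if git rev-parse --verify -q HEAD >/dev/null; then
    against=HEAD
else
    against=$(git hash-object -t tree /dev/null)
fi

# For every added or modified path, parse the staged blob, not the work-tree file.
msg=$(git diff-index --cached --name-only --diff-filter=AM "$against" |
    while IFS= read -r path; do
        git cat-file -p ":$path" | awk '
            !f && /^\/\/[ \t]+V[ \t]+[0-9][0-9.]*:/ {
                s = $0; sub(/^\/\/[ \t]+/, "", s)
                v = s; sub(/:.*/, "", v)
                t = s; sub(/^[^:]*:[ \t]*/, "", t)
                f = 1; next
            }
            f {
                if ($0 !~ /^\/\/[ \t]+[^ \t]/ || $0 ~ /^\/\/[ \t]+V[ \t]+[0-9]/) exit
                s = $0; sub(/^\/\/[ \t]+/, "", s)
                if (d == "" && s ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/) {
                    d = substr(s, 1, 10); s = substr(s, 11); sub(/^[ \t]+/, "", s)
                }
                t = t " " s
            }
            END { if (f) printf "%s - %s %s\n", v, d, t }'
    done)

# Replace the message file only if a header entry was found.
if [ -n "$msg" ]; then
    printf '%s\n' "$msg" > "$msgfile"
fi
exit 0
HOOK
chmod +x .git/hooks/prepare-commit-msg

echo readme > README
git add README
git commit -q -m "add README"          # -m given: the hook leaves it alone

cat > insert_AX_serialno.php <<'EOF'
#!/usr/local/bin/php
<?php
//
// V 1.6:       Now also check the Warehouse Code of an item before inserting the serial
// 2018-03-07   number. The Warehouse Code must not be equal to any of three values (see below).
//
EOF
git add insert_AX_serialno.php
git commit -q --no-edit                # the message comes from the header
git log -1 --format=%s
```

Run this way, the last commit's subject is the V 1.6 entry from the header. With git commit --no-edit the editor is never opened; a plain git commit would open it with that text already filled in.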

⁵There probably should be. This is something of a theme with Git commit hooks: they don't provide enough information. Later versions of Git may add more arguments to the hook, or set specific environment variables. You can dig around in process trees to try to find the git commit command that invoked your hook, and then look at their /proc entries or ps output to find their arguments, for instance, but this is quite ugly and error-prone, and unlikely to work on Windows.
