
I am creating some scripts and programs that fetch commit information using

git log --pretty=<my format> -1 <commit>

I wonder if the output of this command is suitable to be parsed by programs (plumbing) or only meant to be presented to humans (porcelain). For example, in some projects I am fetching commit SHA + author name + commit summary with this:

git log --pretty="%H%n%an%n%s" -1 HEAD

And then I split the output string by the newline character (I'm on Linux).

Besides, in some cases I also do something like this:

git log --pretty='[%h] %an: %s' -1 HEAD

And then parse the result with the following regex, expecting that a short SHA, the author name and the commit summary are in the captured groups:

^\[(\w+)\] ([^:]+): (.*)$

Is it a good approach? If not, what is the preferred way to programmatically get information about commits?

iBug
  • I lean toward it being a porcelain command, based on this (off-topic) clue: in Pro Git v2, [Chapter 10.1](https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain#_plumbing_porcelain) says "this book’s first nine chapters deal almost exclusively with porcelain commands", and `git log` with a machine-oriented format is covered [in Chapter 2.3](https://git-scm.com/book/en/v2/Git-Basics-Viewing-the-Commit-History), which is within those first nine chapters. – Geno Chen Dec 02 '18 at 17:03

4 Answers

git log is a porcelain command.

It actually performs quite a number of disparate tasks: walking the revision graph, plus what git diff, git grep and others do.

A plumbing way to do something like

git log --pretty='[%h] %an: %s' -1 HEAD

is to combine git show-ref with git cat-file and parse the result—something like

git cat-file commit `git show-ref -s HEAD` |
  while IFS= read -r line; do
    # do some processing, e.g. print each line unchanged
    printf '%s\n' "$line"
  done
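
The "do some processing" step can be sketched as follows. This is only a minimal illustration, assuming the raw commit object consists of `key SP value` header lines, then a blank line, then the message; the helper name `commit_header` is made up for this example and is not part of Git.

```shell
# Hypothetical helper (not a Git command): print the value of one header
# field, e.g. "author", from raw commit-object text read on stdin.
commit_header() {
    key=$1
    while IFS= read -r line; do
        # The first empty line ends the header; the commit message follows.
        [ -z "$line" ] && break
        case $line in
            # Keep only lines that start with "<key> " and strip that prefix.
            "$key "*) printf '%s\n' "${line#"$key "}" ;;
        esac
    done
}
```

It could be used as `git cat-file commit HEAD | commit_header author`. A field that occurs more than once, such as `parent` on a merge commit, is printed once per occurrence.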

In fact, Git's top-level manual page, git(1) (run git help git to read it), breaks the commands down into porcelain and plumbing layers.

kostix
  • `git cat-file commit` seems like a reliable plumbing command, but the output appears to be more difficult to parse than that of `git log --pretty`. Any better solution? – iBug Dec 04 '18 at 15:44
  • I cannot see what's complex about it: it's a header of lines of the form `^key SP value LF$` separated by a blank line `^LF LF$` from the content. So basically you read all the lines until an empty one and look for specific keywords such as `author` and/or `committer`. I mean, can you elaborate on what particular difficulty you have with it? – kostix Dec 04 '18 at 16:03
  • Hi, can you take a look at my own answer and give some comments? (+1 for your answer) – iBug Dec 22 '18 at 04:13

I agree with kostix; git log is a porcelain command. But the problem here is that there are some things git log can do that are too difficult to do with other commands, so we can sometimes make git log act like a plumbing command.

The key distinction between plumbing and porcelain shows up when comparing, e.g., git branch and git tag to git for-each-ref, or git diff to git diff-tree and git diff-files and git diff-index. It's not how many porcelains there are per plumbing. Here, for instance, the plumbing git for-each-ref has two separate porcelain front ends, while the single front-end git diff has three plumbing back-ends. No, the key is that git diff changes its behavior based on user-selected configuration items:

diff.algorithm
diff.dirstat
diff.renameLimit
diff.renames
diff.statGraphWidth
diff.submodule

and so on. The plumbing versions ignore all user configuration, so that a script you write behaves the same for Alice, Bob, Carol, and Dave, even though they have different settings.

When using this definition, we can decide whether git log acts like a plumbing command. This requires enumerating all the git log configuration options. Unfortunately, there's no clean way to do that—more options can be added at any time, and some have been added over time.

Here's a list I found by going through the git log and git config manuals. Note that I omit all the diff-oriented ones (e.g., color.diff and the diff.* items mentioned above) as there are plumbing commands to handle the equivalent of -p in git log (though you must work through one commit at a time).

color.decorate.<slot>
core.notesRef
format.pretty
i18n.logOutputEncoding
log.abbrevCommit
log.date
log.decorate
log.follow
log.graphColors
log.mailmap
log.showRoot
log.showSignature
notes.displayRef
pretty.<name>

So, let's say we want to get the committer date from some particular commit, formatted some particular way. To do that we might run:

git log --no-walk --pretty=format:%cd

We find in the main git log documentation that pretty format %cd is described this way:

%cd: committer date (format respects --date= option)

We failed to give a --date= option, so git log will look up the log.date setting. That's a user-configuration option, and our git log output will depend on the user's choice, rather than ours.

To make this git log act like a plumbing command, then, we must override the log.date configuration setting, with, e.g., --date=default or -c log.date=default:

git -c log.date=default log --no-walk --pretty=format:%cd

or:

git log --no-walk --date=default --pretty=format:%cd

Ideally, Git should have either a plog command that is defined as a plumbing variant of git log, or a git format-log-metadata plumbing command that takes the --pretty=<directives> options and formats log metadata. Since it doesn't, anyone writing a script that needs git log --pretty=format:... output has to know which configuration options might affect it and override them.
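
One way to apply this in a script is to put the overrides in a single wrapper. The following is a sketch, not a real Git command: the function name `stable_log` is invented here, and the set of `-c` overrides is an assumption taken from the list of settings above. It may be incomplete for a given format, and a future Git release could add settings it does not cover.

```shell
# Hypothetical wrapper (not part of Git): format one commit with git log
# while pinning user-configurable settings to fixed values.
# Assumption: these overrides cover what the chosen format depends on.
stable_log() {
    format=$1          # pretty-format directives, e.g. '%H %cd'
    commit=${2:-HEAD}  # commit to show; defaults to HEAD
    git -c log.date=default \
        -c log.showSignature=false \
        -c log.decorate=false \
        -c log.abbrevCommit=false \
        -c log.mailmap=false \
        -c i18n.logOutputEncoding=UTF-8 \
        log --no-walk --pretty="format:$format" "$commit"
}
```

For example, `stable_log '%H %cd' HEAD` prints the committer date in Git's default format even when the user has set log.date to something else.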

torek
  • Does that mean, even if it isn't strictly a plumbing command, I can still override related settings and expect plumbing results? – iBug Dec 04 '18 at 15:47
  • @iBug: Yes. The problem is that in the future, because `git log` is really porcelain, someone might *add a new configuration item* that you can't tell that you must override yet, because right now you don't *need* to override it. – torek Dec 04 '18 at 16:22
  • Hi, can you take a look at my own answer and give some comments? (+1 for your answer) – iBug Dec 22 '18 at 04:13

Thanks to kostix and torek for their answers.

Despite what they answered, I believe that some of the pretty format options can be safely treated as plumbing (i.e. safe to be parsed by programs). Examples include

  • %H for full commit SHA
  • %T for full tree SHA
  • %P for full parent SHAs
  • %an, %cn, %ae, %ce, %at, %ct for author/committer name/email/date (Unix timestamp). The RFC 2822 and ISO 8601 style dates (%aD, %cD, %aI, %cI) are also reliable
  • %s for commit summary
  • %G? for signature status
  • %n for a newline (lol...)

Yes, format specifiers like %ad and %cN can be affected by user settings, but the ones above are unlikely to be. So I have decided that my current code, which parses git log output produced by a pretty format built from the specifiers above, is safe and not error-prone.
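
As a sketch of how such parsing might look for the newline-separated format from the question (`%H%n%an%n%s`): the helper name `read_commit_fields` is made up for illustration. Reading one field per line sidesteps a weakness of the regex approach, where `[^:]+` would split an author name that itself contains a colon.

```shell
# Hypothetical helper (not part of Git): read the three lines produced by
#   git log -1 --pretty='%H%n%an%n%s' <commit>
# from stdin and store them in the variables sha, author and summary.
read_commit_fields() {
    IFS= read -r sha &&
    IFS= read -r author &&
    IFS= read -r summary
}

# Usage sketch: feed the output through a here-document so the variables
# are set in the current shell rather than in a pipeline subshell.
#
#   out=$(git log -1 --pretty='%H%n%an%n%s' HEAD)
#   read_commit_fields <<EOF
#   $out
#   EOF
#   printf '%s by %s: %s\n' "$sha" "$author" "$summary"
```

The function returns a non-zero status if fewer than three lines are available, which gives the caller a simple way to detect unexpected output.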

iBug
  • @jthill How is it different from `git log`? – iBug Dec 22 '18 at 04:24
  • It's got the core-command aka plumbing guarantees, its output's intended for your use. – jthill Dec 22 '18 at 04:29
  • It seems pretty safe to use these directives with `--pretty={,t}format` / `--format=`, yes. Curiously, while these mostly do work with `git rev-list`, they're not listed as options for it. But rev-list and log are built from the same source file and mostly handle the same arguments. Unfortunately `git rev-list` behaves poorly when used with `--format`s! – torek Dec 22 '18 at 05:23
  • @torek Can you elaborate a bit about the last sentence? I'm very curious about how `git rev-list` behaves poorly with pretty formats. – iBug Dec 22 '18 at 05:25
  • Just try it: `git log --pretty=format:"%H %cn" HEAD` vs `git rev-list --pretty=format:"%H %cn" HEAD`. Sure, you can throw out every other line, but why should you *have* to? – torek Dec 22 '18 at 05:47
  • @torek There's zero difficulty for me to match the line with `^commit [0-9a-f]{40}$` and then throw it away. – iBug Dec 22 '18 at 06:51
  • Sure, but again, why should you *have* to? That's like having `git for-each-ref` print the date on every other line, or something. rev-list is a plumbing command; it should behave like one. (Also, if/when Git switches to SHA3-256, that `{40}` will be wrong.) – torek Dec 22 '18 at 17:22

Just to add to kostix's answer: for me, git show-ref -s HEAD does not output anything and returns with exit code 1.

Instead of:

git cat-file commit `git show-ref -s HEAD`

I use:

git cat-file commit `git show-ref --head -s HEAD`

or:

git cat-file commit `git rev-parse HEAD`