I am writing a Git smudge filter.

.gitconfig

[filter "smudgey"]
    smudge = smudge_filter

smudge_filter

#!/usr/bin/env bash
# $Id Date: Wed, Mar 25, 2020  1:41:34 PM, User: Joey Gough, Branch: master$

IFS=

log_string="\$Log\nhello world"

changed_data=$(sed s/\$Log[^$]*/"$log_string"/g $1)

echo $changed_data

filtered file

$Log$

Result

When I check out this file, the filter converts the `$Log$` tag and inserts "hello world":

$Log
hello world$

Situation

When I rewrite the .gitconfig to this:

[filter "smudgey"]
    smudge = smudge_filter --smudge %f

It prints out two newlines and that's all.

I have tried so many different approaches and so far it seems as though I cannot access the filename and the file contents at the same time in a Bash script.

Question

How do I access the file contents and the filename at the same time in the git filter? Or can I?

Joey Gough
  • Incidentally, things like `$Id$` and `$Log$` are actively discouraged in Git, which is why Linus didn't put them in as primitives. Having worked with CVS for years, I got used to them—especially the `$Id$` stuff, which we embedded into binaries as identifiers—but in Git this is the wrong way to go. To put in an identifier, use the commit hash, which you insert when you do the build, and do not store in any committed file. – torek Mar 25 '20 at 21:26
  • @torek, I have been trying to tell people this. When you say "actively discouraged", do you know of any official docs or publications that discourage it? – Joey Gough Mar 26 '20 at 07:30
  • It's a bit hard to find good original Torvalds quotes, but here's one: http://www.gelato.unsw.edu.au/archives/git/0610/28891.html (link from [this answer](https://stackoverflow.com/a/384112/1256452) to [this question](https://stackoverflow.com/q/384108/1256452)). – torek Mar 26 '20 at 17:43

1 Answer

How do I access the file contents ...

There are no file contents. Or maybe a better way to phrase this is: there are contents. They are not (yet) in any file.

and the filename at the same time ...

You have the method for getting the file's name, via the %f directive.

The important thing to keep in mind is that the file does not yet exist.[1] The contents will go into that file after you filter them!

If the sed command does what you want, keep the sed command as it is. If you want to put the file name in somewhere, do that separately.

Here's a smudge filter that replaces `fill in the blank` with blanks, and inserts the file's name at the top:

#! /bin/sh
# invoked as "smudge %f" from .gitconfig settings
printf "%s\n" "$@"
sed 's/fill in the blank/_________________/'
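A filter like this can be tried by hand, without a checkout, by piping text into it. The following is only a sketch: the function `smudge` is a hypothetical stand-in for the script above, and the path `docs/form.txt` and the sample input line are made up for illustration. It shows that the contents arrive on standard input while the name arrives as an argument.

```shell
#!/bin/sh
# Hypothetical stand-in for the filter script above, wrapped in a function
# so that the example is self-contained.
smudge() {
    # The argument(s) play the role of %f: print the name first.
    printf '%s\n' "$@"
    # The contents come from stdin, not from a file on disk.
    sed 's/fill in the blank/_________________/'
}

# Simulate what Git does: contents on stdin, path as the argument.
# "docs/form.txt" is an invented path, used only for this demonstration.
result=$(printf 'Name: fill in the blank\n' | smudge docs/form.txt)
printf '%s\n' "$result"
```

The first output line is the path, and the second is the input line with the phrase replaced by underscores.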

Here's a different smudge filter that replaces `__myname__` with the file's name:

#! /bin/sh
quoted=$(printf "%s" "$@" | sed -e 's,/,\\/,g' -e 's,&,\\&,g')
sed "s/__myname__/$quoted/"

(The quoting trick makes sure that `$quoted` does not expand to characters that affect the sed substitute command: forward slash is the delimiter, and an unescaped ampersand in the replacement stands for the text matched by the left-hand side.)
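The effect of the quoting step can be checked in isolation. This is a minimal sketch under stated assumptions: the helper name `quote_for_sed` and the sample path `src/a&b.txt` are invented for the demonstration, chosen because the path contains both a slash and an ampersand.

```shell
#!/bin/sh
# Hypothetical helper: escape "/" and "&" so that the value is safe to use
# as the replacement part of a sed "s/.../.../" command.
quote_for_sed() {
    printf '%s' "$1" | sed -e 's,/,\\/,g' -e 's,&,\\&,g'
}

# Invented sample path containing both special characters.
name='src/a&b.txt'
quoted=$(quote_for_sed "$name")

# Without the escaping, the "/" would end the replacement early and the "&"
# would be expanded to the matched text "__myname__".
result=$(printf 'file: __myname__\n' | sed "s/__myname__/$quoted/")
printf '%s\n' "$result"
# prints: file: src/a&b.txt
```

A file name containing a backslash or a newline would still need additional handling; this sketch covers only the two characters the answer mentions.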


[1] Well, the file may or may not exist. It may be empty. In your case it apparently does exist and is mostly or entirely empty. There are various race conditions here as the filtering is part of a pipeline with different processes doing different things.

Note that if you switch to a long-lived filter, the details change, but the overall strategy is the same: the text you will filter is not yet in the target file(s).

torek
  • Thank you @torek. So what is the difference between using sed and printf here? It looks like printf does not terminate the filter, but sed does. Is that correct? How do you avoid race conditions? – Joey Gough Mar 26 '20 at 07:44
  • Can I do a `git log $2` in a smudge filter to put the commit history in the log tag? – Joey Gough Mar 26 '20 at 08:04
  • `sed`: reads stdin, writes stdout, performing substitutions as directed by command line arguments. `printf`: prints to stdout, does not read stdin; output is determined by the command line arguments. Normally I'd use `echo` to print to stdout but `echo` has non-portability issues with `-e` vs `-n` on SysV vs BSD based systems and the like. – torek Mar 26 '20 at 17:46
  • You *can* run `git log $2` but: (1) Where did `$2` come from? Well, in your case, your driver was configured as `smudge_filter --smudge %f`, so that's where `$2` came from. If the filter is the smudge filter, though, why bother with `--smudge`? (2) The `git log` command is what Git calls *porcelain* rather than *plumbing*. A *plumbing command* is one that is *designed* to be run from scripts. A plumbing command therefore has easily script-able behavior. It won't change its output based on some user's personal config. [continued] – torek Mar 26 '20 at 17:49
  • A *porcelain* command, like say `git branch` or `git tag` (vs the plumbing `git for-each-ref`), tends to run its output through a pager (which users can set for themselves with `core.pager`), and/or use other Git configuration settings. For instance, `git diff` is porcelain, and reads the user's `diff.renames` and `color.diff` settings, but `git diff-tree` is plumbing and does not read any such settings. – torek Mar 26 '20 at 17:51
  • Since `git log` *is* a porcelain command, it reads users' settings. Unfortunately there's no `--porcelain` option or equivalent for `git log`. What this means is that you must check the `git log` documentation *very thoroughly*, and control for any user settings that might mess up your script, if you want to run `git log` from a script. Note that the set of settings changes from one Git release to another: `log.decorate`, for instance, was added in Git 1.7 or so. – torek Mar 26 '20 at 17:52
  • As for avoiding race conditions, if your filter is well-behaved—i.e., only reads stdin and writes stdout and only uses the provided command line arguments—any remaining races would be some other program's fault. :-) If you read files from the work-tree, or even look up users' Git configuration settings, you may introduce races here: a file you find in the work-tree may have been undergoing changes while you were looking at it, for instance. In some cases, there are some you can't avoid, e.g., if your smudge filter is *supposed* to read `$HOME/.smudge-settings` for each user. – torek Mar 26 '20 at 17:56
  • For these cases, you just document them: "my smudge filter reads your config, so don't change your own config while a `git checkout` is running." – torek Mar 26 '20 at 17:57
  • `--smudge %f` is what is shown in the .gitattributes documentation. That is where that came from. I'm just reading the rest of your notes now. Any idea how I could get the commit history of the file during smudge? – Joey Gough Mar 26 '20 at 18:13