Inspired by this answer, I wonder if there is a way to obtain the same behavior in Mercurial as the one obtained with the smudge/clean filters specified in the .gitattributes file for git. That is, applying some preprocessing to some files before committing, without affecting the working copy.

You can find a proper description of what I mean in the git documentation on gitattributes in the filters subsection. Also, from the Pro Git book:

It turns out that you can write your own filters for doing substitutions in files on commit/checkout. These are called “clean” and “smudge” filters. In the .gitattributes file, you can set a filter for particular paths and then set up scripts that will process files just before they’re checked out (“smudge”, see Figure 8-2) and just before they’re staged (“clean”, see Figure 8-3). These filters can be set to do all sorts of fun things.

My use case is similar to the one stated in this other question: to clean up part of some files before committing them to the repository but without affecting the working copy.

The most similar thing I was able to find is the encode/decode functionality of Mercurial. The problem is that the documentation on this feature is quite succinct (I couldn't find much information anywhere else).

But then, the encode/decode functionality is marked as an unloved feature. Why is that? Does it mean there is a better way to do what it does? Or is there, for some reason, no proper way to do it, so that I should just use this one like everyone else?

Zoe
mgab

1 Answer

Looking at your use case, the intended way to overlay local modifications over a repository is generally using the MQ extension, which allows you to apply patches locally that don't get pushed to a remote repository and can be applied and unapplied as needed (and can themselves be put under version control).

In general, automated modification of files upon checkin or checkout is problematic:

  1. It may not interact well with the rest of your VCS-related tooling, especially those parts that expose patches as diffs or that rename files.
  2. It is generally error-prone; you're checking in a version that you never tested and have to be sure that the encode/decode steps properly roundtrip.
  3. The encoding and decoding setup is not actually part of the repository, but of your VCS configuration. This may lead, for example, to you accidentally pushing passwords because you forgot to set up the configuration correctly in a new checkout. In particular, a fresh hg clone does not copy .hg/hgrc over and may thus check out undecoded files.

The larger problem that you have when you are using a VCS to handle both permanent and temporary artifacts is that you are trying to make it do something that it isn't designed for. What you are missing is a build or deployment step that creates the temporary artifacts from permanent ones, possibly in conjunction with a local configuration (say, via a template system). This can also be combined with a hook that prevents the accidental checkin of temporary artifacts.
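As a rough illustration of such a build step, here is a minimal Python sketch that renders a tracked template using values kept outside the repository. The template text, the setting names (db_user, db_password) and the function name render_config are all hypothetical and chosen only for the example; a real project would read the template and the local settings from files.

```python
from string import Template

# Hypothetical template that would be checked in (the permanent artifact).
TEMPLATE_TEXT = "db_user = $db_user\ndb_password = $db_password\n"


def render_config(template_text: str, local_settings: dict) -> str:
    """Fill the tracked template with values that live outside the repository."""
    # substitute() raises KeyError if a placeholder has no local value,
    # so an incomplete local configuration fails loudly at build time.
    return Template(template_text).substitute(local_settings)


if __name__ == "__main__":
    # In practice these values would come from an untracked local file.
    local_settings = {"db_user": "alice", "db_password": "s3cret"}
    print(render_config(TEMPLATE_TEXT, local_settings), end="")
```

With this arrangement only the template is committed; the rendered file is a temporary artifact that can be listed in .hgignore, so nothing has to be rewritten on checkin or checkout.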

That said, if you absolutely want to use filters, it works as follows: you need matching [encode] and [decode] sections. Each section has a series of pattern = shell-command entries, where pattern describes a filename or set of filenames and shell-command is a shell command that transforms an input file into an output file. This command can be prefixed by either pipe: (the default), in which case it has to convert standard input into standard output, or tempfile:, in which case the command transforms the files given on the command line (specified by the placeholders INFILE and OUTFILE).

Examples:

[encode]
secretfile = pipe: sed -e 's/FOO/BAR/g'
[decode]
secretfile = pipe: sed -e 's/BAR/FOO/g'

With tempfile:

[encode]
secretfile = tempfile: sed -e 's/FOO/BAR/g' <INFILE >OUTFILE
[decode]
secretfile = tempfile: sed -e 's/BAR/FOO/g' <INFILE >OUTFILE

Both examples convert occurrences of FOO into BAR upon checkin and BAR into FOO upon checkout. Note how this does not actually roundtrip properly: If a file contains the string BAR upon checkin, it will become FOO upon checkout. It can be fairly tricky to write filters that do this correctly in all cases. This is one of the reasons why a separate build step is almost always better than squeezing extra magic into checkins and checkouts.
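The roundtrip problem can be reproduced outside Mercurial. The following Python sketch only mimics the two sed substitutions with plain string replacement (it does not call Mercurial or sed), and the sample strings are made up for illustration:

```python
def encode(text: str) -> str:
    # Mirrors the [encode] filter: sed -e 's/FOO/BAR/g'
    return text.replace("FOO", "BAR")


def decode(text: str) -> str:
    # Mirrors the [decode] filter: sed -e 's/BAR/FOO/g'
    return text.replace("BAR", "FOO")


def roundtrips(text: str) -> bool:
    """True if checking in and then checking out gives back the same text."""
    return decode(encode(text)) == text


if __name__ == "__main__":
    for sample in ["password = FOO", "password = BAR", "FOO and BAR"]:
        print(repr(sample), "->", repr(decode(encode(sample))), roundtrips(sample))
```

Only the first sample survives unchanged; any input that already contains BAR comes back with FOO in its place. A check like roundtrips, run over representative files, is one way to test a filter pair before trusting it with real data.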

Reimer Behrends