Branch specific content in one file

Question

I am trying to have a file that is specific to each branch. I don't want this file to be overwritten or updated at merges. Why does this not work?

(my attempt is based on How to prevent tracked config files from being changed by merges in git? but for some reason it doesn't work. I also followed the more detailed blog post on which that answer was based and it also didn't behave as described in the blog post. Thus, this seems to be a git version issue. I am using 2.7.4)

git init
echo "master">config
echo "config merge=alwaysours">.gitattributes
echo ".gitattributes merge=alwaysours">>.gitattributes
git config --local merge.alwaysours.driver true
git add -A
git commit -m 'Master'
git checkout -b feature
echo "feature">config
touch feature
echo ".gitattributes merge=alwaysours">.gitattributes
git add -A
git commit -m 'Add feature'
git checkout master
git merge feature 
cat config # PRINTS OUT feature INSTEAD OF master

https://stackoverflow.com/search?q=%5Bgit%5D+prevent+merge+config+file — phd, Dec 17 '18 at 16:58
@phd My attempt is based on https://stackoverflow.com/questions/30116096/how-to-prevent-tracked-config-files-from-being-changed-by-merges-in-git but it doesn't work — Bananach, Dec 17 '18 at 17:07
The merge driver approach doesn't work because Git doesn't invoke the merge driver. (Under some conditions only, but, those conditions are extremely common.) — torek, Dec 17 '18 at 18:14
@torek do you know what would work then? Given that the linked answer is outdated or wrong, this should be helpful for a lot of future visitors — Bananach, Dec 17 '18 at 20:04
@phd would you mind boiling this down to how it can be applied to my situation? — Bananach, Dec 17 '18 at 20:04
@phd I didn't really understand that caveat. It seems like it contradicts the rest of that answer since .gitattributes is overwritten? Unless the overwrite should only happen at the feature branch. I tried that (see the edited question) but it doesn't help. Also I was discouraged by the fact that I get different behavior than the mentioned blog post; so I just felt the answer was wrong (or outdated) as well. If you think there is a simple solution, could you please just post it? — Bananach, Dec 18 '18 at 07:14
My strategy is not store configs in git. I store templates and put configs in `.gitignore`. — phd, Dec 18 '18 at 13:20
But I want the particular configuration file associated with that specific branch every time I check out that specific branch. It is not a file that is associated with me (like a personal configuration of shortcuts, colors, etc) or with the computer (like path configurations) but with the specific branch. Anyway, does that mean you do not think anymore that **The huge caveat** solves my problem? — Bananach, Dec 18 '18 at 13:23

torek · Answer 1 · 2018-12-18T09:43:33.070

The entire merge driver idea is doomed for reasons outlined in my answer to Git merge strategy for a specific file depending on rebase / merge.¹ There's a different possibility that can be made to work, but it's ugly. In fact, in the end, it's horribly ugly, and probably still a bad idea. Your best bet is probably to use Git hooks (post-checkout and post-merge, specifically) to manipulate some untracked and ignored file in the work-tree instead.

(Note, however, that since I don't know what you really want to have in your files, I do not even have a good starting point for proposing these as a solution.)
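As a rough illustration of the hook idea (a sketch only, since I don't know the real file contents): the script below builds a throwaway repository, installs a post-checkout hook, and lets the hook write an ignored file according to the branch name. The file name local.conf, the *debug* pattern, and the verbose= setting are all made up for the example. What Git does provide is the hook's third argument, which is 1 for a branch checkout and 0 for a file checkout.

```shell
#!/bin/sh
# Sketch: post-checkout hook that maintains an untracked, ignored file.
# local.conf, the *debug* pattern and verbose= are hypothetical names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # fix the initial branch name
git config user.name demo
git config user.email demo@example.invalid
echo local.conf > .gitignore
git add .gitignore
git commit -q -m 'Initial commit'

mkdir -p .git/hooks
cat > .git/hooks/post-checkout <<'EOF'
#!/bin/sh
# Git passes: $1 = previous HEAD, $2 = new HEAD,
# $3 = 1 for a branch checkout, 0 for a file checkout.
[ "$3" = 1 ] || exit 0
# HEAD has already been updated when this hook runs.
branch=$(git symbolic-ref --quiet --short HEAD) || exit 0   # detached HEAD
case "$branch" in
  *debug*) echo 'verbose=true'  > local.conf ;;
  *)       echo 'verbose=false' > local.conf ;;
esac
EOF
chmod +x .git/hooks/post-checkout

git checkout -q -b feature-debug
on_feature=$(cat local.conf)
git checkout -q master
on_master=$(cat local.conf)
echo "feature-debug: $on_feature; master: $on_master"
```

A post-merge hook could reuse the same logic; Git passes it a single argument that says whether the merge was a squash merge.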

Discussion

It's worth remembering here, before we even start with the idea of having a file whose contents are handled specially, just how Git works with files. Files, in Git, aren't really all that important. What matters in Git is the commit. A commit stores files, so files come along for the ride, but it's the commit itself that is the key—and the way a commit stores files is a little peculiar, which begins to matter at this point.

The way a commit stores files is by building, and then referring-to, a tree object. A tree object is essentially a list of <mode, name, hash> tuples:

$ git ls-tree HEAD
[lots of snippages here]
100644 blob acf853e0299463a12212e9ed5f35d7f4a9d289af    .gitattributes
040000 tree 7ba15927519648dbc42b15e61739cbf5aeebf48b    .github
100644 blob 0d77ea5894274c43c4b348c8b52b8e665a1a339e    .gitignore
...
100755 blob 54cbfecc5ab0531513ff9e069be55d74339ad427    git-bisect.sh
100644 blob 09b0102cae8c8c0e39dc239003ca599a896730cf    git-compat-util.h
100755 blob d13f02da95f3b9b3921c3ccff9e3b6a7511cd666    git-cvsexportcommit.perl
...
100644 blob 2d41fffd4c618b5d7b816146d9df684b195535e3    xdiff-interface.h
040000 tree 77abde3699bc6874e10f1c17f4b97c219492542f    xdiff
100644 blob d594cba3fc9d82d94b9277e886f2bee265e552f6    zlib.c

The string in the middle here (blob or tree) is derived from the mode at the front: 100644 or 100755 is a blob, 040000 is a tree, and there are a bunch of less-common special cases.
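To see the mode-to-type mapping on something smaller than the Git source tree, here is a minimal sketch that commits one plain file, one executable file, and one directory in a scratch repository. The file names are invented for the example, and the hash IDs in the listing will differ from the ones shown above.

```shell
#!/bin/sh
# Sketch: one entry of each common kind in a tree object.
# notes.txt, run.sh and sub/ are hypothetical names.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name demo
git config user.email demo@example.invalid

echo 'plain' > notes.txt
printf '#!/bin/sh\n' > run.sh
mkdir sub
echo x > sub/file
git add -A
# Record the executable bit in the index regardless of the file system.
git update-index --chmod=+x run.sh
git commit -q -m 'Three kinds of entries'

listing=$(git ls-tree HEAD)
echo "$listing"
```

The top-level tree lists sub as a single tree entry; the file inside it only shows up when you list that subtree (or pass -r to git ls-tree).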

The file isn't quite stored in Git. Instead, the file's contents appear in the blob object at the listed hash ID. We can see that blob object directly:

$ git cat-file -p 54cbfecc5ab0531513ff9e069be55d74339ad427
#!/bin/sh

USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|view|replay|log|run]'
LONG_USAGE='git bisect help
    print this long help message.
... [lots more, snipped]

The git cat-file -p command extracts the object, taking it out of Git's frozen internal compressed format and turning it into readable text. So the blob object has the contents of the git bisect shell script, and the tree object tells Git that in this particular commit, the blob object should be expanded into useful text form, in the work-tree, under the name git-bisect.sh.

It's in this process of expanding a blob into useful text form that we can make something interesting happen. We can do this with a .gitattributes filter driver, rather than a merge driver. The merge driver isn't used in the critical cases where we would want it used. The filter driver is always used when extracting a file into the work-tree.
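To make the merge-driver limitation concrete, here is a small experiment, sketched in a scratch repository. The driver is the question's alwaysours idea, except that instead of plain true it appends a line to a log file (driver.log, a made-up name) so we can count how often Git calls it. In the first merge only the feature branch changed config, and Git takes that version without consulting the driver. In the second merge both branches changed config, and only then does the driver run.

```shell
#!/bin/sh
# Sketch: count merge-driver invocations. driver.log is a hypothetical name.
set -e
demo=$(mktemp -d)
log="$demo/driver.log"
mkdir "$demo/repo"
cd "$demo/repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.invalid
: > "$log"
# echo succeeds, so like "true" this driver leaves our version in place.
git config merge.alwaysours.driver "echo invoked >> '$log'"
echo 'config merge=alwaysours' > .gitattributes
echo base > config
git add -A
git commit -q -m 'Base'

# Merge 1: only feature touches config; master touches another file.
git checkout -q -b feature
echo feature > config
git commit -q -a -m 'Feature changes config'
git checkout -q master
echo x > other
git add other
git commit -q -m 'Master changes another file'
git merge -q --no-edit feature
calls_one_sided=$(grep -c invoked "$log" || true)
config_one_sided=$(cat config)

# Merge 2: both branches change config since the merge base.
git checkout -q feature
echo feature2 > config
git commit -q -a -m 'Feature changes config again'
git checkout -q master
echo master2 > config
git commit -q -a -m 'Master changes config too'
git merge -q --no-edit feature
calls_two_sided=$(grep -c invoked "$log" || true)
config_two_sided=$(cat config)

echo "one-sided: $calls_one_sided call(s), config=$config_one_sided"
echo "two-sided: $calls_two_sided call(s), config=$config_two_sided"
```

The one-sided case is the common one (a feature branch edits its config while master leaves it alone), which is why the approach fails in practice.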

¹If you read through the linked question's answer and reason out what's going on, you will see that it would be possible for Git to make this approach work, perhaps by having another per-file attribute such as always-merge. But Git doesn't have this today, at least.

Filter drivers

Filter drivers come in two forms, which Git calls smudge filters and clean filters. These operate right at the interface between work-tree, where your files have a useful-to-you-and-the-computer format, and the index, which is where Git stores the file's name and the hash ID of a compressed, ready-to-go snapshot of that file (always ready for the next commit, but initially the same as the current commit).

The purpose of a smudge filter is to take the de-compressed, but not yet ready-for-use, text of a file and convert it to ready-to-use, work-tree form. The purpose of a clean filter is to take the work-tree form of a file and remove any work-tree-specific data, so that the file is ready to be compressed into the Git-only internal form. The git checkout command, along with a few other commands that can get frozen Git-only objects out, uses the smudge filter. The git add command uses the clean filter to strip out "dirty stuff" that the smudge filter put in.

So, now we can see how we could make the work-tree copy of some file depend on the current branch: we just write a smudge filter that does this. We probably should also write a clean filter that takes out the branch-specific stuff, which will let Git compress the file better, but I'll leave that to you.

To define a smudge filter, we need an entry in some .gitconfig or .git/config configuration file. For instance, if we want to run source code through some sort of pretty-printing filter:

[filter "pretty-printer-for-XYZ-language"]
    smudge = xyz prettyprint --stdin

(assuming the command that pretty-prints a source file is xyz prettyprint and that it needs --stdin to read from standard input). Then we tell Git, through .gitattributes, to apply this filter to *.xyz files:

*.xyz  filter=pretty-printer-for-XYZ-language

The filter need only read stdin and write stdout: Git arranges for the filter's stdin to come from the uncompressed but "clean" file's content as it appears in the blob object, with the filter's stdout going to the temporary file that will, at the end of this process, become the appropriate file in the work-tree.
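Since xyz prettyprint is only a placeholder, here is a minimal runnable sketch of the same setup with a stand-in filter: tr, which upper-cases its input. The filter name shout and the file contents are invented for the example. The blob keeps the text as committed, while the copy that git checkout writes into the work-tree has been through the smudge filter.

```shell
#!/bin/sh
# Sketch: a smudge filter with tr standing in for "xyz prettyprint --stdin".
# The filter name "shout" is hypothetical.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.name demo
git config user.email demo@example.invalid
git config filter.shout.smudge "tr '[:lower:]' '[:upper:]'"
echo '*.xyz filter=shout' > .gitattributes
echo 'hello' > somefile.xyz
git add -A
git commit -q -m 'Add somefile.xyz'

# Remove the work-tree copy and have Git extract it again via the filter.
rm somefile.xyz
git checkout -q -- somefile.xyz

stored=$(git cat-file -p HEAD:somefile.xyz)
worktree=$(cat somefile.xyz)
echo "blob: $stored; work-tree: $worktree"
```

Because this sketch has no matching clean filter, Git now sees the work-tree file as modified, which is one practical reason to write the clean half as well.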

For instance, if somefile.xyz in the tree object has some blob hash, Git will read the blob, write the contents into the filter's stdin, read the filter's stdout, and write those contents to somefile.xyz. There are a few important things to realize here though:

1. The filter has no direct access to the name somefile.xyz. You can tell Git to produce the name as an argument, via a %f directive, but remember that the filter must still read stdin and write stdout. (If you rewrite your filter as a "long running filter process" for efficiency, the filter must obey the packet protocol described in the documentation, which also provides the file's path-name.)
2. **Smudge filters run before git checkout updates HEAD.** As with point 1, smudge filters have no direct access to what's going on: they don't know that Git is in the middle of a git checkout otherbranch, for instance.
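Both points can be observed with a filter that changes nothing and merely logs what it sees. In the sketch below, smudge.sh, smudge.log, the SMUDGE_LOG variable, and the filter name logpath are all made up for the example. The script copies stdin to stdout, and records the path it received through %f together with the branch that HEAD names at that moment. When switching from feature to master, the filter still sees feature.

```shell
#!/bin/sh
# Sketch: log the %f argument and the branch HEAD names while smudging.
# smudge.sh, smudge.log, SMUDGE_LOG and "logpath" are hypothetical names.
set -e
demo=$(mktemp -d)
log="$demo/smudge.log"
mkdir "$demo/repo"
cd "$demo/repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.invalid

cat > "$demo/smudge.sh" <<'EOF'
#!/bin/sh
# $1 is the path Git substituted for %f; the content still arrives on stdin.
printf '%s %s\n' "$1" "$(git symbolic-ref --quiet --short HEAD)" >> "$SMUDGE_LOG"
cat
EOF
chmod +x "$demo/smudge.sh"
export SMUDGE_LOG="$log"
git config filter.logpath.smudge "'$demo/smudge.sh' %f"

echo 'config filter=logpath' > .gitattributes
echo master > config
git add -A
git commit -q -m 'Master'
git checkout -q -b feature
echo feature > config
git commit -q -a -m 'Feature'

: > "$log"                 # start with an empty log
git checkout -q master     # rewrites config, so the smudge filter runs
seen=$(cat "$log")
echo "$seen"
```

The logged line pairs the path config with the branch feature, not master: the file is written first and HEAD is moved afterwards.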

Point 2 here is in bold because it's the biggest stumbling block here. It's possible to use the current process tree to find the Git command that invoked the filter, and use whatever OS facilities there are to find the command-line arguments. It would be very helpful if Git set up an environment variable before starting such filters, indicating what's happening: is this filter being run on behalf of a switch to new branch operation, or is it being run due to a git checkout -- path/to/file or git checkout --ours -- path/to/file index extraction, for instance? But, alas, Git doesn't do that either.

Okay, I guess I'm not good enough at git to make this all work for me — Bananach, Dec 18 '18 at 07:17
@Bananach: Git definitely makes this painful. The merge driver doesn't work; the filter driver works, but is clumsy. The post-checkout and post-merge hooks really do give you your best shot: you can create or manipulate any untracked files you like here, and these hooks run after the merge or checkout finishes, so you can look at the current branch. — torek, Dec 18 '18 at 07:34
Not to be ungrateful, but if you think post-checkout and post-merge hooks are the best option, why did you write an answer about merge and filter drivers? — Bananach, Dec 18 '18 at 08:28
Because your question (last line of third paragraph) was about why the approach here didn't work. — torek, Dec 18 '18 at 09:11
but the only part of your answer that refers to this is "The merge driver isn't used in critical cases, where we would want it used". Can you specify what exactly this means for the few lines of my code? Also, does that mean that the answer that I took this from is flawed? Or did I implement that approach incorrectly? — Bananach, Dec 18 '18 at 09:33
Yes, the answer from which you got your code is fundamentally flawed. Your implementation is OK, it just ran into the problem. Rather than repeating that particular explanation, I linked to a similar question with an answer as to what the fundamental flaw is there. The filter driver idea is similar to the merge driver idea, but also flawed, and it's good to know what the flaws are and when they will strike, so that you can avoid that situation if it applies. But I don't know what your ultimate goal is (see [The XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem)). — torek, Dec 18 '18 at 09:41
It's fine, I see the point of your discussion, my understanding of git is just so brittle that it adds up to the list of question marks instead of reducing them. My goal is really exactly very similar to the MWE. I have a number of files that need to look different in a feature branch, because they basically tell the involved programs to be verbose, which helps with debugging — Bananach, Dec 18 '18 at 09:51
The usual recommendation is to put controls like that (verbose configuration, for instance) in an untracked file, so that it's never part of the source. In your particular case, you might want to have a post-checkout hook that says: "if the checkout was for a branch, and the branch's name matches a pattern like `*DEBUG*`, make sure `verbose=true` is set in untracked conf file" for instance. — torek, Dec 18 '18 at 17:54
On another site I would have just asked about this goal to begin with, to avoid the XY problem. But on SE, I felt such a question would even more likely be shut down immediately, given the abundance of similar questions, so I tried to ask a question about a specific problem with a MWE. Not that that helped in avoiding the first response to be a close vote... Thanks for your opinions, when I have more time I'll look into hooks — Bananach, Dec 18 '18 at 23:31
