What is the optimal way to only sync certain file extensions and exclude other file extensions between separate git branches?

Question

Given three branches, say master, b1, and b2: the master branch only cares about *.txt files and needs to ignore everything else. Branch b1 needs only what master includes plus, say, *.h, *.c, and *.cpp files, and ignores everything else. Branch b2 likewise needs only what master includes plus, say, *.jpg, *.png, *.html, *.css, etc., and ignores everything else.

In short, master branch contains information only common to all branches. Example use case: Branch b1 is used to generate output files to be consumed by branch b2, but both contain some information shared with master.

So, what is the optimal way to sync only those common files between master, b1, and b2, so that each branch includes only certain file extensions and ignores everything else it doesn't need?

I also looked at alternatives such as separate git repos, submodules, or subtrees, but the directory structures or nesting patterns created a few little difficulties. Is there a better way to solve this problem?

score 1 · Accepted Answer · answered Oct 11 '20 at 16:15

Let me start with this because it's perhaps more useful:

Is there a better way to solve this problem?

You could, in theory, not bother with a master branch at all. Have three branches, none of which holds final-assembled-results. Do the assembly outside of Git. If desired, make an "orphan branch" (or use a tag) to record the assembled result, or just keep the assembled results in a completely different repository. But these all result in those sorts of little difficulties you mention.
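One way to picture "assembly outside of Git" is to export each branch's snapshot into a scratch directory. This is only a sketch under assumptions: the throwaway repository, the file names (common.txt, foo.h, pic.jpg) and the output directory are all made up for illustration, and git archive is just one of several ways to do the export.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master      # name the initial branch master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
git add common.txt && git commit -q -m "common files"
git checkout -q -b br1 master
echo '/* header */' > foo.h
git add foo.h && git commit -q -m "add header"
git checkout -q -b br2 master
echo 'not really an image' > pic.jpg
git add pic.jpg && git commit -q -m "add image"

# Assemble outside of Git: unpack each branch's snapshot into one directory.
out=$(mktemp -d)
git archive br1 | tar -xf - -C "$out"
git archive br2 | tar -xf - -C "$out"
assembled=$(ls "$out" | xargs)
echo "assembled: $assembled"
```

No commit in the repository ever holds the combined set; it exists only in the scratch directory.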

What goes wrong

Git simply doesn't work the way you want: you cannot (usefully anyway) "care about" some files and have other "uncared-about" files with the "caring" switching around based on the branch. That's because "branches", in the sense you're using the word, do not exist.

Now, that's a strong statement and it needs justification. Clearly, branches do exist. The problem lies in the meaning of the word branch. It has too many meanings and people just sort of flip between them, without realizing that they're doing this, and that gets them into trouble. (See also What exactly do we mean by "branch"?) So let's just avoid the term by using what Git really uses: commit hash IDs.

When you run:

git checkout br2

you're telling Git to do two things:

save the name br2 for future use;
turn the name br2 into a commit hash ID, and extract—which includes "caring about"—a snapshot of all files from that commit.

The second step is the one that really matters right now: the first one is only needed later, when you run git commit to make a new commit, or some other Git command that needs the name (git branch or git status or git rebase, for instance).
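To see those two steps for yourself, here is a small sketch in a throwaway repository. The repository, the file a.txt, and the variable names are assumptions for illustration only.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1         # start out on a branch named br1
git config user.email demo@example.com
git config user.name Demo

echo one > a.txt
git add a.txt && git commit -q -m "first commit"
git branch br2                               # br2 now names the same commit as br1

git checkout -q br2
saved_name=$(git symbolic-ref --short HEAD)  # step 1: the name Git saved
commit_id=$(git rev-parse br2)               # step 2: the hash ID the name means right now
head_id=$(git rev-parse HEAD)                # the commit actually checked out
echo "HEAD is attached to: $saved_name"
echo "br2 currently means: $commit_id"
```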

With one exception—which you see in a fresh clone that hasn't yet run git checkout—Git always has some commit checked out right now. Your git checkout tells Git: sweep away the one we have right now, and get me some other commit as the checked-out commit.

Let's say that right now, you have br1 checked out, which is commit b100 right now. Later, the name br1 may mean some other commit, but right now it means that one. You run git checkout br2, which tells Git to switch from commit b100 to commit b200 as that's the one that the name br2 means right now.¹

OK, no big deal yet, right? We're moving from commit b100 to commit b200. Commit b100 has in it the *.h files and omits the *.jpg files entirely. So Git "cares about" the *.h files while we have b100 out. Those files are tracked, which means they're in the (single) index. We're moving off b100 though, to b200, which has the *.jpg files and omits the *.h files. Git has to copy the *.jpg files into its index and remove the *.h files from its index, which means it has to remove the *.h files from your work-tree too.
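Here is a rough sketch of that swap, using git ls-files to list what is in the index. The repository and the file names (common.txt, foo.h, pic.jpg) are invented for the demonstration; the real hash IDs will of course not be b100 and b200.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
git add common.txt && git commit -q -m "common files"
git checkout -q -b br1 master
echo '/* header */' > foo.h
git add foo.h && git commit -q -m "add header"
git checkout -q -b br2 master
echo 'not really an image' > pic.jpg
git add pic.jpg && git commit -q -m "add image"

git checkout -q br1
on_br1=$(git ls-files | xargs)               # index contents while br1 is checked out
git checkout -q br2
on_br2=$(git ls-files | xargs)               # index contents after switching to br2
echo "br1 index: $on_br1"
echo "br2 index: $on_br2"
```

After the last checkout, foo.h is gone from the work-tree as well as from the index, and pic.jpg is present in both.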

So far, this is all going great: you get just what you want. But now you want to get to master and assemble the pieces. The name master means some other commit, maybe a123 at the moment.

No matter how you get to master, from br1 (b100 at the moment) or br2 (b200 at the moment), you don't have all the *.h and *.jpg files. You can only get one set or the other. The underlying problem here is that the "caring about" happens because the files are in Git's index. Listing files in a .gitignore file, which is what you do to keep them from getting into Git's index, only helps if they're not already there. When you switch to a commit that has the files, Git will put them into its index, regardless of what's in a .gitignore file. When you switch to a commit that omits the files, Git will remove them from its index, regardless of what's in a .gitignore file.
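A quick way to convince yourself that .gitignore does not control this: in the sketch below, a *.h rule is committed on master, and a header is then committed on br1 anyway. The repository, the file names, and the use of git add -f to get past the ignore rule once are all assumptions made for the demonstration.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
echo '*.h' > .gitignore                      # master says: ignore headers
git add common.txt .gitignore
git commit -q -m "common files, ignore headers"

git checkout -q -b br1
echo '/* header */' > foo.h
git add -f foo.h                             # -f overrides the ignore rule this once
git commit -q -m "add header anyway"

git checkout -q master
tracked_on_master=$(git ls-files | xargs)    # foo.h was removed from the index
git checkout -q br1
tracked_on_br1=$(git ls-files | xargs)       # foo.h is back, despite .gitignore
echo "master: $tracked_on_master"
echo "br1:    $tracked_on_br1"
```

The ignore rule is identical in both commits; only the commit's snapshot decides whether foo.h is tracked.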

The index's contents reflect the commit you check out. Each commit has a full snapshot of every file that's in that commit. That snapshot winds up in Git's index. Unless you change them—with git add, or git rm, or by doing another git checkout that replaces them wholesale, for instance—those are the files that will go into the next commit.

Last, when you use git merge to combine work, Git:

finds a merge base commit;
compares the two branch tip commits against this merge base; and
uses that to figure out what to put into the new commit.

The new commit, like any commit, has a snapshot of all the files: all the files that were in Git's index at the time git merge made the merge commit, and those files are the result of the combining process above. Merge commits are the same as any other commit: they have a snapshot and metadata. The only thing that makes them special—makes them merge commits—is that they have two (or more) parent commit hash IDs listed in their metadata.
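The merge base and the two parents can both be inspected directly. This is a sketch in a made-up repository (the file names and the choice to merge br2 into br1 are assumptions, picked only so that Git has to make a real merge commit rather than a fast-forward).

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
git add common.txt && git commit -q -m "common files"
git checkout -q -b br1 master
echo '/* header */' > foo.h
git add foo.h && git commit -q -m "add header"
git checkout -q -b br2 master
echo 'not really an image' > pic.jpg
git add pic.jpg && git commit -q -m "add image"

base=$(git merge-base br1 br2)               # the merge base Git will use
root=$(git rev-parse master)                 # here it is the commit master names

git checkout -q br1
git merge -q -m "merge br2 into br1" br2
# rev-list --parents prints the commit followed by its parents on one line.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
merged_files=$(git ls-files | xargs)
echo "parents: $parent_count, files: $merged_files"
```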

These interlocking behaviors get in the way. Either master actually does have all the files, in which case the commits found by the other branch names also need to have all the files; or master doesn't have any of the files, in which case the other branches can be exclusive like this, but you can't merge them back into master. The common commit that Git finds to act as the merge base lacks the files, so the merge sees them as added and puts them into the new commit that goes on master, and now master has all the files! If you then remove them as you go back into the branches, the next merge will carry that removal along and remove the files this time.
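The following sketch plays out that scenario: two "exclusive" branches get merged into master, and then master gets merged back into one of them. The repository and file names are hypothetical, and --no-ff is used only to force merge commits in this tiny history.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
git add common.txt && git commit -q -m "common files"
git checkout -q -b br1 master
echo '/* header */' > foo.h
git add foo.h && git commit -q -m "add header"
git checkout -q -b br2 master
echo 'not really an image' > pic.jpg
git add pic.jpg && git commit -q -m "add image"

git checkout -q master
git merge -q --no-ff -m "merge br1" br1
git merge -q --no-ff -m "merge br2" br2
on_master=$(git ls-files | xargs)            # master now has every file

git checkout -q br1
git merge -q master                          # syncing back drags pic.jpg into br1
on_br1=$(git ls-files | xargs)
echo "master: $on_master"
echo "br1:    $on_br1"
```

Neither branch stays exclusive once the merges go both ways.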

Ultimately, Git is all about commits. It's the commits that determine, well, everything! The commits are snapshots-plus-metadata. All a branch name does is find one particular commit: the last one on some chain. Commits can be reached from more than one branch name, and many, or most, commits are on multiple branches simultaneously. So the name has nothing to do with which files are in the commit: it literally can't when more than one name finds that commit.
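To see one commit being found by several names at once, here is a minimal sketch. The repository and branch names are assumptions; git for-each-ref --contains lists every branch name from which the given commit is reachable.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com
git config user.name Demo

echo shared > common.txt
git add common.txt && git commit -q -m "common files"
git branch br1                               # three names, one commit
git branch br2

names=$(git for-each-ref --contains HEAD --format='%(refname:short)' refs/heads | xargs)
id_master=$(git rev-parse master)
id_br1=$(git rev-parse br1)
id_br2=$(git rev-parse br2)
echo "commit $id_master is on: $names"
```

All three names resolve to the same hash ID, so the name cannot be what decides which files the commit holds.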

¹Branch name to commit hash ID mappings change, which is how branches grow in Git. Git is built to add new commits, so the normal way that a name changes is that it now means a newer commit that leads, via the commit graph, back to the old commit—and many more commits too. See also Think Like (a) Git.
