
Hoping someone could help me.

So I am thinking of moving from a bespoke codebase repo control to Git.

The biggest issue I think I will find is that all of my code uses a common dictionary file, which does get updated over time as well.

Currently, in our bespoke repo control, when you take a branch of a product it will also copy the latest dictionary into your branch.

The issue I see with Git is that I can't work out how to do the same thing. I have looked and seen there are submodules, but that is no good, as it will put the dictionary in a subdirectory, which means I would need to re-code all of our products to find where the dictionary is.

I was hoping someone might know of a solution, so that when a branch is made, a file that isn't part of that particular repo is always copied into the branch. (P.S. I am also thinking of using GitLab; I don't know if that has a bearing on the solution.)

DiggidyDale
  • You can add dictionary file to `.gitignore` (this would give you the benefit of independently updating the dictionary file) and add `post-checkout` hook to copy that file from known location to somewhere inside the repo. However you'd have to redistribute the hook as it's not checked out by the clone process. – Eimantas Aug 26 '17 at 15:57
  • Thanks for that. Any idea where GitLab stores my files? – DiggidyDale Aug 26 '17 at 17:41

1 Answer


The TL;DR summary is already in Eimantas' comment. This is a (long) expansion.

Getting what you want

The short answer is "no", but to properly understand why this is the case, and why there are ways around it and what they are and what they do, we need to look first at something unusual about Git's branches.

Specifically, this is that Git's branches don't mean anything. Well, that's a little too strong, but it's like talking about air, or flavor, or sound, or art, or any number of other words with too many meanings. A sealed room can be airless but not the same way the surface of the moon is airless; you can get the flavor of the argument in this sentence, but not the same way that wasabi tastes hot, which is not at all like a hot stove.

What is a branch?

The word branch, in Git, is ambiguous. See What exactly do we mean by "branch"? for several meanings. The one you probably intend in your question is the meaning implied by the more precise phrase branch name. But a branch name, like master or develop, in Git, is often used and/or meant as a proxy for a branch tip, which is in fact a commit. Not only is it a proxy for a commit, it's also, at any one particular moment in time, just a name for one specific commit.

There are several reasons we use a branch name like master instead of the "true name" hash ID of an actual commit, though. Consider the commit 4384e3cde2ce8ecd194202e171ae16333d241326. This is a real commit: it's the commit for Git version 2.14, whose annotated tag name, v2.14.0, is a human-readable name for Git object 2f13f6d0cd7509140f251ae271052341337084c8.

These big ugly, seemingly random strings of 40 digits and letters (hexadecimal digits, really, representing a 160-bit number) are one of the reasons we use names. Who is going to remember either of those big ugly hashes? But v2.14.0 is a name that actually means something to a human. A name like master or v2.14.0 fits in your head.

There's another reason we use names, though. The name master literally means 4384e3cde2ce8ecd194202e171ae16333d241326 (in the Git repository for Git anyway) at some point in time, but not at all points in time. Eventually Git v2.14.1 comes out, and this tag (6abeb172d024cf64814f81fde2c954f4870a57fc) names a different commit (4d7268b888d7bb6d675340ec676e4239739d0f6d) which for a while is also master. And then 2.14.1 has more work done and master changes again (it's currently 3dc57ebfbd1bf30b9a6987f551af9b74641382a9, which is not any specific release: it may become 2.14.2, or there may be more changes added before 2.14.2 comes out).

So a branch name, like master, is a name for one specific commit, but which commit it names changes over time.
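To see a branch name move, here is a minimal sketch in a throwaway repository. The scratch directory, the file name file.txt, and the demo identity are assumptions made only for this illustration:

```shell
#!/bin/sh
# Sketch: watch the name "master" move from one commit to the next.
# The scratch directory, file name, and identity below are assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m 'first commit'
first=$(git rev-parse master)     # the commit that master names right now

echo two > file.txt
git commit -q -a -m 'second commit'
second=$(git rev-parse master)    # same name, different commit

# The first commit still exists, unchanged, as the parent of the second.
parent=$(git rev-parse master~1)

echo "master was: $first"
echo "master is:  $second"
echo "its parent: $parent"
```

The two hashes printed for master should differ, while the parent hash should match the first one: the name moved, the old commit did not change.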

As noted in that other question, "branch" can also mean the line of development, which is a whole series of commits.

What is a commit?

It's worth a side trip into at least some of the details of a commit, and if you want to do that, read this answer I wrote earlier this morning. But the very short version is that a commit is a permanent, unchanging snapshot of some source tree. A commit, once made, can never be changed. This makes it very different from a branch name: branch names are always changing and evolving. They just remember one specific commit, but a different specific commit over time.

What this means for your case—having some file that's always the same in every branch—is that there is a very big Git problem: each branch (name) points to one specific commit, but if you want to change a file ... well, of course you can change files—a VCS that never let you change any file would be useless—but to store the file in Git, you must commit it. It becomes part of that (new) commit. And that new commit is different from the earlier commit, so now we change one branch name, and now all the other branch names name commits that have the old version of the file.
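A small sketch of that problem, again in a scratch repository: the branch name feature, the file contents, and the demo identity are assumptions for the example. After the dictionary is updated on master, the commit that feature names still holds the old copy.

```shell
#!/bin/sh
# Sketch: committing a new dictionary on one branch leaves the others behind.
# Branch names, contents, and identity are assumptions for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'version 1' > dictionary
git add dictionary
git commit -q -m 'add dictionary'
git branch feature                # feature and master name the same commit

echo 'version 2' > dictionary
git commit -q -a -m 'update dictionary on master'

# Read the file as stored in each branch's tip commit.
on_master=$(git show master:dictionary)
on_feature=$(git show feature:dictionary)
echo "master:  $on_master"
echo "feature: $on_feature"
```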

The solutions

There are only a few things you can do here:

  • Not store the file in Git (or this repository anyway) at all. Then it's not in any commit and therefore it's not tied to some specific commit.
  • Store the file in Git, in this repository, but not in a branch. (Well, wait! We'll come back to this one!)
  • Or, every time you update the file, update it in every branch: check out the branch, modify the file, make a new commit. When you are done, each branch name points to a new commit with the new, same-in-all-branches file.

That first method is the idea you were looking at (and rejecting) with submodules. Technically submodules would not help here anyway as submodules are very deliberately tied to one specific commit. A sub-repository would work, but has the same problem that led you to reject submodules: the file must live in a sub-directory.

The third method also works perfectly well, but has the drawback that it's ugly, clumsy, and error-prone (what happens if you miss one branch name in your "check out every branch, update, and commit" process?).
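For completeness, the third method could be sketched as a loop over all local branches. This is only an illustration under assumptions (a scratch repository, two branches, a temporary file standing in for the updated dictionary), not a recommendation:

```shell
#!/bin/sh
# Sketch of "update the file in every branch" in a scratch repository.
# Branch names, contents, and identity are assumptions for this demo.
set -e
repo=$(mktemp -d)
newdict=$(mktemp)                 # stands in for the updated dictionary
echo 'version 2' > "$newdict"

cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo 'version 1' > dictionary
git add dictionary
git commit -q -m 'add dictionary'
git branch feature

# Visit every local branch and commit the new dictionary there.
for b in $(git for-each-ref --format='%(refname:short)' refs/heads); do
    git checkout -q "$b"
    cp "$newdict" dictionary
    git add dictionary
    # skip the commit if this branch already has the new contents
    git diff --cached --quiet || git commit -q -m "update dictionary on $b"
done

on_master=$(git show master:dictionary)
on_feature=$(git show feature:dictionary)
echo "master:  $on_master"
echo "feature: $on_feature"
```

Every branch gains an extra commit per dictionary update, and any branch created between runs has to be caught by the next run, which is the clumsiness described above.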

What we want is the middle method, or a hybrid of the middle and first methods. That's what Eimantas suggested in a comment. The mechanisms for this are a little bit tricky.

Work-trees, untracked files, and ignored files; and Git's index

Git, like many / most (all?) commit-oriented version control systems, lets you version-control some files and skip others. We tend to need this with any system that compiles source code to object or byte code, for instance. The technique for this is to keep the controlled code inside the repository—underneath the .git subdirectory, in a database that Git controls—and keep the working copy of the files in a work-tree.

The files in the work-tree have their ordinary form, so they are under the same ordinary controls as any files. You can read them, write them, or even fold, spindle, or mutilate them. They only get frozen-in-time when you put them into a commit; but you can avoid putting them into commits in the first place, to avoid freezing a copy.

A file that you never want to commit, you simply never put into the "set of files to commit". This set-of-files, in Git, is maintained in Git's index. The index is kind of mysterious, because there's no easy way to view it, but a good short description of the index is this: The index contains all the files that will be in the next commit you make.

When you copy a file into the index, it stays there. When you make a new commit, Git takes whatever is in the index and freezes it into a commit. The index continues to contain those files, ready for the next commit. This goes on as you make more and more commits; and it's why you have to git add the same files over and over again: each git add means "copy from work-tree to index".
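One way to peek at the index is git ls-files --stage, which lists each indexed file with the hash of its staged contents. The sketch below (scratch directory and the file name notes.txt are assumptions) shows that editing the work-tree copy does not touch the index until the next git add:

```shell
#!/bin/sh
# Sketch: the index keeps its own copy of a file until you "git add" again.
# The scratch directory and file name are assumptions for this demo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'draft 1' > notes.txt
git add notes.txt                          # copy work-tree -> index
git ls-files --stage                       # mode, blob hash, stage, path
staged_before=$(git rev-parse :notes.txt)  # hash of the indexed copy

echo 'draft 2' > notes.txt                 # edit the work-tree copy only
staged_after=$(git rev-parse :notes.txt)
worktree=$(git hash-object notes.txt)      # hash of what is on disk now

git add notes.txt                          # copy work-tree -> index again
staged_final=$(git rev-parse :notes.txt)

echo "index before edit: $staged_before"
echo "index after edit:  $staged_after"
echo "work-tree file:    $worktree"
echo "index after add:   $staged_final"
```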

(When you run git checkout to check out some particular branch-and-its-tip-commit, Git replaces the index with one that matches that other commit, so that you're ready to work on that branch. In the process, it adjusts the work-tree too. This isn't quite correct, but is a good enough mental model to start with.)

If you never add a file to the index, so that it's not in the index, but you do have that file in your work-tree, Git calls this an untracked file. Since it's not in the index, it's not in the commits either (commits being frozen copies of previous index files). To keep Git from complaining about it, or automatically adding it with an en-masse "add all files" command, you can list an untracked file in .gitignore.

A .gitignore file contains a list of file names, or name glob patterns like *.o or *.pyc. The name .gitignore itself is a bit of a misnomer, though. It doesn't literally mean ignore these files. It means: if this file is untracked in the work-tree, don't complain about it, and don't automatically add it. It should be called .git-shut-up-about-these-files-and-do-not-automatically-add-them. But that's a little unwieldy, hence .gitignore. The important thing to remember here is that .gitignore has no effect on tracked files. It's only for untracked files.

Tracked files are files that are in the index right now, and will thus be in the next commit. Untracked files are files that are not in the index right now, and hence won't be in the next commit. That's it—that's one of the simplest parts of Git, really! "Tracked" means "in the index", and for a file to be ignored, it first has to be untracked. If you've managed to make it tracked—to put it into the index—you have to remove it from the index (and then commit, so that there's a place to go back to later, that doesn't put it back in the index when you git checkout later).
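Here is a sketch of that recovery, assuming a scratch repository in which dictionary was committed by mistake (the contents and demo identity are made up for the example). While the file is tracked, the .gitignore entry does nothing; after git rm --cached and a commit, the file stays on disk but is untracked and ignored:

```shell
#!/bin/sh
# Sketch: .gitignore only takes effect once the file leaves the index.
# The scratch directory, contents, and identity are assumptions.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'words' > dictionary
git add dictionary
git commit -q -m 'dictionary committed by mistake'
echo dictionary > .gitignore
git add .gitignore
git commit -q -m 'ignore dictionary'

echo 'more words' >> dictionary
tracked_status=$(git status --porcelain)     # still reported: it is tracked

git rm -q --cached dictionary                # drop from index, keep the file
git commit -q -m 'stop tracking dictionary'
untracked_status=$(git status --porcelain)   # nothing reported now

echo "while tracked:   '$tracked_status'"
echo "after rm cached: '$untracked_status'"
```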

Let's get concrete: an example

Let's make a new, tiny repository and put one commit in it to start:

$ mkdir tt && cd tt && git init
Initialized empty Git repository in ...
$ echo your basic readme > README
$ git add README
$ git commit -m 'initial commit'
[master (root-commit) 08fc157] initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README

Now let's make an entirely unrelated branch which will hold our "latest dictionary" file. Note: we don't have to use an unrelated ("orphan") branch, I just like to do it this way.

$ git checkout --orphan dictionary
Switched to a new branch 'dictionary'
$ echo this branch is just for the dictionary - edit it here > README
$ cat << END > dictionary-under-odd-name
> This is our dictionary file.
> I have no idea what goes into it,
> so I am just putting some text here.
> END
$ git add README dictionary-under-odd-name 
$ git commit -m 'initial dictionary'
[dictionary (root-commit) 6a17792] initial dictionary
 2 files changed, 4 insertions(+)
 create mode 100644 README
 create mode 100644 dictionary-under-odd-name

Now let's go back to our regular master branch, and see what we have:

$ git checkout master
Switched to branch 'master'
$ ls
README
$ cat README 
your basic readme

Good: there's no sign of the dictionary. There is no file named dictionary in our index. That's what we want. Now let's make sure there won't be one in the future, either, by making a .gitignore that lists the file named dictionary:

$ echo dictionary > .gitignore
$ git add .gitignore
$ git commit -m 'make sure never to commit dictionary on normal branches'
[master db3d9d0] make sure never to commit dictionary on normal branches
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore

Now we want some way to have any updated dictionary file show up in all our branches.

Well, there are a lot of options here. The simplest is not to bother with a hook at all. Since a file named dictionary is not now in the index, and won't be, all we have to do is grab a copy out of the special branch, where it has a special name:

$ git show dictionary:dictionary-under-odd-name > dictionary

Let's see what we have now:

$ ls
README  dictionary
$ git status
On branch master
nothing to commit, working tree clean

That looks pretty good! If we create other branches, they also won't have a tracked file named dictionary, so Git won't disturb this file. If we specifically check out the dictionary branch, Git still won't disturb the file, because it has a different name there:

$ git checkout dictionary
Switched to branch 'dictionary'
$ ls
README         dictionary-under-odd-name
dictionary

We can now edit the frozen-in-the-branch version, or copy it from the untracked version or copy to the untracked version.

Let's look at something else:

$ git status
On branch dictionary
Untracked files:
  (use "git add <file>..." to include in what will be committed)

    dictionary

nothing added to commit but untracked files present (use "git add" to track)

This is because our dictionary branch doesn't list dictionary as ignored. In fact, we don't even have a .gitignore file in this branch. That's OK, we just have to be sure not to git add dictionary. (Or we can create a .gitignore here too, and commit that. As long as .gitignore is itself tracked, a copy gets frozen into each commit, and switching from commit to commit will change the .gitignore file. This is actually really similar to when there's a .gitignore file in one commit, and not in another: it's just that now, when we switch, Git has to add or remove the entire file. That was true here, for instance: we switched from the tip commit of master to the tip of dictionary and Git removed .gitignore. If we have a different .gitignore in our two commits, Git will switch the copy of the file, in both index and work-tree.)

In any case, things are good enough, but we can ignore dictionary here, or not. Use whichever you like—just be sure not to commit dictionary here, because if you do, Git will know to remove it when going from this branch's tip commit (which now has a file named dictionary) to some other branch's tip commit (which doesn't). This is fixable: just remove it again and commit. Now dictionary won't be in the index, and the file will be untracked once you create dictionary in the work-tree, and you're back to the previous state.

If you do want the file dictionary to get updated on every git checkout, using the version stored in the branch named dictionary, then you need a post-checkout hook.

Post-checkout hook

The post-checkout hook is described thus:

This hook is invoked when a git checkout is run after having updated the worktree. The hook is given three parameters: the ref of the previous HEAD, the ref of the new HEAD (which may or may not have changed), and a flag indicating whether the checkout was a branch checkout (changing branches, flag=1) or a file checkout (retrieving a file from the index, flag=0). This hook cannot affect the outcome of git checkout.

(That last sentence is not quite right: if the hook exits nonzero—indicating failure—Git considers the git checkout to have failed. This is probably a bug. Moreover, the hook can do whatever it wants to the work-tree, and even to the index. What the manual page means here is that git checkout is finished, even if the hook says "fail". Git won't "undo" the checkout process.)

Because you have these arguments, you can make this as fancy as you like, but the simplest is just to write whatever's in dictionary:dictionary-under-odd-name into the file named dictionary:

#! /bin/sh
# post-checkout hook: update file named "dictionary"
topdir=$(git rev-parse --show-toplevel) # paranoia
# quote the path so a work-tree location containing spaces still works
git show dictionary:dictionary-under-odd-name > "$topdir/dictionary"

If you put this into a file named .git/hooks/post-checkout and make that file executable, every git checkout will run the commands in it. These will do what we did manually in our one-time case above.

Note that this will overwrite dictionary with the frozen branch-tip odd-name version every time, even if you have put stuff in dictionary that you want to keep. So this is a bit dangerous, if you code it this way.
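One possible way to soften that danger is to have the hook keep the previous copy whenever it differs from the version in the branch. The sketch below sets up a scratch repository, installs such a hook, and triggers it. The backup name dictionary.orig, the branch name feature, and the demo identity are assumptions, and this is an untested-in-anger illustration rather than a hardened hook:

```shell
#!/bin/sh
# Sketch: a post-checkout hook that backs up a differing "dictionary"
# before overwriting it. The name dictionary.orig is an assumption.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo readme > README
git add README
git commit -q -m 'initial commit'

# Unrelated branch that holds the dictionary under its odd name.
git checkout -q --orphan dictionary
echo 'v1' > dictionary-under-odd-name
git add dictionary-under-odd-name
git commit -q -m 'initial dictionary'
git checkout -q master

mkdir -p .git/hooks
cat > .git/hooks/post-checkout << 'EOF'
#!/bin/sh
topdir=$(git rev-parse --show-toplevel)
new=$(mktemp)
# if the branch or file is missing, leave the existing dictionary alone
git show dictionary:dictionary-under-odd-name > "$new" || { rm -f "$new"; exit 0; }
if [ -f "$topdir/dictionary" ] && ! cmp -s "$new" "$topdir/dictionary"; then
    cp "$topdir/dictionary" "$topdir/dictionary.orig"   # keep the old copy
fi
cat "$new" > "$topdir/dictionary"
rm -f "$new"
EOF
chmod +x .git/hooks/post-checkout

echo 'my local edits' > dictionary
git checkout -q -b feature        # a branch checkout, so the hook runs
current=$(cat dictionary)
backup=$(cat dictionary.orig)
echo "dictionary:      $current"
echo "dictionary.orig: $backup"
```

If you adopt something like this, the backup file should probably be listed in .gitignore as well, so that it stays untracked and quiet.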

Since Git is Git, you have many other options. One would be to name the file dictionary in the branch named dictionary, and have it as a regular tracked file. Git will refuse to overwrite the untracked file that you have in your work-tree, so if you ever need to add a new frozen snapshot version, you would use this sequence of commands:

$ mv dictionary out-of-the-way
$ git checkout dictionary
Switched to branch 'dictionary'
$ mv out-of-the-way dictionary
$ git add dictionary
$ git commit

Then, after switching back from dictionary to master or develop or some other branch where the file isn't in the index or the work-tree, Git will remove the one you just committed, and you have to retrieve the file again:

$ git checkout master
Switched to branch 'master'

The file is gone now, so:

$ git show dictionary:dictionary > dictionary

and now it is back and untracked, because it's still not in the index. If it's listed in .gitignore it's still untracked-and-ignored, too.

You can fancy up the post-checkout hook to do this sort of thing automatically, using the three parameters mentioned in the manual page. This is quite untested but probably works:

#! /bin/sh

# if we did not change branches, do nothing
if [ "$3" != 1 ]; then exit 0; fi

# what branch did we switch to? use the name HEAD for detached HEAD
curbranch=$(git rev-parse --abbrev-ref HEAD)

# if it's "dictionary", do nothing
if [ "$curbranch" = dictionary ]; then exit 0; fi

# otherwise, grab file "dictionary" from branch "dictionary"
# (path quoted so a work-tree location containing spaces still works)
topdir=$(git rev-parse --show-toplevel)
git show dictionary:dictionary > "$topdir/dictionary"

If you choose to use a script, it is probably a good idea to commit this script as a file, so that you can version-control it. One place to do that would be on the branch named dictionary! Then you could do:

$ git show dictionary:post-checkout > .git/hooks/post-checkout
$ chmod +x .git/hooks/post-checkout

after a fresh git clone, to establish the most recent post-checkout hook version as the post-checkout hook for this new clone of the repository.

torek
  • Your explainer answer uses difficult words and IMHO, quite a tad too long for an answer to a question such as this. – rubenvb Aug 26 '17 at 22:04
  • @rubenvb: maybe I should not have linked to that. I didn't mean "explainer" as in *simple*... – torek Aug 26 '17 at 22:13