I'm still learning Git, and I understand that a branch is a copy of the master repository that allows you to make changes without harming things on the master repo while also allowing multiple people to make changes on their own branch.

However, let's say you have a server that hosts a web application and a copy of the same code in another directory on the server for testing. Would it be possible to connect the master repository to the live application, and connect the test code to a specific branch that can be merged into master with a pull request?

On the question of feasibility: to deploy new changes once they have been merged from the development branch, we would use a Jenkins job.

2 Answers

I apologize for the length of this, but you're starting out with some misconceptions. The TL;DR is that you can indeed get what you want, but you don't quite know what you want, yet.

I'm still learning Git, and I understand that a branch is a copy of the master repository that allows you to make changes without harming things on the master repo while also allowing multiple people to make changes on their own branch.

Unfortunately, this understanding is not really correct. Understanding Git, or distributed version control in general, is not easy—but the first thing you have to discard is the notion that there is a master repository! (We often re-introduce it later, but it helps to start by throwing it out. :-) )

With centralized version control systems (CVCSes, e.g., Subversion or ClearCase), there really is a "true master". There may be extra copies of the "true master" for whatever reason (usually speed), but it's pretty easy to tell the real source of truth (the repository wearing the sheriff's star, or the Presidential hat, or whatever it is that distinguishes the source-of-truth repository) from every other one. The CVCS has a central server; you put the repository there; and you're done (well, except for backup and disaster recovery issues).

With true distributed (the "D" in DVCS) version control, all repositories are equal. Well, they're equal unless you, deliberately or accidentally, treat one as the one wearing the sheriff's star. That is, you can use a DVCS like a CVCS, by designating one of the servers as "the" server, but that's your doing, not a property of the system. This changes a whole lot about the internals of the system.

(I note that you're also asking about GitHub. In general, when people use GitHub, they tend to treat the copy that's on GitHub as "the" server—the source of truth. But the GitHub repository has its branches, which are not the same as your branches. Moreover, there's no need to claim that they are the source of truth: you can treat them as just another copy. Again, the "master-ness" of any repository is something you decide. All Git repositories are equally masterful except in terms of how you choose to use them!)

In Git, you normally clone a repository such that you have a complete copy of everything. Your repository is then just that, yours, so that you can do anything you want to it. Whatever you do won't affect any other clone at all. You automatically have your own branches at this point. In Git, you don't have the original repository's branches as branches, but you do have all of them in some other form. You start out, in fact, with no branches of your own, but usually immediately make a first one: the normal last step of git clone includes creating one (1) branch name. This is now your (single) branch. Usually, it's master (though you can control this). Usually, you'll go on to create a second branch name, which is the one you will do your work on. (Once you do that, you can freely delete your master branch, so that you need not drag it around and update it. This is up to you though.)

This leads into the whole question of terminology. Git's is maddeningly confusing. See What exactly do we mean by "branch"? for more about this issue. I have taken to using the phrase branch name to refer to names like master and testing, and remote-tracking name to refer to the names like origin/master. When you git clone a repository, you turn all of their branch names ("they" being the other Git repository that you're cloning) into remote-tracking names: their master is now your origin/master; their develop is now your origin/develop; and so on.
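To make the renaming concrete, here is a small sketch you can run in a scratch directory. The directory names theirs and mine, the file name, and the commit identity are all made up for the demonstration; it assumes only that git is installed.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository, with two branch names: master and develop.
git -c init.defaultBranch=master init -q theirs
cd theirs
echo hello > file.txt
git add file.txt
git commit -q -m "first commit"
git branch develop
cd ..

# Cloning copies the commits and turns their branch names into
# your remote-tracking names.
git clone -q theirs mine
cd mine
git branch       # your branch names: just master
git branch -r    # remote-tracking names: origin/master, origin/develop (and origin/HEAD)
```

Note that develop exists in the clone only as origin/develop until you create a branch name of your own for it.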

When dealing with Git, you must also keep, in your head, a bunch of seemingly-conflicting ideas:

  • Commits are snapshots: permanent (mostly), incorruptible (entirely), unchanging. They are uniquely identified by a hash ID. These hash IDs are big and ugly and incomprehensible to humans, but they are critical inside Git.
  • Commits are the history. Commits link to each other to form a (directed, acyclic) graph. Making a new commit is a process of making a snapshot, adding metadata like who made the commit and why, and linking that new commit back to its parent commit. As the final step of making a new commit ... well, let's wait on this for another bullet-point.
  • While commits contain files, commits aren't files. There is no file history, there is only commit history. But Git can show you file history. It does this by making it up on the fly, using the commit history. To see what changed in a commit, Git will simply compare the parent's snapshot to the child's snapshot. Whatever is the same, is not changed. Whatever is different, that's what changed.
  • There is always a current commit (known as HEAD).
  • But HEAD is normally "attached" to a branch name. As a result, HEAD only identifies a name, not a commit. (There is a "detached HEAD" mode where HEAD identifies the commit itself directly, but it's mostly used for browsing through history and for special case work—normally you want to be "on a branch", which you do by attaching your HEAD.)
  • Each branch name (and other names like tags) identifies one, and only one, commit. Git calls this the tip commit of that branch.
  • But commits are on—or maybe better described as "contained within"—branches.
  • In Git, some commits are on many branches simultaneously. The set of branches that contains a commit is determined by starting with the branch name and working backwards through history. A merge commit, which has two or more parents, tends to result in commits being on multiple branches. A root commit (usually there's just one) is usually on all branches.
  • Making a new commit creates the new commit with the current commit as its parent. This also creates the new commit's real name: its hash ID, which is a cryptographic checksum of the commit's contents. This new ID is by definition different from every existing ID, and almost impossible to predict, but also completely predictable, because it's just a hash of the commit's contents. (The contents include a time stamp, so since time is always increasing and we don't know precisely when the commit will be made until it's made, that's part of what makes it hard to predict.) Then, so that the new commit is on the current branch, what Git does is to write the new hash ID into the current branch name. The new commit is now the tip commit, and this is how branches grow.
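Several of the points above can be watched in action. The following is a rough sketch in a scratch repository (the repository name, file name, and commit identity are invented for illustration): it shows HEAD attached to a branch name, the name holding a hash ID, and a new commit linking back to the previous tip.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q repo
cd repo
echo one > file.txt
git add file.txt
git commit -q -m "first"

git symbolic-ref HEAD              # refs/heads/master: HEAD is attached to a name
old_tip=$(git rev-parse master)    # the hash ID that name currently holds

echo two >> file.txt
git commit -q -a -m "second"

new_tip=$(git rev-parse master)    # the name now holds the new commit's hash ID
parent=$(git rev-parse 'master^')  # the new commit links back to its parent
echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "parent:  $parent"            # same hash ID as the old tip

# "File history" is synthesized on demand from the commit history.
git log --oneline -- file.txt
```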

This is all about the commits. What's inside the commits—the stored data in the form of snapshotted files (which are stored indirectly and compressed)—is internal to the repository. We also need to have an area where some files are available in their normal form. In pretty much all version control systems (centralized or distributed), that's called the work-tree or working tree or some variation on this theme. With a centralized VCS, the work-tree may be all that you have. In a DVCS like Git, though, you generally do have the entire VCS database locally—plus the work-tree; and in Git, there's another key data structure, that Git forces you to know about, called the index. (It actually has three names, to reflect its three different aspects or roles. The more user-friendly name is the staging area, because you use it to stage files' contents before making snapshots. The third name is the cache.)
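As a quick illustration of the index sitting between the work-tree and the commits, here is a sketch in a scratch repository. The repository and file names are hypothetical.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q repo
cd repo
echo "version 1" > app.txt
git add app.txt
git commit -q -m "add app.txt"

echo "version 2, with more text" > app.txt   # change the work-tree copy only
git diff --name-only            # work-tree vs index: lists app.txt
git add app.txt                 # copy the new contents into the index
git diff --name-only            # work-tree vs index: nothing listed now
git diff --cached --name-only   # index vs current commit: lists app.txt
git ls-files --stage            # what the index holds, ready for the next snapshot
```

The next git commit would snapshot what is in the index, not whatever happens to be in the work-tree.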

But that's all background—relevant, important, but background—to your real question:

However, let's say you have a server that hosts a web application and a copy of the same code in another directory on the server for testing. Would it be possible to connect the master repository to the live application, and connect the test code to a specific branch that can be merged into master with a pull request?

Here, because Git is distributed, you can choose many different ways to do this.

To act as a web application or web server, you will need your files in their normal every-day form, not their highly-compressed all-history-forever Git-ized form. So you need a work-tree—or at least something that looks exactly like one—instead of or in addition to a repository. You may be able to get away without having a repository at all here, provided there's a repository somewhere on this machine.

You also want a second work-tree, for this second test variant. This second work-tree can be associated with a completely separate second repository—since Git is distributed, this second repository can exist independent of the first one (if there is a first one at all).

If you have two separate repositories, they have their own separate branch names. There's no requirement that these coordinate in any way. For sanity, though, you might want to have the first repository-and-work-tree pair operate off its own master. Each repository can and usually will have its own master, but to the extent that you keep them synchronized, their masters will match up, which will be easy to remember. (Hash IDs are impossible to remember!)

If you have only one repository, things are a bit different. A Git repository normally comes with one (count it, 1) work-tree and associated index. You need two, or at least, two things that look like work-trees. So two repositories is an easy way to do this.

Since Git version 2.5, though, Git has built in to it the idea of extra, added work-trees. You obtain these through the git worktree command. Each repository starts out with the one basic work-tree (and index). Using git worktree add <arguments> will direct your Git to create a new, additional work-tree (which comes with its own index). Each added work-tree comes with a constraint, though: it must be on a different branch from every other added work-tree and from the main original work-tree. The branches are all a property of the repository proper, while the work-trees are independent—and by forcing them to be on different branches, Git makes sure they stay independent.

But that's exactly what you want. So you can use this method, provided your Git is at least version 2.5. If not, you can go with two separate repositories, which will have two separate work-trees, and of course separate branches—the master or test or staging or whatever in one repository need not be related at all to the master or test or staging or whatever in the other repository, because the branch names belong to the repository. If you have two repositories, you have two independent sets of branch names. If you have just one repository, you have only one set of branch names.
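A minimal sketch of the single-repository, two-work-tree layout, assuming Git 2.5 or later. The directory names live-app and testing, the branch name testing, and the file contents are placeholders chosen for illustration.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# The main work-tree, on master.
git -c init.defaultBranch=master init -q live-app
cd live-app
echo "<h1>site</h1>" > index.html
git add index.html
git commit -q -m "initial site"

# Add a second work-tree in a sibling directory, on a new branch named testing.
git worktree add -b testing ../testing
git worktree list

# A commit made in the added work-tree moves only the testing branch.
cd ../testing
echo "<p>experiment</p>" >> index.html
git commit -q -a -m "try something"
git rev-parse --abbrev-ref HEAD   # testing
cat ../live-app/index.html        # still the original one-line file
```

Both directories share one set of commits and branch names, but each has its own index and its own checked-out files.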

(Even if you have an older version of Git, you can still use this "one repository, two checkouts" approach, provided you have no intent of actually working within at least one of the branches. This method is more complex, and I won't go into it here as this answer is already too long.)

Note that no matter which way you go, the repository that you keep on GitHub—assuming you choose to keep one there—is still another separate Git repository. It has its own branches, independent of any other Git repository. This is where things become especially tricky.

At some point, once you make new commits, you will want to send those new commits to some other repository. Note that when doing this, there is a sending repository, and a receiving repository. They are each independent, except that they will share commits (by those hash IDs) with each other, and ... well, hold that thought for a moment.

To send commits, you will use git push: your Git calls up the other Git and sends commits. To receive commits, you will use git fetch: your Git calls up the other Git and receives commits. Both commands connect your Git—the repository you are in when you run them—to another Git repository, and transfer commits between the two repositories. This part is pretty straightforward, except for the fact that the hash IDs are incomprehensible to humans.

Having sent or received the commits, though, your Git and the other Git now need to do one last trick:

  • If you're receiving commits (git fetch), you got their commits according to their branch tip commit IDs. Your Git also knows what branch name their Git used to save that ID—to hold the incomprehensible thing. Your Git will save that ID by renaming their branch, such as master, to your remote-tracking name, origin/master. This is the same thing that happened when you made the initial clone.

    Note that nothing happens to their repository here. You just add new commits to your repository, and you do this under your own control, and call their branches by your remote-tracking names. So this has no effect at all on your own branches.

  • If you're sending commits (git push), you must ask the other Git to set one of their branch names, so as to make a commit you have sent (or that you both already have) into one of their branch tip commits. In contrast with how fetch doesn't touch your branches, this does affect their branches. You're literally changing their branches—asking them to change which commit is the tip of their branch name—when you git push.

    For that reason, they have the option of accepting this request, or rejecting it. In general, they will accept this request if you are only adding new commits to their branch, and not if your request would remove commits from their branch. If they don't accept the polite request, you can send a more forceful command, but they don't have to obey that one either.

    What requests or commands they accept, and what they reject, are based on controls you can set up. Web services like GitHub have fancy web interfaces for this. If you are not using a fancy service, you have to write your own access controls.
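The accept-or-reject behavior can be reproduced locally. In this sketch a bare repository stands in for the server, and alice and bob are two hypothetical clones; no access controls are configured, so only Git's default rule (refuse updates that would drop commits) is at work.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A bare repository standing in for "the server", plus two clones of it.
git -c init.defaultBranch=master init -q seed
(cd seed && echo base > file.txt && git add file.txt && git commit -q -m "base")
git clone -q --bare seed server.git
git clone -q server.git alice
git clone -q server.git bob

# Alice's push only adds a commit to the server's master, so it is accepted.
(cd alice && echo alice >> file.txt && git commit -q -a -m "alice" \
    && git push -q origin master)

# Bob commits on the old tip. Accepting his push would drop Alice's commit
# from the server's master, so the server refuses.
cd bob
echo bob > other.txt
git add other.txt
git commit -q -m "bob"
if git push -q origin master 2>/dev/null; then
    pushed=yes
else
    pushed=no    # rejected as a non-fast-forward update
fi
echo "bob's push accepted? $pushed"
```

Bob's way forward is to fetch Alice's commit, combine it with his own work, and push again.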

After git fetch, which only adds their commits and updates your remote-tracking names, you will generally need to do something—often git merge or git rebase—to incorporate their commits into your branches. Git comes with a convenience command, git pull, that combines git fetch and this second step. Avoid git pull when you are starting out! Until you understand git fetch, the second command—git merge or git rebase—will seem to be doing impossible things. Meanwhile the git pull convenience command uses syntax that does not mesh with the rest of Git. If you break the pull into its two separate commands, everything fits a lot better. It's still going to be confusing at first, but at least it's (mostly) consistent.
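Here is a sketch of the two separate steps, using scratch repositories named theirs and mine (hypothetical names, as are the file and the commit identity). After the fetch, only origin/master has moved; your own master moves in the second step.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git -c init.defaultBranch=master init -q theirs
(cd theirs && echo one > file.txt && git add file.txt && git commit -q -m "one")
git clone -q theirs mine

# They make a new commit after you cloned.
(cd theirs && echo two >> file.txt && git commit -q -a -m "two")

cd mine
before=$(git rev-parse master)
git fetch -q origin                  # step 1: get their commits, update origin/master
git rev-parse master origin/master   # two different hash IDs: your master has not moved
git merge -q origin/master           # step 2: bring your master up to date
git rev-parse master origin/master   # now the same hash ID twice
```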

If you plan to use GitHub's "pull request" mechanism, you may want to set up even more copies of the repository, using what GitHub calls forks. Forks are essentially clones, but made directly on the GitHub server, with hidden linkages so that GitHub can remember which Git repositories are forks of which other Git repositories (and, I would guess, secretly share disk space and internal Git objects behind the scenes). These are yet more repositories, all of which can get out of sync with each other, and all of which can be re-synchronized through git fetch and git push operations.

torek
  • I can't believe the amount of energy you put into this answer. Thank you so much for this explanation! I'm still working through the details here, but if I have any questions, I'll comment them here :) – Some Dude From the Internet Mar 01 '18 at 15:55
Yes. You would just have two local repositories on the server (and hopefully others elsewhere in case your server explodes). Let's say I have the directory structure

           /----------live-app
          /
--home---/------------testing

Both 'live-app' and 'testing' would be separate Git repositories, each with its own .git directory. You would run git checkout master inside 'live-app', leave it there, and do regular pulls. You would work on whatever other branches you like in 'testing'. When a testing branch is ready to be merged into master, you would take care of all of that outside of 'live-app'.

Just have 'live-app' stay on master. The only thing Jenkins should do is a git pull.
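A rough local sketch of that flow follows. The bare repository hub.git stands in for wherever the shared copy lives, the branch name feature is made up, and the merge is done on the command line rather than through a pull request. The --ff-only flag on the pull is my own addition, so that the deploy step refuses to do anything other than move master forward.

```shell
#!/bin/sh
set -e
# Throwaway sandbox; every path and name below is hypothetical.
demo=$(mktemp -d)
cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A shared repository, plus the two clones from the directory layout above.
git -c init.defaultBranch=master init -q seed
(cd seed && echo "<h1>site</h1>" > index.html && git add index.html \
    && git commit -q -m "initial site")
git clone -q --bare seed hub.git
git clone -q hub.git live-app    # stays on master
git clone -q hub.git testing

# Work happens on a branch in 'testing', then gets merged into master there.
cd testing
git checkout -q -b feature
echo "<p>new feature</p>" >> index.html
git commit -q -a -m "add feature"
git checkout -q master
git merge -q feature
git push -q origin master
cd ..

# The deploy step (what the Jenkins job would run inside 'live-app').
cd live-app
git pull -q --ff-only origin master
cat index.html
```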

JCollier