
I am using git for committing my project changes.

For this purpose :-

  1. I created a new empty repository in github
  2. I created a secondary branch named second apart from master branch
  3. I committed and pushed my changes onto that second branch
  4. Now all my projects and its contents are on second branch and my master branch is empty

Now, I have two doubts:-

a) can I push any code to empty master branch directly without getting into merging process of two branches?

b) how to merge second branch with empty master branch

keshav dwivedi

2 Answers


We have to start with this weird idea: There is no such thing as an empty branch in Git.

Empty branches do not exist in Git

In Git, a branch name like master identifies a commit—exactly one commit, always. This single commit is the tip commit of the branch. If that commit has one or more parent commits, and almost every commit does have at least one parent, the parents are also contained within the branch, as is/are the parent(s) of the parent(s), and so on.

Note that it is the commits themselves that form this backwards-looking structure, starting at the end and working backwards:

A  <-B  <-C

Here, commit A is the very first commit ever made. It has no parent: it is an exception to the general rule that commits have parents. A has no parent because it cannot have one; it was the first commit ever. But commit B contains the hash ID of commit A, as B's parent. We say that B points to A. Likewise, commit C contains B's hash ID, so that C points back to B.

The branch name master would typically then point to commit C as its single identified commit:

A--B--C   <--master

Git uses the name master to find C, uses C to find B, and uses B to find A.

If we create a new branch now, e.g., using git branch develop or git checkout -b develop, this makes a new name that points to the same commit C:

A--B--C   <-- master, develop

We tell Git which branch we want to be "on" using git checkout. This attaches the name HEAD to one of the two branch names. For instance, if we git checkout develop now, HEAD is attached to develop:

A--B--C   <-- master, develop (HEAD)

If we now make a new commit (let's call it D, though Git will invent some big ugly hash ID for it), the way Git achieves this is to create new commit D with commit C as D's parent, so that D points back to C:

A--B--C
       \
        D

and as the final step of creating this commit, Git updates whichever branch name HEAD is attached to, so that this name now points to new commit D:

A--B--C   <-- master
       \
        D   <-- develop

Note that at no time was develop an empty branch. Initially, develop and master both pointed to commit C; both contained all three commits. Now develop contains four commits, and master still contains the same three commits as before. Commits A-B-C are on both branches at the same time.
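
The drawings above can be reproduced in a scratch repository. This is only a sketch: the temporary directory, the placeholder identity, and the empty commits named A through D are assumptions made for illustration, and the hash IDs you get will differ.

```shell
# Scratch repository; the path and the identity are placeholders.
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master
git config user.name "Example" && git config user.email "example@example.com"

# Three commits on master: A <- B <- C
for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done

# A second name for the same commit C, with HEAD attached to it
git checkout -q -b develop

# Commit D: its parent is C, and only the name develop moves
git commit -q --allow-empty -m D

master_count=$(git rev-list --count master)     # commits reachable from master
develop_count=$(git rev-list --count develop)   # commits reachable from develop
echo "master: $master_count commits, develop: $develop_count commits"
git log --all --decorate --oneline --graph
```

The counts show that A, B and C are reachable from both names, while D is reachable only from develop.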

So, what did you do?

  1. I created a new empty repository in github

Note that GitHub has two ways to create a repository. One way creates an initial commit for you, and makes master point to that initial commit. The initial commit contains a README file and, optionally, a .gitignore and/or a LICENSE file. GitHub will create this initial commit if you check the box:

Initialize this repository with a README

If you don't check the box, GitHub really does create an empty repository. There are no commits, so there are no branches in this empty repository!

  2. I created a secondary branch named second apart from master branch

Because a branch can only be created if there is a commit for it to point to, you literally can't do this in an empty repository. The error message here is a little weird, but it shows that this can't be done:

$ mkdir er; cd er
$ git init
Initialized empty Git repository in .../er/.git/
$ git branch second
fatal: Not a valid object name: 'master'.

Why did Git complain about the name master? The answer is a little surprising: the branch master does not exist, but our HEAD is attached to it anyway. That is, HEAD is attached to a branch that doesn't exist.

This is not a very sensible state of affairs, but Git needs it to get started. The first commit is weird: it won't have a parent. Git needs to save the name of the branch to create somewhere, so it saves it by writing it into HEAD. Once we actually make a new commit, that will create the branch name.
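
You can observe this state directly. The following is a small sketch in a throwaway directory (the path is a placeholder); it asks Git where HEAD is attached and then whether that branch name actually resolves to a commit.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master

# HEAD names a branch...
head_target=$(git symbolic-ref HEAD)
echo "HEAD is attached to: $head_target"

# ...but the branch does not exist, because there is no commit for it to point to
if git rev-parse --verify --quiet refs/heads/master >/dev/null; then
    master_exists=yes
else
    master_exists=no
fi
echo "master exists: $master_exists"
```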

Git reports this state in several different ways, calling this either an unborn branch or an orphan branch, depending on which bit of Git is doing the calling. Here's what git status says, in this particular version of Git:

$ git status
On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Similarly, git log is now a little smarter than it was in the bad old days:

$ git log
fatal: your current branch 'master' does not have any commits yet

(it used to give a more mysterious error, the way git branch still does).

We can, however, ask git checkout to create a new branch name, and if we do so, our state changes:

$ git checkout -b second
Switched to a new branch 'second'
$ git branch
$ git status
On branch second
...
$ git log
fatal: your current branch 'second' does not have any commits yet

The git branch output is empty, correctly showing that we still have no branches at all. The git status output shows us that we're on this unborn branch, and git log tells us that there are no commits here.

If we make a commit now, the branch name second will spring into being, pointing to the new commit, which will be the only commit:

$ echo example > README
$ git add README
$ git commit -m 'initial commit'
[second (root-commit) d4d9655] initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 README
$ git log --all --decorate --oneline --graph
* d4d9655 (HEAD -> second) initial commit
$ git branch
* second

There we have it: the one branch now exists; the name second identifies the very first commit, which in this particular case has d4d9655 as its (abbreviated) hash ID.

  3. I committed and pushed my changes onto that second branch

If you were able to create the branch in step 2 using git branch, that means you already have a master and it's not empty, and the commit you made in step 3, on branch second, made the name second point to the second commit in the repository, whose parent is the first commit in the repository, to which the name master points. If that's the case, running:

git log --all --decorate --oneline --graph

will show both commits, plus the two branch names as decorations.

If not—if you really had no master branch at all—then you still have no master branch; you have just one commit, on one branch named second.
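
If you are in that second situation, one way to get a master is simply to create that name, pointing at the same commit as second; there is nothing to merge. This is a minimal sketch, not the only way to do it: a local bare repository stands in for GitHub, and every path and the identity are placeholders.

```shell
# A bare repository standing in for the GitHub remote (placeholder path)
remote=$(mktemp -d)
git init -q --bare "$remote"

repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
git remote add origin "$remote"

# Reproduce the situation: the only branch is second, with one commit
git checkout -q -b second
echo example > README
git add README
git commit -q -m 'initial commit'
git push -q origin second

# Create the name master at the same commit as second, then push it
git branch master second
git push -q origin master

git ls-remote --heads origin
```

After this, both names identify the same commit, locally and on the remote.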

  4. Now all my projects and its contents are on second branch and my master branch is empty

Again, this is literally impossible. Use git log --all --decorate --oneline --graph to see all of your commits and where the branch names go. The --graph option does not add much yet (the history is still a straight line), but once you have merge commits, it's quite useful.

Clearing up your questions

Now, I have two doubts:-

Aside: you mean two questions. The word doubt implies that you already have answers, you just believe that there is a good chance that these answers are wrong.

a) can I push any code to empty master branch directly without getting into merging process of two branches?

There are several important distinctions to make here as well. First, when you use git push, what you push are commits. Git is all about commits. Each commit has, as we already saw above, some parent commit(s), except for the initial ("root") commit. Commits also save data about the commit: your name and email address and a time-stamp, and a log message, for instance. And, along with this commit metadata—parent(s), author/committer, and log message—each commit saves a snapshot of a set of files.

Hence, in one sense a Git repository does contain files, but at a higher level, it doesn't really involve files that much: a repository is a collection of commits. The commits drag files along with them as a side effect. That side effect is of course what makes Git useful, but it's important to keep in mind that Git itself is not so much concerned about the files as it is about the commits.

You can create a new commit any time you like. When you do, Git will:

  1. collect a log message from you;
  2. package up (freeze forever) the contents of the index (which you update with git add);
  3. save your name and email address and the current time as the author and committer;
  4. write out a new commit that records the snapshot made in step 2 and has the current commit as its parent, so that the new commit points back to what was the current commit at the time you ran git commit;
  5. acquire a hash ID for the new commit by finishing step 4;
  6. write that hash ID into the current branch name, recorded by HEAD.

We have not yet talked about the index I refer to in step 2 here; we'll get to that in a little while. That last step, though, is what makes the branch name change. The name changes by advancing to point to the new commit, which is now the last commit on the branch. All the earlier commits are also on the branch, reached by working backwards from the new tip commit.
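
The steps in the list above can be approximated by hand with Git's lower-level commands. This is only a rough sketch of what git commit does (the real command also runs hooks, writes reflog messages, and more); the file name, commit messages, and identity are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master
git config user.name "Example" && git config user.email "example@example.com"
echo one > file.txt && git add file.txt && git commit -q -m 'first'

old_tip=$(git rev-parse HEAD)

# Update the index with new content
echo two > file.txt
git add file.txt

# Step 2: freeze the index into a tree object (the snapshot)
tree=$(git write-tree)

# Steps 1 and 3-5: write the commit object (message, identity, parent, tree);
# its hash ID is printed on stdout
new_commit=$(git commit-tree "$tree" -p "$old_tip" -m 'second')

# Step 6: write that hash ID into the branch name HEAD is attached to
git update-ref "$(git symbolic-ref HEAD)" "$new_commit"

git log --oneline --decorate
```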

b) how to merge second branch with empty master branch

Again, branches are never empty.

To merge one branch with another, we use git merge, but before we do, we generally need to run git checkout first, to select a branch. It's important to realize precisely what git checkout does. This gets into the difference between commits, the index, and the work-tree.

Commits, the index, and the work-tree

Commits, in Git, are permanent—well, mostly; you can discard commits entirely in some situations, which is actually really useful—and entirely read-only. No commit, once made, can ever be changed, even one single bit. Commits have multiple purposes, but the main one is to let you get back every file as it was at the time you made the commit.

Each commit has a single "true name" that never changes, which is its hash ID. There are many other names you can use at various times, such as branch names, to identify commits, but these all resolve to a hash ID in the end. You can do this resolving yourself at any time:

$ git rev-parse second
d4d9655d070430e91022c1ad843267f9d05f60d1

This shows that the commit I just made earlier has, as its full hash ID, d4d9655d070430e91022c1ad843267f9d05f60d1. These hash IDs appear random, but are actually cryptographic checksums of the full contents of the commit. This is why neither you nor Git can ever change anything in a commit: if you try, you just get a new, different commit with a different checksum; the original commit remains undisturbed.
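
You can check the checksum claim yourself. In this sketch (scratch repository, placeholder identity and file), the hash ID is recomputed from the commit's raw contents, and then the commit is "changed" with --amend to show that the result is a new commit while the old one still exists.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo example > README && git add README && git commit -q -m 'initial commit'

name_hash=$(git rev-parse HEAD)

# Recompute the hash ID from the raw commit contents (nothing is written)
recomputed=$(git cat-file commit HEAD | git hash-object -t commit --stdin)
echo "$name_hash"
echo "$recomputed"

# "Changing" the commit really makes a new commit with a different hash ID
git commit -q --amend -m 'reworded message'
amended_hash=$(git rev-parse HEAD)
old_type=$(git cat-file -t "$name_hash")   # the original object is still there
```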

The files stored with a commit are kept in a special, frozen (read-only), Git-only, compressed format. They, too, can never be changed (and in fact, Git stores them under hash IDs, just like commits). This property, of never changing, means that Git can re-use the files in new commits if the new commit has the same file contents as a previous commit. That's one reason that, even though every commit stores a complete copy of every file, Git repositories tend not to take a lot of disk space: there's a lot of re-using of the old files.
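
The re-use is easy to see in a small experiment. In this sketch (file names and identity are placeholders), the second commit adds a new file and leaves README alone, and both commits end up referring to the very same stored README object.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo example > README
git add README && git commit -q -m 'first'

echo more > notes.txt
git add notes.txt && git commit -q -m 'second'

# Hash ID of the stored README in each commit's snapshot
blob_before=$(git rev-parse HEAD~1:README)
blob_after=$(git rev-parse HEAD:README)
echo "$blob_before"
echo "$blob_after"
```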

Of course, a system that never lets you change any files is not very useful: it's fine for archive retrieval, but no good for development. So Git extracts frozen, Git-only compressed committed files into regular, read/write files in their ordinary computer format. These files are in your work-tree. But Git does not use these files itself: the work-tree copies are there for you to use, not for Git.

What Git does—this is unusual; most version control systems do not do this—is to interpose something between the frozen Git-only committed files and the easy-to-use work-tree files. This "something" is Git's index. Extracting a commit, using git checkout, fills in this index by un-freezing the frozen file into the index. The copy in the index is still in the special, compressed, Git-only format (with hash ID), but now it can be overwritten.

Hence, what git checkout does is:

  • populate the index from the commit: un-freeze, but don't yet decompress, the files; then (at the same time)
  • populate the work-tree from the index: decompress the files; and
  • finally, attach the name HEAD to the branch name you chose to check out.

The result is that after a successful git checkout somebranch, three things typically hold (see footnote 1):

  1. The index contains all the files from the tip commit of somebranch.
  2. The work-tree contains all the files from the tip commit of somebranch.
  3. HEAD is attached to the name somebranch.

This means you are now ready to modify work-tree files and—this is the tricky part—copy them back into the index. The git add command takes a work-tree file and copies it into the index, compressing (but not yet freezing) the file into the special Git-only form.

If you create an all-new file in the work-tree and use git add, that copies the file into the index. The file is all-new in the index at this point. If you modify some existing file in the work-tree and use git add, that copies the file into the index, overwriting the one that was in the index before. You can also use git rm to remove files from the index, but note that this will also remove the work-tree copy.

When you run git commit, Git simply freezes whatever is in the index at that time. This is why you must keep git add-ing to the index. The index can therefore be summarized as what will go into the next commit, if and when you make another commit.
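
A short experiment makes the point that the commit comes from the index, not from the work-tree. This is a sketch with a made-up file and placeholder identity: the file is edited again after git add, and the commit still records the version that was added.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo v1 > file.txt && git add file.txt && git commit -q -m 'first'

echo v2 > file.txt
git add file.txt       # the index now holds v2
echo v3 > file.txt     # work-tree only; the index still holds v2

git ls-files -s        # show what is in the index (mode, hash ID, stage, name)
git commit -q -m 'second'

committed=$(git show HEAD:file.txt)   # what the new commit froze
worktree=$(cat file.txt)              # what is in the work-tree
echo "committed: $committed, work-tree: $worktree"
```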


Footnote 1: There are a bunch of exceptions to this rule: these three things only hold for sure when you git checkout from a clean state. For (much) more, see Checkout another branch when there are uncommitted changes on the current branch. However, if the checkout succeeds, HEAD will be attached to the target branch.


A brief description of merge

Let's look now at what happens if you run:

git checkout somebranch
git merge otherbranch

The first step, as we saw, attaches our HEAD to the name somebranch, getting the tip commit of somebranch into our index and work-tree. Assuming all has gone well, our index and work-tree both exactly match this tip commit at this point.

The second step uses the commit graph, the same kind of graph we were drawing above. There are, unfortunately, many possible shapes, but let's start with this simple one:

          H--I   <-- somebranch (HEAD)
         /
...--F--G
         \
          J--K--L   <-- otherbranch

What Git does for this case is to locate the best common commit—the commit that's on both branches that's "near the tips". In this case, if we start at the tip of somebranch, i.e., commit I, and work backwards, we will reach commit G, and if we start at L and work backwards, we will also reach commit G. That's a better common commit than commit F, so commit G is the merge base of this pair of commits.
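
You can ask Git for the merge base yourself with git merge-base. The sketch below rebuilds the drawing above using empty commits (an assumption made purely to keep it short; the identity and path are placeholders) and checks that the answer is G.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master
git config user.name "Example" && git config user.email "example@example.com"

git commit -q --allow-empty -m F
git commit -q --allow-empty -m G
G=$(git rev-parse HEAD)

git checkout -q -b somebranch
git commit -q --allow-empty -m H
git commit -q --allow-empty -m I

git checkout -q -b otherbranch "$G"
for msg in J K L; do git commit -q --allow-empty -m "$msg"; done

git checkout -q somebranch
base=$(git merge-base somebranch otherbranch)
git log --oneline -n 1 "$base"   # shows commit G
```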

Or, the graph might look like this:

...--F--G--H--I   <-- somebranch (HEAD)
               \
                J--K--L   <-- otherbranch

Git will still find the best shared common commit, by starting at I and working backwards while also starting at L and working backwards. That best commit is commit I itself. This enables what Git calls a fast-forward, which is not really a merge at all.

The graph might look instead like this:

...--F--G--H--I   <-- otherbranch
               \
                J--K--L   <-- somebranch (HEAD)

In this case, the best common commit is I, the tip of otherbranch, which is behind the tip of somebranch: every commit on otherbranch is already on somebranch. There is nothing to merge, and git merge will say so and do nothing.

There are more possibilities, including very tangled graphs, but let's stop here. (For computing merge bases in complex topologies, see other answers, or read up a bit of proper graph theory including this paper by Bender et al.)

To do a fast-forward, Git essentially just moves the branch name from its current position to the new tip, and runs git checkout on the new tip commit, giving:

...--F--G--H--I   [old "somebranch"]
               \
                J--K--L   <-- somebranch (HEAD), otherbranch
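
As a sketch of the fast-forward case (empty commits, a placeholder identity, and a scratch path, all assumed for illustration), the following builds that graph and uses --ff-only, which tells git merge to refuse anything that is not a fast-forward.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example" && git config user.email "example@example.com"

git checkout -q -b somebranch
git commit -q --allow-empty -m H
git commit -q --allow-empty -m I

git checkout -q -b otherbranch
for msg in J K L; do git commit -q --allow-empty -m "$msg"; done

git checkout -q somebranch
old_tip=$(git rev-parse somebranch)
base=$(git merge-base somebranch otherbranch)   # the merge base is I itself

git merge -q --ff-only otherbranch              # just moves the name somebranch

total=$(git rev-list --count --all)             # no new commit was made
git log --all --decorate --oneline --graph
```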

To do a real merge, Git uses the merge base.

Real merges

The goal of a real merge is to combine changes. Remember that each commit is a snapshot of files, though, not a set of changes. What Git must do, then, is compare the base snapshot to each of the two branch tips. That is, given:

          H--I   <-- somebranch (HEAD)
         /
...--F--G
         \
          J--K--L   <-- otherbranch

Git starts by enumerating all the files from commits G, I, and L. Git compares the set of files in G to the set of files in I to find out what we changed:

git diff --find-renames <hash-of-G> <hash-of-I>   # what we changed

and then compares the set of files in G to the set of files in L to find out what they changed:

git diff --find-renames <hash-of-G> <hash-of-L>   # what they changed

Git now has two change-sets, which it can try to combine. For every file that neither of us changed, the combining is easy: use the file from G, or from I, or from L (it does not matter which: all three copies are the same). For every file that only one of us changed, use that copy from that commit, I or L. For files that both of us changed, take the merge base copy from G and add both changes.

If the two changes to that file collide, Git will normally declare a merge conflict, leave all three copies of the file in the index, and write to the work-tree its best effort at combining the changes, with conflict markers added around the parts that conflict. If all the changes mesh properly, Git will go ahead and write just the final combined file to the index (and to the work-tree).

After combining all the changes to all the files, if there were no conflicts, Git will make a new commit from the index as usual. This new commit will, however, have two parents instead of the usual one parent. Then Git will update the current branch name as usual, so we get:

          H--I------M   <-- somebranch (HEAD)
         /         /
...--F--G         /
         \       /
          J--K--L   <-- otherbranch

where M is a merge commit with two parents.
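
Here is a small sketch of a real merge that combines cleanly. The file contents, branch layout, and identity are made up for illustration: each side changes a different end of the same file, so Git can apply both changes to the merge base copy.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master
git config user.name "Example" && git config user.email "example@example.com"

printf 'a\nb\nc\nd\ne\nf\ng\n' > file.txt
git add file.txt && git commit -q -m G

git checkout -q -b somebranch
printf 'A\nb\nc\nd\ne\nf\ng\n' > file.txt    # we change the first line
git commit -q -am I
I=$(git rev-parse HEAD)

git checkout -q -b otherbranch master
printf 'a\nb\nc\nd\ne\nf\nG\n' > file.txt    # they change the last line
git commit -q -am L
L=$(git rev-parse HEAD)

git checkout -q somebranch
git merge -q --no-edit otherbranch           # makes merge commit M

git log --all --decorate --oneline --graph
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
```

The merged file.txt has both the changed first line and the changed last line, and M's two parents are I and L.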

If there were some conflicts Git could not resolve on its own, Git stops with a merge conflict message. It is then your job to write the correct merge result into the work-tree and use git add to copy it into the index, erasing the three copies Git left behind in the index. (Until then, you can also extract any or all of the three inputs from the index: the merge base version, the left-side or local or --ours version, and the right-side or remote or other or --theirs version.) Once you have resolved all the merge conflicts and run git add on all the resulting files, you run git commit (or git merge --continue, which just runs git commit for you) to commit the merge result, producing the same graph as if Git had done the commit automatically.
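
The conflicted case can be sketched the same way. Everything here is invented for illustration (a one-line file that both sides change differently, a placeholder identity); the point is to show the three copies in the index, numbered 1 (merge base), 2 (ours) and 3 (theirs), and the add-then-commit resolution.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the initial branch name to master
git config user.name "Example" && git config user.email "example@example.com"

echo base > file.txt && git add file.txt && git commit -q -m G
git checkout -q -b somebranch
echo ours > file.txt && git commit -q -am I
git checkout -q -b otherbranch master
echo theirs > file.txt && git commit -q -am L
git checkout -q somebranch

# Both sides changed the same line, so the merge stops with a conflict
if git merge --no-edit otherbranch >/dev/null 2>&1; then
    conflicted=no
else
    conflicted=yes
fi

git ls-files -u                       # the three copies left in the index
base_copy=$(git show :1:file.txt)     # merge base version
ours_copy=$(git show :2:file.txt)     # --ours version
theirs_copy=$(git show :3:file.txt)   # --theirs version

# Resolve: write the correct result, copy it into the index, finish the merge
echo resolved > file.txt
git add file.txt
git commit -q --no-edit
```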

torek

Assuming that you created your second branch locally on your computer and committed new changes there, you can merge them locally into master with

git checkout master
git merge second

Now your local branch master contains the commits from second. You can push this to the remote repository (GitHub), assuming you created your local repo with git clone, with

git push

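If this does not work, it may be because you need to set the URL of your remote repository (use git remote add instead of git remote set-url if no remote named origin exists yet) with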

git remote set-url origin https://github.com/USERNAME/REPOSITORY.git

To put the commits you made in second into master you definitely need to merge.

b-fg