TL;DR
You probably want a script that does something along these lines:
#! /bin/sh
# ... check actual invocation and set some variables:
branch=<something>
git fetch origin || exit
# check actual invocation and/or fetch results and set one
# more variable:
startpoint=<something> # or origin/something
# you may even want to use `git rev-parse` here to convert
# a name to a raw hash ID:
# startpoint=$(git rev-parse "$startpoint") || exit
git checkout -b "$branch" "$startpoint" || exit
git push -u origin "$branch"
where startpoint
will depend on where—as in, based on which commit—you want to start working. You probably don't want any git pull
steps in here.
Long
If you are thinking too much in terms of branches, that would explain this:
The problem is that I am having hard time understanding the pattern when it fails.
The problem here is that branch doesn't really mean any one thing in Git. Sometimes, when people say a branch, they mean a branch name. Other times, they mean a series of commits, i.e., some part of the commit DAG. (See also What exactly do we mean by "branch"?) The thing is that branch names move over time, which makes relying on them to describe series-of-commits both very useful, and very tricky.
As Hugo Valenza M suggests, I also generally recommend not using git pull
. What git pull
does is to run two Git commands:
First, it runs git fetch
, passing along most of the other arguments you gave to git pull
(if any). This step has your Git call up some other Git and obtain new commits—commits the other Git has, that you don't yet, that they offer and your Git decides it should take.
Your Git generally remembers their Git's branch names, in the process, using what Git calls a remote. A remote is mostly just a short name to remember a URL, but having this enables a number of other features. It's very common to have exactly one remote in your Git repository, named origin
.
Typically, after your Git calls up the Git at origin
, your Git now has its own memory of their branch names: their master
is now your origin/master
, their develop
is now your origin/develop
, and so on. These names—the origin/
-prefixed ones—are your Git's remote-tracking names. They're subtly, or sometimes blatantly, different from regular (local) branch names.
Then, after the git fetch
succeeds, git pull
runs a second Git command. The usual default command is git merge, and git merge is where you can get merge conflicts. You can direct Git to use git rebase instead; here, too, you can get merge conflicts. You won't always get merge conflicts! It depends on your commit graph and on the contents of the various commits.
If you run the two commands yourself, you gain much more direct control. Most importantly, you can inspect the result of the git fetch
command before you proceed. You will also immediately see that the git merge
or git rebase
step operates using your current branch.1 The git fetch
step doesn't,2 which means you can run git fetch
any time, regardless of what prior git checkout
you may have used, but the merge or rebase step does, so it matters.
In your particular case, your desired second command is probably neither git merge
nor git rebase
. So here, you probably want git fetch
followed by ... well, we'll get to that in a bit, but you've seen above that it's probably git checkout -b
.
1Well, you will when you fully understand merge and rebase. Since git pull
uses them, you must understand them; there is no going around this by using git pull
.
2Note that git fetch
may use the current branch to determine the remote to use, if you have more than one remote defined.
What to know about creating a new branch
In Git, a branch name—a local name like master
or develop
or feature/short
or bug/tall
or whatever—just holds the hash ID of one (1) commit.
To create a new branch, you pick one existing commit in your repository—any commit will do—and tell Git: make a new name that holds this one commit's hash ID. You can do this with git checkout -b
, or with git branch
:
git branch name commit-hash
will create the new name name
such that it contains commit-hash
. If you omit commit-hash
, it defaults to HEAD
.
git checkout -b name commit-hash
will, as a single transaction, both create the new name, and git checkout
the name (and therefore the given commit, by its hash ID). As with git branch
, the default commit hash is that obtained by resolving the special name HEAD
. (Transaction here means that if either of these two parts fails, the command as a whole does nothing: the new name is not created and you have not changed which branch and/or commit you have checked out.)
There are a number of other ways to create a local branch name—Git is a big toolbox, with a lot of tools, some of which do too many things. (This is why Git 2.23 introduced git switch
and git restore
: these break out parts of what git checkout
can do into two separate commands, which offers some hope of being less confusing.) But let's just start with these two. Note that they share the same idea: you pick out some commit—maybe by hash ID, or maybe just the commit you have checked out right now—and create a new name that holds that one particular hash ID.
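Here is a minimal sketch of both commands in a throwaway repository. The branch names topic1 and topic2, the commit messages, and the demo identity are hypothetical, chosen only for this example.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the initial branch name
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

first=$(git rev-parse HEAD~1)
second=$(git rev-parse HEAD)

# git branch: create the name only; HEAD stays attached to master.
git branch topic1 "$first"
# git checkout -b: create the name and check it out in one step.
git checkout -q -b topic2 "$second"

topic1_hash=$(git rev-parse topic1)
topic2_hash=$(git rev-parse topic2)
current=$(git symbolic-ref --short HEAD)
echo "topic1 -> $topic1_hash"
echo "topic2 -> $topic2_hash (current branch: $current)"
```

Each name resolves to exactly the commit that was picked when the name was created.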
How branches work, in Git
The key to most of this is that Git isn't really about branches. It's really about commits. Branches—or more precisely, branch names—are just a way to find commits.
Each commit, in Git, is a permanent—or mostly permanent3—and read-only (100% totally read-only) unit that:
- holds a snapshot of all of your files (Git calls this a tree sometimes),
- along with some metadata: information about the commit, such as who made it and when.
This collection of data (tree) and metadata has a unique hash ID. That hash ID is a big ugly string of letters and digits,4 which looks totally random and is impossible for humans to remember. But we don't need to remember them: that's what our computers are for, after all.
This is where the branch names come in. A branch name remembers, for us and for Git, the hash ID of the last commit that is to be considered part of the branch. But what about the earlier commits? This is where one of Git's clever tricks comes in. Every commit remembers the hash ID(s) of its immediate earlier commit(s). Most commits just remember one hash ID. This one hash ID is the parent of the commit, i.e., the commit that comes before this one.
What this means is that, given the last commit in a chain, as found by a branch name, Git can find the earlier commits in that chain. If we let single uppercase letters stand in for hash IDs—though we'll run out after just a few commits—we can draw that like this:
... <-F <-G <-H <--master
Here, the name master
holds some hash ID H
. Git uses that to find commit H
. Commit H
itself holds, in its metadata, the hash ID of commit G
. Git uses that to find commit G
, which holds the hash ID of commit F
. This repeats until Git has worked its way to the very first commit ever, which—since it can't point to any earlier commit—doesn't point to any earlier commit. Here the follow-the-arrows-backwards action stops.
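To see these backwards-pointing links for yourself, something like the following sketch should work. It creates three empty commits named F, G, and H in a temporary repository; the repository location and demo identity are assumptions for illustration.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m F; F=$(git rev-parse HEAD)
git commit -q --allow-empty -m G; G=$(git rev-parse HEAD)
git commit -q --allow-empty -m H; H=$(git rev-parse HEAD)

# The name master holds the hash ID of H ...
tip=$(git rev-parse master)
# ... and H's metadata holds the hash ID of G (see the "parent" line).
git cat-file -p "$H"
parent_of_H=$(git rev-parse "$H^")
# The first commit has no parent, so the backwards walk stops there.
root=$(git rev-list --max-parents=0 master)
count=$(git rev-list --count master)
echo "master -> $tip"
echo "parent of H -> $parent_of_H"
echo "root commit -> $root ($count commits reachable from master)"
```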
So, given the hash ID of the last commit in a chain—such as that from a branch name—Git can find all the commits that are contained within that branch. To add a commit to a branch, we have Git check out the last commit, via the branch name, and remember which branch name we are using:
...--F--G--H <-- master (HEAD), develop
Then we create a new commit as usual (edit files, git add
, and git commit
). Git packages up a new snapshot, adds the metadata, sets the new commit's parent to be commit H
, and writes out the new commit. This generates a new, unique, big ugly hash ID I
, which we can draw like this:
             I
            /
...--F--G--H <-- master (HEAD), develop
and now the second sneaky Git trick happens: Git writes the new hash ID into the branch name to which the special name HEAD
is attached. The result is:
             I <-- master (HEAD)
            /
...--F--G--H <-- develop
Note that commits up through H
are now on both branches, while commit I
is only on master
. Make another new commit J
and master
now points to J
, which points back to I
, which points back to H
. The name develop
is not changed: it still points to existing commit H
.
If we now git checkout develop
, Git extracts commit H
again to work with, and attaches the special name HEAD
to develop
:
             I--J <-- master
            /
...--F--G--H <-- develop (HEAD)
If we now make some new commits K
and L
, these update the name develop
this time:
             I--J <-- master
            /
...--F--G--H
            \
             K--L <-- develop (HEAD)
and here we have branches in the sense that many people mean them: diverging development. But each branch name just identifies one commit. The branches we mean here are the graph fragments: commits ...-F-G-H-I-J
and ...-F-G-H-K-L
.
It's often important, in Git, to be able to stop traveling backwards. We might want to list commits that are on master
that aren't on develop
, and/or vice versa, so that we can see I-J
in isolation and K-L
in isolation. But it's equally important to be able to see the graph and to see that these two branches—or subsets of the graph—diverge at commit H
. (When working backwards, as Git does, they con-verge at commit H
.)
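The graph above can be rebuilt and inspected with a sketch like this one. It uses empty commits whose messages are the letters F through L; the temporary repository and demo identity are assumptions made for the example.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

for m in F G H; do git commit -q --allow-empty -m "$m"; done
git branch develop                      # develop also points to H
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q develop
for m in K L; do git commit -q --allow-empty -m "$m"; done

# Commits reachable from one name but not the other.
only_master=$(git log --format=%s develop..master)
only_develop=$(git log --format=%s master..develop)
echo "only on master:";  echo "$only_master"
echo "only on develop:"; echo "$only_develop"

# The whole graph, including the point where the two lines diverge.
git --no-pager log --graph --oneline --all
```

The two-dot syntax is what tells Git where to stop traveling backwards: `develop..master` means "reachable from master, minus anything reachable from develop".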
3The permanence, or lack thereof, of any Git commit depends on its reachability. See Think Like (a) Git for much more about reachability.
4Technically, it's a hash of the content of the commit. Git currently uses SHA-1 but the Git folks intend to transition to SHA-256 in the future, which is going to be very interesting.
Merging, not-really-merges, and merge conflicts
Git's merging is a bit complicated and I won't cover all of it here, but let's illustrate two different cases. Suppose we have this graph fragment:
...--o--o--o <-- branch (HEAD)
            \
             o--o <-- origin/branch
where each round o
represents a commit. You can have Git fast-forward the name branch
so that it points to the same commit as the name origin/branch
. That is, there's no problem in making the commit that origin/branch
names become the tip commit of the branch-name branch
, like this:
...--o--o--o
            \
             o--o <-- branch (HEAD), origin/branch
No commits are lost in this process, because Git now starts at the last commit and works backwards and still visits (and sees, and has as history) every commit in the chain.
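A fast-forward of this kind can be sketched as follows. The directories upstream and mine and the demo identity are made up for illustration, and `--ff-only` is used here so that Git refuses to do anything other than move the name.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

# "upstream" stands in for the Git at origin; its branch is named "branch".
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/branch
git -C upstream commit -q --allow-empty -m "one"
git clone -q upstream mine
git -C upstream commit -q --allow-empty -m "two"
git -C upstream commit -q --allow-empty -m "three"

cd mine
git fetch -q origin
before=$(git rev-parse branch)
# Only slide the name forward; fail rather than create a merge commit.
git merge -q --ff-only origin/branch
after=$(git rev-parse branch)
merges=$(git rev-list --merges --count branch)
echo "branch moved from $before to $after (merge commits: $merges)"
```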
When we have a more complicated graph, though, like this one:
             I--J <-- master
            /
...--F--G--H
            \
             K--L <-- develop
and we want to git checkout master
, so as to use commit J
to start with, and then run git merge develop
so as to combine work, now Git can't merely shuffle some branch names around. Now, Git really has to combine work.
What Git does in this case is that it first locates the best shared commit, by starting at both branch tips and working backwards. It's obvious that in this case the best shared commit is commit H
. This best-shared-commit is the merge base of the merge operation.
Next, Git runs two git diff
commands internally.5 First, it compares the snapshot in commit H
to that in the current commit J
:
git diff --find-renames <hash-of-H> <hash-of-J> # what we changed
Then it runs a second git diff
to see what changed between H
and L
:
git diff --find-renames <hash-of-H> <hash-of-L> # what they changed
Now Git combines these two sets-of-changes. The result is one bigger set of changes that Git should make to the snapshot in H
.
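You can approximate what Git looks at by finding the merge base and running the two diffs yourself. In this sketch the file names notes.txt and other.txt, their contents, and the demo identity are invented for the example; the commits are labeled H, J, and L to match the text.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

printf 'one\ntwo\nthree\n' > notes.txt
git add notes.txt
git commit -q -m H
H=$(git rev-parse HEAD)
git branch develop

printf 'ONE\ntwo\nthree\n' > notes.txt   # our change, on master
git commit -q -am J

git checkout -q develop
echo extra > other.txt                   # their change, on develop
git add other.txt
git commit -q -m L
git checkout -q master

base=$(git merge-base master develop)
ours=$(git diff --find-renames --name-only "$base" master)
theirs=$(git diff --find-renames --name-only "$base" develop)
echo "merge base: $base"
echo "what we changed:   $ours"
echo "what they changed: $theirs"
```

Here the two sides touch different files, so there is nothing for the merge engine to disagree about.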
A merge conflict occurs whenever one of the changes we made (H-vs-J) is in the same file and on the same lines as one of the changes they made (H-vs-L).
For these overlapping cases, Git can combine the changes on its own only if both sides made exactly the same change; it then takes one copy of the change rather than both. For all other cases (we touched a file they didn't, or vice versa, or we touched a line they didn't, or vice versa) Git can just combine the changes on its own.
Whenever you get a merge conflict, it means you told Git to do a git merge
(or git rebase
or other command that uses the merge engine6), and you had a sufficiently complicated case that (a) Git couldn't fake it with a fast-forward, and (b) Git's merge engine was unable to combine these changes on its own.
If there are no merge conflicts, git merge
goes on to make a merge commit, which is a commit that links back to both the previous HEAD
, and the other commit, like this:
             I--J
            /    \
...--F--G--H      M <-- master (HEAD)
            \    /
             K--L <-- develop
If there are conflicts, Git leaves you with a mess: you must fix the conflicts yourself, instruct Git on how to make the final snapshot for commit M
, and then use git merge --continue
(or git commit
, in ancient versions of Git) to finish the merge manually.
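The conflicted path can be sketched like this. The file settings.txt and its contents are hypothetical, as is the demo identity. The sketch finishes with `git commit --no-edit`, which concludes the merge without opening an editor; `git merge --continue` would do the same job interactively.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

printf 'color = red\n' > settings.txt
git add settings.txt
git commit -q -m H
git branch develop
printf 'color = green\n' > settings.txt   # we change the line ...
git commit -q -am J
git checkout -q develop
printf 'color = blue\n' > settings.txt    # ... and they change the same line
git commit -q -am L
git checkout -q master

# The merge stops with a conflict, so guard it against set -e.
if git merge develop >/dev/null 2>&1; then outcome=clean; else outcome=conflict; fi
conflicted=$(git diff --name-only --diff-filter=U)
echo "merge outcome: $outcome; unmerged files: $conflicted"

# Resolve by hand (here: pick a final value), stage it, and finish the merge.
printf 'color = teal\n' > settings.txt
git add settings.txt
git commit -q --no-edit
merges=$(git rev-list --merges --count master)
echo "merge commits on master: $merges"
```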
5Internally, it doesn't have to actually run git diff
in a lot of cases, but when things get to the hardest cases, that's more or less what's going on. The --find-renames
option in particular requires a treewide view of all file names in the two commits being compared.
6Rebase is effectively, mostly, a series of git cherry-pick
operations, and those are actually a form of merge—at least in terms of combining changes—so these, too, can have conflicts. In fact, since a git rebase
potentially copies many commits, you can get merge conflicts over and over again, whereas git merge
will only have them happen once.
When creating new branches for new features, you rarely want any of this
Typically, when you are going to make a new feature branch, you don't want any merging at all. You might want to start with git fetch
. As we noted earlier, this has your Git call up some other Git, such as the one at origin
, and get from it any new commits they have that you don't. Your Git then creates or updates your remote-tracking names, such as origin/master
, to identify the same last commit they—the other Git—have as their branch-tips.
Suppose, then, that in your repository, locally, you have:
...--G--H <-- master (HEAD), develop, feature-X
You run git fetch
or git fetch origin
, which has your Git call up the Git at origin
, and get new commits:
          I--J <-- origin/master
         /
...--G--H <-- master (HEAD), develop, feature-X
         \
          K <-- origin/feature-X
You want to start on a new feature-Y
. You may want to begin with the latest commit on origin/master
—commit J
—or maybe with the latest commit on origin/feature-X
if feature-Y
depends on feature-X
.
What you should do now is pick the commit you want. Note that it is not one that any of your own branch names finds: the new commits are found only through your remote-tracking names.
You can, if you choose, git checkout feature-X
and git merge origin/feature-X
to fast-forward your own feature-X
to origin/feature-X
:
          I--J <-- origin/master
         /
...--G--H <-- master, develop
         \
          K <-- feature-X (HEAD), origin/feature-X
Now both your name feature-X
and their name feature-X
(your origin/feature-X
) identify the same commit, K
. But it may well make more sense to delete your name feature-X
entirely. Just let their feature-X
, your origin/feature-X
, steer you around. You're not planning to work on feature-X
at all, but rather on feature-Y
.7
So, at this point you can do:
git checkout -b feature-Y origin/master
if you want to start with commit J, or:
git checkout -b feature-Y origin/feature-X
if you want to start out with commit K
. The -b
option will make git checkout
attach HEAD
to the new branch name, which will start at the chosen commit.
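A sketch of the fetch-then-create sequence, without any merge step, might look like this. The directories upstream and mine and the demo identity are assumptions for illustration; the commit messages match the letters in the diagrams.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m G
git -C upstream commit -q --allow-empty -m H
git clone -q upstream mine
git -C upstream commit -q --allow-empty -m I
git -C upstream commit -q --allow-empty -m J

cd mine
git fetch -q origin
# Start the new branch at the commit origin/master names; no merge involved.
git checkout -q -b feature-Y origin/master

start=$(git log -1 --format=%s feature-Y)
local_master=$(git log -1 --format=%s master)
current=$(git symbolic-ref --short HEAD)
echo "feature-Y starts at commit $start; local master is still at $local_master"
```

The local master never moved: the new branch was created straight from the remote-tracking name.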
7Of course, if you are planning on working on both, then go ahead and keep / update your feature-X
.
Branches have upstreams
There's an annoying but useful quirk here. Every branch name in Git is allowed (but not required) to have one upstream set. The upstream name of a branch provides some nice features. Git will automatically set the upstream of a new branch based on what you use as the starting-point commit:
- If you use a raw hash ID, the upstream of your new branch is unset.
- If you use a local branch name, by default, the upstream of your new branch is unset.
- If you use a remote-tracking name like
origin/master
or origin/feature-X
, by default, the upstream of your new branch is this remote-tracking name.
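The three cases above can be checked with a sketch like the following. The branch names from-hash, from-local, and from-remote, the helper function upstream_of, and the demo identity are all hypothetical. The script sets branch.autoSetupMerge to true (Git's default value) in the demo repository only, so that a personal global setting cannot change the outcome.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream mine
cd mine
git config branch.autoSetupMerge true   # the default, pinned for the demo

# Hypothetical helper: print a branch's upstream, or nothing if unset.
upstream_of() {
    git for-each-ref --format='%(upstream:short)' "refs/heads/$1"
}

git branch -q from-hash "$(git rev-parse origin/master)"   # raw hash ID
git branch -q from-local master                            # local branch name
git branch -q from-remote origin/master                    # remote-tracking name

up_hash=$(upstream_of from-hash)
up_local=$(upstream_of from-local)
up_remote=$(upstream_of from-remote)
echo "from-hash   upstream: '$up_hash'"
echo "from-local  upstream: '$up_local'"
echo "from-remote upstream: '$up_remote'"
```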
So:
git checkout -b feature-Y origin/feature-X
sets the upstream of new branch name feature-Y
to be origin/feature-X
. This is almost certainly not what you want.
Since you just created feature-Y
locally, though, there is presumably no origin/feature-Y
yet. This means Git won't let you change the upstream of this new branch to origin/feature-Y
... yet. So now it may well be time to run:
git push -u origin feature-Y
This command has your Git call up their Git, see if you have any new commits for feature-Y
that their Git doesn't have,8 and then ask them to set their branch name feature-Y
to identify the same commit as your branch name feature-Y
. This will create feature-Y
in the Git over at origin
.
The success at creating feature-Y
on origin
will tell your Git that your Git should now remember that their feature-Y
points to this same commit. So now your Git grows a new remote-tracking name, origin/feature-Y
. You can now set the upstream of branch feature-Y
to be origin/feature-Y
.
The -u
option of git push
tells git push
that, after successfully creating feature-Y
on origin, and therefore origin/feature-Y
locally, your Git should set the upstream of your branch feature-Y
this way. So this git push -u
operation achieves three things at once, all of which you wanted.
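Here is a sketch of that push against a local bare repository standing in for origin. The directory names seed, origin.git, and work, and the demo identity, are made up for the example.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

# Build a bare repository to push to, then clone it as our working repository.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed origin.git
git clone -q origin.git work
cd work
git config branch.autoSetupMerge true   # the default, pinned for the demo

git checkout -q -b feature-Y origin/master
before=$(git for-each-ref --format='%(upstream:short)' refs/heads/feature-Y)

# Create feature-Y over there, origin/feature-Y here, and set the upstream.
git push -q -u origin feature-Y

after=$(git for-each-ref --format='%(upstream:short)' refs/heads/feature-Y)
theirs=$(git -C ../origin.git rev-parse refs/heads/feature-Y)
mine=$(git rev-parse feature-Y)
tracking=$(git rev-parse origin/feature-Y)
echo "upstream before push: $before"
echo "upstream after push:  $after"
```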
If your git checkout -b
command uses a raw hash ID—such as that obtained by running git rev-parse
on origin/whatever
—your initial branch-checkout won't have any upstream set, which is OK. You can then defer the git push -u
step until you have new commits to send, as long as it's OK to not create the feature-Y
branch name yet, in the other Git.
If your git checkout -b
has set an upstream, and you want to defer creating the name in the other Git, you can use:
git branch --unset-upstream
to just unset the current upstream. Or, you can modify the git checkout -b
to add --no-track
, which tells git checkout -b
not to set an upstream when creating a local branch from a remote-tracking name.
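Both ways of ending up without an upstream can be sketched as follows. The branch name feature-Z, the helper function upstream_of, the directory names, and the demo identity are hypothetical additions for this example.

```shell
#!/bin/sh
set -e
# Placeholder identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "initial"
git clone -q upstream mine
cd mine
git config branch.autoSetupMerge true   # the default, pinned for the demo

# Hypothetical helper: print a branch's upstream, or nothing if unset.
upstream_of() {
    git for-each-ref --format='%(upstream:short)' "refs/heads/$1"
}

# Option 1: never set the upstream in the first place.
git checkout -q --no-track -b feature-Y origin/master
no_track=$(upstream_of feature-Y)

# Option 2: let checkout set it, then remove it from the current branch.
git checkout -q -b feature-Z origin/master
set_by_default=$(upstream_of feature-Z)
git branch --unset-upstream
after_unset=$(upstream_of feature-Z)

echo "feature-Y (--no-track):      '$no_track'"
echo "feature-Z (default):         '$set_by_default'"
echo "feature-Z (after unsetting): '$after_unset'"
```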
8Of course, you don't (yet) have any commits they don't have: your feature-Y
points to a commit you just got from them, or that you both already shared.