Git branch creation is giving rise to merge conflict

Question

I have a shell script that creates a branch locally, commits some changes, and pushes the branch upstream. Afterwards I need to create a pull request in TFS. Here is the script:

# enter the directory containing the repo
echo "....Entering $dest"
cd ~/IdeaProjects/$dest

# get the latest development branch
echo "....git pull"
git pull

# checkout development so that we can create a branch from the development branch
echo "....git checkout development"
git checkout development

# now create a branch
echo "....git checkout -b $branch"
git checkout -b $branch

# push to upstream to sync with it
echo "....git push origin"
git push origin $branch

# make some changes
# ...

# add changed / added files
echo "....git add"
git add .

# now commit the changes
echo "....git commit"
git commit -m "modified domain object $fileList"

# push to upstream
echo "....git push"
git push --set-upstream origin $branch

So far so good. The only problem is that I sometimes get merge conflicts with development when creating pull requests. Sometimes this happens right away (i.e., right after executing the script), even though I am certain nobody else is modifying any part of the repo's codebase. It happens even when I only add a brand-new class or a single field, e.g.:

class ExistingPojo {
    // fields

    // add this new field
    NewPojo embedded;
}

// a new class added
class NewPojo {
    // fields
}

The problem is that I am having a hard time understanding the pattern when it fails. It doesn't make much sense: if I am creating a branch from development, then I should have the latest code from master in the new branch, and after modification there should not be any conflict because nobody else is making any changes.

You mentioned that you sometimes get merge conflicts; what's the difference between the two branches? — Cece Dong - MSFT, Dec 30 '19 at 07:38

score 1 · Answer 1 · answered Dec 30 '19 at 07:35

TL;DR

You probably want a script that does something along these lines:

#! /bin/sh

# ... check actual invocation and set some variables; for example,
# taking the new branch name as the first argument:
branch=${1:?usage: $0 new-branch [start-point]}

git fetch origin || exit

# check actual invocation and/or fetch results and set one
# more variable (origin/development is only an example default):
startpoint=${2:-origin/development}   # or origin/something
# you may even want to use `git rev-parse` here to convert
# a name to a raw hash ID:
# startpoint=$(git rev-parse "$startpoint") || exit

git checkout -b "$branch" "$startpoint" || exit
git push -u origin "$branch"

where startpoint will depend on where—as in, based on which commit—you want to start working. You probably don't want any git pull steps in here.

Long

If you are thinking too much in terms of branches, that would explain this:

The problem is that I am having a hard time understanding the pattern when it fails.

The problem here is that branch doesn't really mean any one thing in Git. Sometimes, when people say a branch, they mean a branch name. Other times, they mean a series of commits, i.e., some part of the commit DAG. (See also What exactly do we mean by "branch"?) The catch is that branch names move over time, which makes using them to describe a series of commits both very useful and very tricky.

As Hugo Valenza M suggests, I also generally recommend not using git pull. What git pull does is to run two Git commands:

First, it runs git fetch, passing along most of the other arguments you gave to git pull (if any). This step has your Git call up some other Git and obtain new commits—commits the other Git has, that you don't yet, that they offer and your Git decides it should take.

In the process, your Git generally remembers their Git's branch names, using what Git calls a remote. A remote is mostly just a short name to remember a URL, but having this enables a number of other features. It's very common to have exactly one remote in your Git repository, named origin.

Typically, after your Git calls up the Git at origin, your Git now has its own memory of their branch names: their master is now your origin/master, their develop is now your origin/develop, and so on. These names—the origin/-prefixed ones—are your Git's remote-tracking names. They're subtly, or sometimes blatantly, different from regular (local) branch names.

Then, after the git fetch succeeds, git pull runs a second Git command. The usual default command is git merge, and git merge is where you can get merge conflicts. You can direct Git to use git rebase instead; here, too, you can get merge conflicts. You won't always get merge conflicts! It depends on your commit graph and on the contents of the various commits.

If you run the two commands yourself, you gain much more direct control. Most importantly, you can inspect the result of the git fetch command before you proceed. You will also immediately see that the git merge or git rebase step operates using your current branch.¹ The git fetch step doesn't,² which means you can run git fetch any time, regardless of what prior git checkout you may have used, but the merge or rebase step does, so it matters.
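To make the two-step version concrete, here is a minimal sketch that runs entirely in a throwaway directory created with mktemp. The repository names seed, origin.git, mine and other, the Demo identity, and the choice of --ff-only as the second step are all assumptions for illustration; HOME is pointed at the sandbox so personal Git settings don't change the outcome.

```shell
#!/bin/sh
# Sketch: "git pull" split into "git fetch" plus a second step chosen after
# inspection. All repositories are hypothetical and live in a temp directory.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"

# A server repository (origin.git) with one commit on master, and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "initial"
git clone -q --bare seed origin.git
git clone -q origin.git mine
git clone -q origin.git other

# Someone else pushes a new commit to master.
git -C other commit -q --allow-empty -m "their work"
git -C other push -q origin master

# Step 1: fetch. This updates remote-tracking names such as origin/master;
# it does not touch your current branch or your working tree.
cd mine
git fetch -q origin

# Inspect what arrived before deciding what to do next.
incoming=$(git rev-list --count master..origin/master)
echo "commits on origin/master that master lacks: $incoming"
git log --oneline master..origin/master

# Step 2, chosen after looking: accept only a fast-forward.
git merge -q --ff-only origin/master
```

If the inspection shows that a fast-forward isn't possible, you can then pick git rebase, git merge, or neither, which is exactly the choice git pull makes for you in advance.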

In your particular case, your desired second command is probably neither git merge nor git rebase. So here, you probably want git fetch followed by ... well, we'll get to that in a bit, but you've seen above that it's probably git checkout -b.

¹Well, you will when you fully understand merge and rebase. Since git pull uses them, you must understand them; there is no going around this by using git pull.

²Note that git fetch may use the current branch to determine the remote to use, if you have more than one remote defined.

What to know about creating a new branch

In Git, a branch name—a local name like master or develop or feature/short or bug/tall or whatever—just holds the hash ID of one (1) commit.

To create a new branch, you pick one existing commit in your repository—any commit will do—and tell Git: make a new name that holds this one commit's hash ID. You can do this with git checkout -b, or with git branch:

git branch name commit-hash will create the new name name such that it contains commit-hash. If you omit commit-hash, it defaults to HEAD.
git checkout -b name commit-hash will, as a single transaction, both create the new name, and git checkout the name (and therefore the given commit, by its hash ID). As with git branch, the default commit hash is that obtained by resolving the special name HEAD. (Transaction here means that if either of these two parts fails, the command as a whole does nothing: the new name is not created and you have not changed which branch and/or commit you have checked out.)
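A quick sandbox sketch of the two commands above. The branch names older and newer are hypothetical, and the repository lives in a temporary directory; git rev-parse shows the single hash ID that each name holds.

```shell
#!/bin/sh
# Sketch: two ways to create a branch name. "older" and "newer" are
# hypothetical names; the repository is a throwaway in a temp directory.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

# git branch NAME COMMIT: create the name, leave HEAD where it is.
git branch older HEAD~1

# git checkout -b NAME (commit defaults to HEAD): create the name and
# check it out in one step.
git checkout -q -b newer

# Each name holds exactly one commit hash ID.
git rev-parse older newer
echo "HEAD is attached to: $(git symbolic-ref --short HEAD)"
```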

There are a number of other ways to create a local branch name—Git is a big toolbox, with a lot of tools, some of which do too many things. (This is why Git 2.23 introduced git switch and git restore: these break out parts of what git checkout can do into two separate commands, which offers some hope of being less confusing.) But let's just start with these two. Note that they share the same idea: you pick out some commit—maybe by hash ID, or maybe just the commit you have checked out right now—and create a new name that holds that one particular hash ID.

How branches work, in Git

The key to most of this is that Git isn't really about branches. It's really about commits. Branches—or more precisely, branch names—are just a way to find commits.

Each commit, in Git, is a permanent—or mostly permanent³—and read-only (100% totally read-only) unit that:

holds a snapshot of all of your files (Git calls this a tree sometimes),
along with some metadata: information about the commit, such as who made it, when.

This collection of data (tree) and metadata has a unique hash ID. That hash ID is a big ugly string of letters and digits,⁴ which looks totally random and is impossible for humans to remember. But we don't need to remember them: that's what our computers are for, after all.

This is where the branch names come in. A branch name remembers, for us and for Git, the hash ID of the last commit that is to be considered part of the branch. But what about the earlier commits? This is where one of Git's clever tricks comes in. Every commit remembers the hash ID(s) of its immediate earlier commit(s). Most commits just remember one hash ID. This one hash ID is the parent of the commit, i.e., the commit that comes before this one.

What this means is that, given the last commit in a chain, as found by a branch name, Git can find the earlier commits in that chain. If we let single uppercase letters stand in for hash IDs—though we'll run out after just a few commits—we can draw that like this:

... <-F <-G <-H   <--master

Here, the name master holds some hash ID H. Git uses that to find commit H. Commit H itself holds, in its metadata, the hash ID of commit G. Git uses that to find commit G, which holds the hash ID of commit F. This repeats until Git has worked its way back to the very first commit ever, which, having no earlier commit to point to, doesn't point anywhere. Here the follow-the-arrows-backwards action stops.
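The backwards walk can be reproduced with plumbing commands. This sketch (temporary repository, with commit subjects F, G and H chosen to match the drawing) starts from the hash ID in master and follows first-parent links until it reaches the root commit, which has no parent.

```shell
#!/bin/sh
# Sketch: follow parent links backwards from the commit a branch name holds.
# The repository and the commit subjects F, G, H are made up for the demo.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
for m in F G H; do git commit -q --allow-empty -m "$m"; done

# The commit object's metadata names its parent (H records G here).
git cat-file -p master | grep '^parent'

# Walk backwards one first-parent link at a time, starting from the
# hash ID that the name master holds.
c=$(git rev-parse master)
walked=0
while [ -n "$c" ]; do
    git log -1 --format='%h %s' "$c"
    walked=$((walked + 1))
    # On the root commit "$c^1" does not exist, rev-parse fails quietly,
    # and the loop stops.
    c=$(git rev-parse -q --verify "$c^1") || c=""
done
echo "commits reached: $walked"
```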

So, given the hash ID of the last commit in a chain—such as that from a branch name—Git can find all the commits that are contained within that branch. To add a commit to a branch, we have Git check out the last commit, via the branch name, and remember which branch name we are using:

...--F--G--H   <-- master (HEAD), develop

Then we create a new commit as usual (edit files, git add, and git commit). Git packages up a new snapshot, adds the metadata, sets the new commit's parent to be commit H, and writes out the new commit. This generates a new, unique, big ugly hash ID I, which we can draw like this:

             I
            /
...--F--G--H   <-- master (HEAD), develop

and now the second sneaky Git trick happens: Git writes the new hash ID into the branch name to which the special name HEAD is attached. The result is:

             I   <-- master (HEAD)
            /
...--F--G--H   <-- develop

Note that commits up through H are now on both branches, while commit I is only on master. Make another new commit J and master now points to J, which points back to I, which points back to H. The name develop is not changed: it still points to existing commit H.
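A small sketch of that second trick, assuming a throwaway repository: master and develop both start at H, two commits are made while HEAD is attached to master, and only master moves.

```shell
#!/bin/sh
# Sketch: a new commit updates only the branch name HEAD is attached to.
# Repository and commit subjects are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
for m in F G H; do git commit -q --allow-empty -m "$m"; done

git branch develop                       # master (HEAD) and develop both hold H
h=$(git rev-parse HEAD)

# These commits are written into master, the name HEAD is attached to.
git commit -q --allow-empty -m I
git commit -q --allow-empty -m J

# master now names J; develop still names H.
git log --oneline --decorate --all
```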

If we now git checkout develop, Git extracts commit H again to work with, and attaches the special name HEAD to develop:

             I--J   <-- master
            /
...--F--G--H   <-- develop (HEAD)

If we now make some new commits K and L, these update the name develop this time:

             I--J   <-- master
            /
...--F--G--H
            \
             K--L   <-- develop (HEAD)

and here we have branches in the sense that many people mean them: diverging development. But each branch name just identifies one commit. The branches we mean here are the graph fragments: commits ...-F-G-H-I-J and ...-F-G-H-K-L.

It's often important, in Git, to be able to stop traveling backwards. We might want to list commits that are on master that aren't on develop, and/or vice versa, so that we can see I-J in isolation and K-L in isolation. But it's equally important to be able to see the graph and to see that these two branches—or subsets of the graph—diverge at commit H. (When working backwards, as Git does, they con-verge at commit H.)
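Git's two-dot range notation does the stop-traveling-backwards part, and git merge-base finds where the two lines of development meet. This sketch rebuilds the I-J / K-L graph from the drawings in a temporary repository, with commit subjects standing in for the letters.

```shell
#!/bin/sh
# Sketch: list each side of a divergence and find the shared commit.
# Repository and commit subjects are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
for m in F G H; do git commit -q --allow-empty -m "$m"; done
h=$(git rev-parse HEAD)
git branch develop
for m in I J; do git commit -q --allow-empty -m "$m"; done
git checkout -q develop
for m in K L; do git commit -q --allow-empty -m "$m"; done

# A..B means "reachable from B, but stop at anything reachable from A".
echo "on master, not on develop:"
git log --format='  %s' develop..master
echo "on develop, not on master:"
git log --format='  %s' master..develop

# Where the two lines of development meet when walking backwards.
base=$(git merge-base master develop)
echo "merge base: $(git log -1 --format=%s "$base")"
```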

³The permanence, or lack thereof, of any Git commit depends on its reachability. See Think Like (a) Git for much more about reachability.

⁴Technically, it's a hash of the content of the commit. Git currently uses SHA-1 but the Git folks intend to transition to SHA-256 in the future, which is going to be very interesting.

Merging, not-really-merges, and merge conflicts

Git's merging is a bit complicated and I won't cover all of it here, but let's illustrate two different cases. Suppose we have this graph fragment:

...--o--o--o   <-- branch (HEAD)
            \
             o--o   <-- origin/branch

where each round o represents a commit. You can have Git fast-forward the name branch so that it points to the same commit as the name origin/branch. That is, there's no problem in making the commit that origin/branch names become the tip commit of the branch-name branch, like this:

...--o--o--o
            \
             o--o   <-- branch (HEAD), origin/branch

No commits are lost in this process, because Git now starts at the last commit and works backwards and still visits (and sees, and has as history) every commit in the chain.

When we have a more complicated graph, though, like this one:

             I--J   <-- master
            /
...--F--G--H
            \
             K--L   <-- develop

and we want to git checkout master, so as to use commit J to start with, and then run git merge develop so as to combine work, now Git can't merely shuffle some branch names around. Now, Git really has to combine work.

What Git does in this case is that it first locates the best shared commit, by starting at both branch tips and working backwards. It's obvious that in this case the best shared commit is commit H. This best-shared-commit is the merge base of the merge operation.

Next, Git runs two git diff commands internally.⁵ First, it compares the snapshot in commit H to that in the current commit J:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed

Then it runs a second git diff to see what changed between H and L:

git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

Now Git combines these two sets-of-changes. The result is one bigger set of changes that Git should make to the snapshot in H.
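The merge base and the two comparisons can be run by hand. In this sketch (temporary repository, made-up five-line file.txt), our side changes line 1 and their side changes line 5, so the two change sets are far apart and the real git merge at the end combines them on its own.

```shell
#!/bin/sh
# Sketch: the merge base, the two diffs against it, and a clean merge.
# Repository, file name and contents are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
printf 'one\ntwo\nthree\nfour\nfive\n' > file.txt
git add file.txt
git commit -q -m H
h=$(git rev-parse HEAD)
git branch develop

printf 'ONE (ours)\ntwo\nthree\nfour\nfive\n' > file.txt
git commit -q -am J                      # our change: line 1
git checkout -q develop
printf 'one\ntwo\nthree\nfour\nFIVE (theirs)\n' > file.txt
git commit -q -am L                      # their change: line 5
git checkout -q master

base=$(git merge-base master develop)    # the best shared commit, H
git diff --find-renames "$base" master   # what we changed
git diff --find-renames "$base" develop  # what they changed

# The two change sets are far apart, so git merge can combine them itself.
git merge -q --no-edit -m "merge develop" develop
cat file.txt
```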

A merge conflict occurs whenever one of the changes we made (H-vs-J) and one of the changes they made (H-vs-L) are in the same file and touch the same lines, or lines that directly abut each other.

For all other cases (we touched a file they didn't, or vice versa, or we touched lines well away from the ones they touched), Git can just combine the changes on its own. In the overlapping case, Git can combine the changes on its own only if we both made exactly the same change. In that case, Git just takes one copy of the change, rather than both copies.

Whenever you get a merge conflict, it means you told Git to do a git merge (or git rebase or other command that uses the merge engine⁶), and you had a sufficiently complicated case that (a) Git couldn't fake it with a fast-forward, and (b) Git's merge engine was unable to combine these changes on its own.
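For contrast, here is a sketch of a conflict, using a file shaped like the ExistingPojo example from the question. The second field (int count) and the branch name other are hypothetical: both sides insert a different line at the same spot relative to the merge base, so the merge stops, git diff --diff-filter=U lists the conflicted path, and git merge --abort backs out.

```shell
#!/bin/sh
# Sketch: two sides insert different lines at the same place -> conflict.
# Repository, branch "other" and the "int count" field are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
printf 'class ExistingPojo {\n    // fields\n}\n' > ExistingPojo.java
git add ExistingPojo.java
git commit -q -m "base"
git branch other

# Our side adds a field right after the comment...
printf 'class ExistingPojo {\n    // fields\n    NewPojo embedded;\n}\n' > ExistingPojo.java
git commit -q -am "ours: add embedded"

# ...and the other side adds a different field at the same spot.
git checkout -q other
printf 'class ExistingPojo {\n    // fields\n    int count;\n}\n' > ExistingPojo.java
git commit -q -am "theirs: add count"
git checkout -q master

if git merge -q --no-edit other; then
    result=clean
else
    result=conflict
    conflicted=$(git diff --name-only --diff-filter=U)   # unmerged paths
    echo "conflicted: $conflicted"
    git merge --abort                    # back to the pre-merge state
fi
```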

If there are no merge conflicts, git merge goes on to make a merge commit, which is a commit that links back to both the previous HEAD, and the other commit, like this:

             I--J
            /    \
...--F--G--H      M   <-- master (HEAD)
            \    /
             K--L   <-- develop

If there are conflicts, Git leaves you with a mess: you must fix the conflicts yourself, instruct Git on how to make the final snapshot for commit M, and then use git merge --continue (or git commit, in ancient versions of Git) to finish the merge manually.
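A sketch of finishing a conflicted merge by hand, in a temporary repository with a made-up app.conf file. GIT_EDITOR=true is set only so the demo never waits for an editor; git merge --continue needs Git 2.12 or later, and the resulting M has J as its first parent and L as its second.

```shell
#!/bin/sh
# Sketch: resolve a conflict, then conclude the merge with git merge --continue.
# Repository, file name and contents are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true                   # accept default messages; never open an editor
cd "$demo"
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
printf 'greeting=hello\n' > app.conf
git add app.conf
git commit -q -m H
git branch develop
printf 'greeting=hi\n' > app.conf
git commit -q -am J
j=$(git rev-parse HEAD)
git checkout -q develop
printf 'greeting=good day\n' > app.conf
git commit -q -am L
l=$(git rev-parse HEAD)
git checkout -q master

# Both sides changed the same line, so this merge stops with a conflict.
git merge -q develop || echo "merge stopped; resolving by hand"

# Write the snapshot we want M to have, mark the file resolved, finish.
printf 'greeting=hi there\n' > app.conf
git add app.conf
git merge --continue                     # older Git: git commit

# M links back to both J (the previous HEAD) and L (develop's tip).
git log -1 --format='%s (parents: %p)'
```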

⁵Internally, it doesn't have to actually run git diff in a lot of cases, but when things get to the hardest cases, that's more or less what's going on. The --find-renames option in particular requires a treewide view of all file names in the two commits being compared.

⁶Rebase is effectively, mostly, a series of git cherry-pick operations, and those are actually a form of merge—at least in terms of combining changes—so these, too, can have conflicts. In fact, since a git rebase potentially copies many commits, you can get merge conflicts over and over again, whereas git merge will only have them happen once.

When creating new branches for new features, you rarely want any of this

Typically, when you are going to make a new feature branch, you don't want any merging at all. You might want to start with git fetch. As we noted earlier, this has your Git call up some other Git, such as the one at origin, and get from it any new commits they have that you don't. Your Git then creates or updates your remote-tracking names, such as origin/master, to identify the same last commit they—the other Git—have as their branch-tips.

Suppose, then, that in your repository, locally, you have:

...--G--H   <-- master (HEAD), develop, feature-X

You run git fetch or git fetch origin, which has your Git call up the Git at origin, and get new commits:

          I--J   <-- origin/master
         /
...--G--H   <-- master (HEAD), develop, feature-X
         \
          K   <-- origin/feature-X

You want to start on a new feature-Y. You may want to begin with the latest commit on origin/master—commit J—or maybe with the latest commit on origin/feature-X if feature-Y depends on feature-X.

What you should do now is pick the commit you want. Note that neither J nor K is found by any of your own branch names.

You can, if you choose, git checkout feature-X and git merge origin/feature-X to fast-forward your own feature-X to origin/feature-X:

          I--J   <-- origin/master
         /
...--G--H   <-- master, develop
         \
          K   <-- feature-X (HEAD), origin/feature-X

Now both your name feature-X and their name feature-X (your origin/feature-X) identify the same commit, K. But it may well make more sense to delete your name feature-X entirely. Just let their feature-X, your origin/feature-X, steer you around. You're not planning to work on feature-X at all, but rather on feature-Y.⁷

So, at this point you can do:

git checkout -b feature-Y origin/master

if you want to start with commit J, or:

git checkout -b feature-Y origin/feature-X

if you want to start out with commit K. The -b option will make git checkout attach HEAD to the new branch name, which will start at the chosen commit.
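Both choices can be tried in a sandbox. This sketch builds a hypothetical origin containing the H, I, J and K commits from the drawing, clones it, and then creates feature-Y from origin/master; feature-Y-alt is an extra, hypothetical name used only to show the origin/feature-X alternative.

```shell
#!/bin/sh
# Sketch: create a new branch directly from a remote-tracking name.
# origin.git, work, and the commit subjects are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q seed && cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m H
git checkout -q -b feature-X
git commit -q --allow-empty -m K
git checkout -q master
git commit -q --allow-empty -m I
git commit -q --allow-empty -m J
cd ..
git clone -q --bare seed origin.git      # stand-in for the server
git clone -q origin.git work && cd work

# Start feature-Y at the commit origin/master names (J)...
git checkout -q -b feature-Y origin/master
# ...or, as an alternative, at origin/feature-X (K).
git branch feature-Y-alt origin/feature-X
git log --oneline -1 feature-Y
git log --oneline -1 feature-Y-alt
```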

⁷Of course, if you are planning on working on both, then go ahead and keep / update your feature-X.

Branches have upstreams

There's an annoying but useful quirk here. Every branch name in Git is allowed (but not required) to have one upstream set. The upstream name of a branch provides some nice features. Git will automatically set the upstream of a new branch based on what you use as the starting-point commit:

If you use a raw hash ID, the upstream of your new branch is unset.
If you use a local branch name, by default, the upstream of your new branch is unset.
If you use a remote-tracking name like origin/master or origin/feature-X, by default, the upstream of your new branch is this remote-tracking name.
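These three rules are easy to check. In this sketch (hypothetical origin and clone in a temporary directory, with HOME redirected so personal settings such as branch.autoSetupMerge don't interfere), only from-remote ends up with an upstream; the names from-hash, from-local and from-remote are made up for the demo.

```shell
#!/bin/sh
# Sketch: which kinds of starting point give a new branch an upstream.
# origin.git, work and the three branch names are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # default settings only
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m H
git clone -q --bare seed origin.git      # stand-in for the server
git clone -q origin.git work && cd work

git branch from-hash "$(git rev-parse origin/master)"   # raw hash ID
git branch from-local master                             # local branch name
git branch from-remote origin/master                     # remote-tracking name

# %(upstream:short) is empty when a branch has no upstream.
git for-each-ref --format='%(refname:short) -> %(upstream:short)' refs/heads/
```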

So:

git checkout -b feature-Y origin/feature-X

sets the upstream of new branch name feature-Y to be origin/feature-X. This is almost certainly not what you want.

Since you just created feature-Y locally, though, there is presumably no origin/feature-Y yet. This means Git won't let you change the upstream of this new branch to origin/feature-Y ... yet. So now it may well be time to run:

git push -u origin feature-Y

This command has your Git call up their Git, see if you have any new commits for feature-Y that their Git doesn't have,⁸ and then ask them to set their branch name feature-Y to identify the same commit as your branch name feature-Y. This will create feature-Y in the Git over at origin.

The success at creating feature-Y on origin will tell your Git that your Git should now remember that their feature-Y points to this same commit. So now your Git grows a new remote-tracking name, origin/feature-Y. You can now set the upstream of branch feature-Y to be origin/feature-Y.

The -u option of git push tells git push that, after successfully creating feature-Y on origin, and therefore origin/feature-Y locally, your Git should set the upstream of your branch feature-Y this way. So this git push -u operation achieves three things at once, all of which you wanted.
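A sketch of the three-in-one effect, again against a hypothetical local bare repository standing in for origin. feature-Y is started from a raw hash ID, so it begins with no upstream, and a single git push -u then creates the name on origin, creates origin/feature-Y locally, and sets the upstream.

```shell
#!/bin/sh
# Sketch: git push -u creates the remote branch and sets the upstream.
# origin.git and work are hypothetical stand-ins.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m H
git clone -q --bare seed origin.git      # stand-in for the server
git clone -q origin.git work && cd work

# A raw hash ID as the starting point: no upstream yet.
git checkout -q -b feature-Y "$(git rev-parse origin/master)"

# Creates feature-Y on origin, origin/feature-Y here, and sets the upstream.
git push -q -u origin feature-Y
git rev-parse --abbrev-ref 'feature-Y@{upstream}'
```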

If your git checkout -b command uses a raw hash ID—such as that obtained by running git rev-parse on origin/whatever—your initial branch-checkout won't have any upstream set, which is OK. You can then defer the git push -u step until you have new commits to send, as long as it's OK to not create the feature-Y branch name yet, in the other Git.

If your git checkout -b has set an upstream, and you want to defer creating the name in the other Git, you can use:

git branch --unset-upstream

to just unset the current upstream. Or, you can modify the git checkout -b to add --no-track, which tells git checkout -b not to set an upstream when creating a local branch from a remote-tracking name.
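Both ways of ending up without an upstream, sketched in a sandbox: feature-Y uses --no-track at creation time, while feature-Z (a hypothetical second branch) gets origin/master as its upstream and then has it removed with git branch --unset-upstream.

```shell
#!/bin/sh
# Sketch: avoid or remove an automatically set upstream.
# origin.git, work, feature-Y and feature-Z are hypothetical.
set -e
demo=$(mktemp -d)
export HOME="$demo"                      # keep personal ~/.gitconfig out of the demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$demo"
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m H
git clone -q --bare seed origin.git      # stand-in for the server
git clone -q origin.git work && cd work

# --no-track: start from a remote-tracking name without setting an upstream.
git checkout -q --no-track -b feature-Y origin/master

# Or let Git set origin/master as upstream, then remove it afterwards.
git checkout -q -b feature-Z origin/master
git branch --unset-upstream              # acts on the current branch, feature-Z

git for-each-ref --format='%(refname:short) -> %(upstream:short)' refs/heads/
```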

⁸Of course, you don't (yet) have any commits they don't have: your feature-Y points to a commit you just got from them, or that you both already shared.

score 0 · Answer 2 · answered Dec 30 '19 at 04:48

Replace your first git pull with this:

git fetch --all

Hugo Valenza M

You should never use `git fetch --all`. (Well, hardly ever: feel free to use it if you can define *remote* and know that `--all` means *all remotes*, not *all branches*.) Using `git fetch` without `--all` is quite reasonable, but after a fetch, you'll want to run some more Git commands. That's why `git pull` exists: it runs `git fetch`, then runs a second command. The negative side of `git pull` is that you must choose the second command in advance, before you see what `git fetch` does. – torek Dec 30 '19 at 05:40