Merging working branch with master and resolving merge conflicts manually

Question

So, I'm very new to Git and would appreciate some help on a practical as well as on a conceptual level.

What I want to do

I branched out to a new branch working_branch where I only worked on a single file and made significant changes there. Now I'd like to basically replace filename on the master branch with the version on working_branch.

What I tried

I tried git rebase master while working_branch was checked out. But instead of replacing filename I ended up with a merge conflict and the request to resolve the conflicts manually.

CONFLICT (content): Merge conflict in filename.R
error: could not apply a71b238... adjust nonull working_branch to main
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply a71b238... adjust nonull working_branch to main

However, I don't know how to do this safely. I don't understand why Git does not understand my intentions and it feels like my best option was to just copy my old code to the new branch and push it there. However, that defeats the purpose of git. I tried to force my luck with git rebase --continue, but in the vim editor I didn't understand where it got the old commit message from and what I was supposed to do.

Since this was not what I expected to happen, and I did not know exactly what I was doing in the Vim editor, I decided to abort the rebase with git rebase --abort.

I tried a second time and deleted the paragraph that contained "nonull" in order to resolve the conflict manually. I used git add filename.R and git rebase --continue believing it would push my newest version to master but got confused again by what was happening and aborted again.

I tried a third time and deleted all conflicts, used git add filename.R and git rebase --continue after every deletion and ended the commit messages in the vim editor just with :wq. After my last git rebase --continue I was rewarded with the following message:

[detached HEAD d070c97] finalize script
 1 file changed, 18 insertions(+), 35 deletions(-)
Successfully rebased and updated refs/heads/working_branch.

However, HEAD is still pointing at working_branch and there are no changes on the master branch. I had expected that after such a message both branches would have been merged and that the files would be the same. Instead, git status now returns the following:

Your branch and 'origin/working_branch' have diverged,
and have 14 and 11 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)

What happened? What did I do wrong? What should I do?

Note that changes were made to the master branch while I was working on my working_branch.

Your rebase is the culprit in the "14 and 11 different commits" thing here. Unless you know what you're doing, you should probably not *start* with rebase-oriented workflows; rebase is overly powerful and it's too easy to make a mess. But in any case, never just *delete* merge conflicts: learn how merge works, and what to do when it fails. (Note that rebase is fundamentally repeated cherry-picking, and each cherry-pick is a merge, so if you have 3 commits to rebase, you're doing three merges. That's why you *must* learn merge first!) — torek, Feb 15 '22 at 23:25

score 1 · Answer 1 · answered Feb 16 '22 at 05:44

What you really need here is a good tutorial or book and a week or more of time. Without these, though, here's an overly-rapid launch into what to know about Git and how to use git merge.

What to know when getting started with Git

Git is really all about commits. It's not about files, though commits do contain files, and you will care a lot about your files. It's not about branches either, though branch names help you (and Git) find commits. To Git, though, almost everything is about the commits. This means you need to know what a commit is and does for you.

Each Git commit is numbered, but the numbers are weird, and at least mildly poisonous to human brains. If and when you need to use them, you'll generally want to use cut-and-paste with your mouse or some such. Still, remember that Git is using these numbers. That's how Git finds the commits: by their number. Each commit has a globally unique hash ID: a GUID (Globally Unique ID) or UUID (Universally Unique ID), whatever you'd like to call it. This number is unique to that particular commit, such that every time you make a new commit, it gets a new number that nobody else, anywhere, ever, is allowed to use.¹ That means that two different pieces of Git software, working with two different repositories, can immediately tell if they have the same commit just by comparing the number. If the numbers match, the commits are identical. If not, they're different.

This means no commit can ever change, either: not one bit. So everything inside a commit lasts forever, or at least, as long as the commit itself continues to exist. But what's in a commit?

A commit contains a full snapshot of all of your files. More precisely, it has a full snapshot of the files that Git knew about at the time you (or whoever) made the commit. These files are stored in a special, compressed, read-only, Git-only, and de-duplicated format, which keeps the repository from getting tremendously fat even though every commit stores every file: if some file is a duplicate, it's only stored once, and if an entire commit is nothing but duplicates—this can happen in various ways—the files are stored in zero bytes of storage (though the commit itself still needs a few bytes).
A commit also contains some metadata, or information about the commit itself. This includes the name and email address of whoever made the commit (actually two such). It includes a date-and-time stamp (actually two again). It includes a log message, where you get to explain to your future self why you made the commit. (Note that it's often helpful to go back and write these again later, which is one place rebase comes in. See also this XKCD.)
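The de-duplication is easy to observe directly. Here is a minimal sketch, assuming a throwaway directory and made-up file names: git hash-object prints the hash ID under which Git would store a file's content, and two files with identical content get the same ID.

```shell
set -e
# Throwaway directory; the file names are made up for this demo.
work=$(mktemp -d)
cd "$work"
git init -q

printf 'same content\n' > a.txt
cp a.txt b.txt
printf 'different content\n' > c.txt

# git hash-object prints the hash ID Git would store each file's content under.
id_a=$(git hash-object a.txt)
id_b=$(git hash-object b.txt)
id_c=$(git hash-object c.txt)

echo "a.txt: $id_a"
echo "b.txt: $id_b   (same content as a.txt, so the same ID: stored once)"
echo "c.txt: $id_c"
```

Because the ID depends only on the content, a file that is unchanged from one commit to the next costs no extra storage.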

Now, inside the metadata, Git adds something that Git needs: the raw hash ID—the GUID or UUID—of a list of earlier commits. Most commits store exactly one such hash ID. This results in a simple backwards chain of commits, where each commit holds the hash ID of the commit that comes before it. We say that these later commits point to the earlier ones, and if we use single uppercase letters to stand in for raw hash IDs, and call the most recent commit we just made "commit H", we get a picture like this one:

... <-F <-G <-H

Commit H contains a full snapshot of all the files Git knew about, plus the metadata saying that we made the commit, when we made it, and so on. Commit H's metadata makes commit H point backwards to earlier commit G, which also contains a snapshot and metadata.
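To see the snapshot, metadata, and parent link for yourself, something like the following sketch may help. It builds a throwaway repository (the file name, commit messages, and identity are assumptions made up for illustration) and prints the raw contents of the latest commit.

```shell
set -e
# Throwaway repository; all names below are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main"
git config user.name "Example User"
git config user.email "user@example.com"

# Make three commits, standing in for F, G, and H.
for name in F G H; do
    echo "content as of $name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# Show commit H's raw contents: tree (snapshot), parent, author, committer, message.
git cat-file -p HEAD

# The "parent" line of H holds the hash ID of G.
parent_of_h=$(git cat-file -p HEAD | sed -n 's/^parent //p')
hash_of_g=$(git rev-parse HEAD~1)
echo "parent recorded in H: $parent_of_h"
echo "hash ID of G:         $hash_of_g"
```

The author and committer lines are the "two such" names and time stamps mentioned above.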

Git can now extract all the files from both G and H and compare those files. For those that are the same, Git can say nothing at all (and the de-duplication makes it really easy for Git to tell which files those are). For the files that are different in G vs H, Git can compare the contents and work out what changed, as a sort of Spot the Difference game, and then show us this difference only, rather than having to show us two whole versions of the file.

This trick of showing a diff lets git log -p show us commit H by:

printing its raw hash ID;
showing its metadata, to say we made it yesterday or whenever; and
showing what changed in H, even though it's a full snapshot.

Then git log can step back one hop to commit G. Since G is a commit, it has a snapshot and metadata and it points backwards to earlier commit F, so git log can show us commit G the same way, by "diff"-ing the snapshots in F and G. And then git log will move back one hop to commit F, which is a commit and therefore has a snapshot and metadata, and so on.
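The comparison that git log -p performs can also be run by hand. This is a small sketch in a scratch repository (file name and messages are assumptions): it diffs the parent snapshot against the child snapshot, then lets git log -p do the same while walking backwards.

```shell
set -e
# Scratch repository; names and messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\n' > file.txt
git add file.txt
git commit -q -m "G: first version"
printf 'line one\nline two\n' > file.txt
git add file.txt
git commit -q -m "H: add a second line"

# Compare the two snapshots directly; only the difference is shown.
git diff HEAD~1 HEAD

# git log -p does the same comparison for each commit while walking backwards.
git log -p --format='commit %h: %s'

# Count the lines H added relative to G (lines starting with a single "+").
added=$(git diff HEAD~1 HEAD | grep -c '^+[^+]')
echo "lines added in H: $added"
```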

What this shows is that the commits alone get us most of the way. But to get started, git log had to know to start with commit H. How will Git do that? We could memorize the hash IDs, and type them in for Git, but that's a bad idea. We could save them in files: that's a better idea, but still not great. How about: we could have Git save them for us?

¹This is technically impossible—provably so; see the pigeonhole principle. The hash ID is large enough that we have reason to hope that failure won't happen in any of our lifetimes.

Branch names store hash IDs

This is what branch names are, in Git: they are just a way to store a hash ID. Git stores only the hash ID of the latest commit, e.g., H, in the name. As with the commits themselves, we say that the branch name points to a commit, and we can draw that in now:

...--F--G--H   <-- main

I've gotten lazy about drawing in the arrows from commit-to-commit. That's in part because they literally can't change: like the files inside each commit, the metadata is frozen for all time. Commit H points backwards to G, and will do so forever, or at least as long as commit H exists somewhere in some repository.

The names, though, do change. The name main currently holds H. Someday it might hold a different hash ID. We can also create and destroy branch names whenever we like, so we can add a new name now, such as dev for development:

...--F--G--H   <-- dev, main

We now need a way to remember which name we're working with, because Git normally has us work with one branch name at a time. We'll run git checkout dev or git switch dev to pick the name dev to work with, and to remember that in our drawings, let's attach the special name HEAD to one of the two branch names. We start out on main, like this:

...--F--G--H   <-- dev, main (HEAD)

We're currently on branch main, as git status will say. That means we're using commit H; we'll come back to this in a moment. Then we run git switch dev or git checkout dev. There's no real difference between these, except that git switch was new in Git 2.23. It doesn't do as much as the heavily overloaded git checkout, so it's better because it's less confusing (this is the "less is more" philosophy, which is inaccurate: less is less, it's just that sometimes, less is also better for humans). The result is:

...--F--G--H   <-- dev (HEAD), main

We're still using commit H. We're just doing that now through the name dev.
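You can check that a branch name is nothing more than a stored hash ID with a sketch like this one (a scratch repository with made-up names): after creating dev, both names resolve to the same commit, and only HEAD's attachment changes when we switch.

```shell
set -e
# Scratch repository; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "H"

# Create a second name for the same commit, then attach HEAD to it.
git branch dev
git checkout -q dev        # or: git switch dev (Git 2.23 and later)

# Both names resolve to the same hash ID.
main_hash=$(git rev-parse main)
dev_hash=$(git rev-parse dev)
echo "main -> $main_hash"
echo "dev  -> $dev_hash"

# HEAD is attached to the branch name dev.
current=$(git symbolic-ref --short HEAD)
echo "HEAD is attached to: $current"
```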

Git's index and your working tree

The files in a commit are read-only, and in fact, only Git can read them. (Depending on how compressed they are—Git has two different ways of compressing files currently—some programs could read one form pretty easily, but in general, most of the programs on your computer probably can't.) This makes them useless for getting any actual work done, because you need files that your programs can read and write. So before we begin working with commit H, Git has to extract the files.

The extracted files from H go into a work area, which Git calls your working tree or work-tree. It's important to realize that these files are not in Git at all. They came out of Git, to be sure, and they may go back into Git later, but right now they're just ordinary files, not Git files. You can now do anything you want with them. You can get work done!

Now, the really tricky bit here is that when Git extracted all the files from the commit into your working tree so that you could work on them, Git also extracted the files into what Git calls the index, or the staging area, or—rarely these days—the cache. These are three names for the same thing, and what it holds is, in short, your proposed next commit. Git keeps the files for the next commit in the compressed and de-duplicated form, and the index keeps track of those. The files that are in the index are called tracked.

If and when you do edit some file in your working tree, you will eventually have to run git add on it. The reason for this is simple: the copy that's in your working tree isn't in Git at all, and for the next commit, Git needs a copy that is in Git, and is compressed and de-duplicated. Running git add file tells Git: Read the working tree copy of file. Compress that file down into the internal format that you use for commits, and see if it's a duplicate. Prepare it for the next commit. This replaces the copy that's in Git's index.

What's in Git's index, then, are copies (but pre-de-duplicated) of the files that will go into the next commit. That's why I said just now that the index holds your proposed next commit. The key difference between the files in the current commit and the files in the index is that you can change out the files in the index. You can even add all-new files (git add of a file that's not yet in the index puts it there) or remove existing files: git rm file removes a file from both Git's index and your working tree, and now it won't be in the next commit.
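The index can be inspected with git ls-files -s, which lists each entry's mode, blob hash ID, stage number, and path. The following sketch (scratch repository, made-up file name) shows that editing the working tree copy leaves the index alone, while git add swaps in a new blob.

```shell
set -e
# Scratch repository; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "version 1" > file.txt
git add file.txt
git commit -q -m "first"

# The index entry for file.txt matches the committed copy.
before=$(git ls-files -s file.txt)

# Editing the working tree copy does not touch the index...
echo "version 2" > file.txt
after_edit=$(git ls-files -s file.txt)

# ...but git add replaces the index copy with a newly stored blob.
git add file.txt
after_add=$(git ls-files -s file.txt)

echo "before edit: $before"
echo "after edit:  $after_edit"
echo "after add:   $after_add"
```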

When you run git status, Git runs two separate comparisons:

First, Git compares the current commit, as found by the branch name to which HEAD is attached, to what's in Git's index. For all the files that are the same, Git says nothing at all. For any file that is different, Git says that this file is staged for commit. That's where the name staging area comes from.
Then, having listed out any different staged-for-commit files, Git now compares what's in its index / staging-area to the files in your working tree. For files that are the same, Git says nothing at all, again. For files that are different, Git says that they are not staged for commit: you can and should run git add to copy the working tree copy into the index if you want.
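The two comparisons can be run separately, which may make the git status output easier to read. In this sketch (scratch repository, made-up file names), git diff --cached compares the current commit to the index, and plain git diff compares the index to the working tree.

```shell
set -e
# Scratch repository; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "a1" > a.txt
echo "b1" > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo "a2" > a.txt
git add a.txt          # a.txt: index now differs from HEAD (staged)
echo "b2" > b.txt      # b.txt: working tree differs from index (not staged)

# Comparison 1: current commit vs index.
staged=$(git diff --cached --name-only)
# Comparison 2: index vs working tree.
unstaged=$(git diff --name-only)

echo "staged for commit:     $staged"
echo "not staged for commit: $unstaged"
git status --short
```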

Because you can create any file you like at any time in your working tree, you may have working tree files that are not in Git's index. Normally, Git will now complain about these files, calling them untracked. To shut up these complaints, you can list these file names, or patterns like *.o or *.pyc, in a .gitignore file or equivalent. This doesn't actually make the files stay un-committed: it just shuts up the git status complaint here. The files are untracked because they're not in Git's index. Since the index holds the proposed next commit, they won't be in the next commit, unless you add them.

If you do try, explicitly, to add an untracked-and-ignored file, Git will warn you that it didn't do that because you said to ignore it. To force Git to add such a file, you can use git add --force. That will override the untracked-and-ignored status, and copy the file into Git's index. Once it's in Git's index, git add will be happy to update it from the working tree copy, regardless of anything in any .gitignore. So .gitignore doesn't mean ignore, but rather don't complain (with git status) and don't add if not there (with git add). This also handles any en-masse "add all" operations like git add . or git add --all: files that are untracked-and-ignored are silently omitted here.
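Here is a short sketch of that behavior, using a scratch repository and a hypothetical *.log ignore pattern: the ignored file is not reported, an en-masse git add . skips it, and git add --force puts it into the index anyway.

```shell
set -e
# Scratch repository; the *.log pattern and file names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "ignore log files"

echo "noise" > debug.log
echo "notes" > notes.txt

# notes.txt is reported as untracked ("??"); debug.log is not mentioned.
untracked=$(git status --short)
echo "$untracked"

# An en-masse add silently skips the ignored file.
git add .
tracked_after_add=$(git ls-files)

# Forcing the add puts the ignored file into the index anyway.
git add --force debug.log
tracked_after_force=$(git ls-files)

echo "--- tracked after 'git add .':"
echo "$tracked_after_add"
echo "--- tracked after 'git add --force debug.log':"
echo "$tracked_after_force"
```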

Making a new commit

Once you've updated your working-tree files, run git status, and run git add to get all the updates into Git's index so that all your important changes or new files or deleted files show up as "staged for commit", you simply run git commit. Git will now:

collect a log message from you, to put in the metadata;
collect the other metadata it needs: your name and email address, for instance, and the current date-and-time from the computer's clock;
use HEAD and the current branch name to find the current commit hash ID;
turn all the ready-to-go files in Git's index into a new snapshot; and
write out a new commit with this metadata and snapshot.

The new commit—let's call this commit I—has a new, unique, never-used-before, never-to-be-used-again hash ID. It has, as its parent, the current commit, which is commit H because we had:

...--G--H   <-- dev (HEAD), main

when we ran git commit. We now have:

          I   <-- dev (HEAD)
         /
...--G--H   <-- main

and this is because the very last step of git commit is to write the new commit's hash ID into the current branch name. Since HEAD is attached to dev, not main, it's dev, not main, that now points to new commit I. So now our branch names, which used to both point to the same commit, point to two different commits. New commit I is only on branch dev, not on branch main.

If you make several more commits—as in your case—they get more new hash IDs. I'm going to draw two instead of three here just to make my drawings prettier, but overall everything works out the same here:

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- main
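The drawing above can be reproduced in a scratch repository. This is only a sketch with made-up names, but it shows the key point: new commits move the branch name that HEAD is attached to, and no other name.

```shell
set -e
# Scratch repository; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "H"
git checkout -q -b dev          # create dev at H and attach HEAD to it

echo "work 1" >> file.txt
git add file.txt
git commit -q -m "I"
echo "work 2" >> file.txt
git add file.txt
git commit -q -m "J"

# dev has moved forward two commits; main still points to H.
git log --oneline --graph --all
ahead=$(git rev-list --count main..dev)
main_subject=$(git log -1 --format=%s main)
echo "dev is $ahead commits ahead of main, which still points to: $main_subject"
```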

Clones, remotes, and remote-tracking names

The above is all about working locally on a Git repository, e.g., on your laptop. But Git is not just a version control system (VCS): it's a distributed version control system, or DVCS. There are multiple copies of each repository, on multiple computers. This "D" part of the DVCS means that other people, on these other computers, can be doing other work on other copies of the repository. You make a copy of some Git repository (one that you and they keep on GitHub, for instance) and they make copies too, and all of you do your work in your own VCS (usually Git) and eventually send your work to each other, or back to GitHub.

The way Git handles the Distributed part means that you don't have to have a central site like GitHub, but having such a site makes a lot of people more comfortable and has certain benefits. So we'll look at things with a GitHub-centric eye here. I'm also going to call your computer your "laptop", even if it's a desktop or deskside computer, just for easier reference.

You and your co-workers / colleagues view the GitHub copy as the "source of truth": what's in that repository is for real. So you start by cloning the central repository:

git clone ssh://git@github.com/org/repo.git

for instance (perhaps you prefer https:// URLs). This clone operation makes a new, initially-totally empty repository on your laptop: such a repository has no commits and no branches. But your Git software then immediately obtains, from the GitHub Git software reading the central repo—let's call this "their Git"—all of their branch names (and any other names that matter, such as tag names) and the commit hash IDs that go with these. Your Git software is now ready to copy stuff into your Git repository.

Your Git software, running on your repo (let's call this "your Git"), starts by saving the URL under the name origin. (You can choose some other name when you run git clone, but normally nobody does that.) Then your Git asks their Git to send over those commits, by hash ID, and their parent commits, by hash ID, and the parent's parents, and so on, so that their Git ends up sending every commit. Your Git saves these commits away under these same hash IDs: they are, after all, the same commits, so they get the same hash IDs.

When they're done sending over all their commits, your Git takes all their branch names and changes them. Your Git sticks origin/ in front of each name: their main becomes your origin/main, their dev (if they have one) becomes your origin/dev, their feature/short becomes your origin/feature/short, their feature/tall becomes your origin/feature/tall, and so on. Whatever they have, your Git sticks origin/ in front, because that's the name of the remote. Your Git is turning their branch names into your remote-tracking names.

In the end, your Git has copied all of their commits, but replaced all their branch names with your own remote-tracking names. It's easy to convert between branch name and remote-tracking name, because we just add or remove origin/. The point of all this funny business, though, is this: Just before your git clone finishes, your Git creates one branch in your repository. The one branch your Git creates is the name you select with the -b option when you run git clone. If you don't select a name—and usually people don't—your Git asks their Git which name they recommend, and usually, they recommend main (in modern usage) or sometimes master (left over from a year or two ago, and still the default on many systems). You have an origin/main or origin/master because they have main or master, and your Git thus creates your main or master from their main or master, which in your Git, is origin/main or origin/master.

So, what we've been drawing like this:

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- main

really looks like this:

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- main, origin/main

(assuming they have only branch main: if they have more branches, there are more origin/ names in your Git).
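The renaming of their branch names is easy to watch in a local experiment. In this sketch, a plain local repository named central stands in for the GitHub copy (an assumption for the demo, as are all the other names). It has two branches, but the clone ends up with only one local branch plus origin/ names.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "central" stands in for the GitHub repository (an assumption for this demo).
git init -q central
cd central
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "H"
git branch dev              # they have two branch names: main and dev
cd ..

# Clone it; "laptop" is the new repository.
git clone -q central laptop
cd laptop

git branch --all            # one local branch, plus remote-tracking names
remote_url=$(git config remote.origin.url)
local_branches=$(git for-each-ref --format='%(refname)' refs/heads)
echo "URL saved under the name origin: $remote_url"
echo "local branches: $local_branches"
```

Their dev exists in the clone only as origin/dev until you create your own dev from it.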

Now, since the time you made your clone, someone else made a clone and made new commits in their clone and then sent those new commits back to the central GitHub repository. So you had to pick up these new commits. You do this with git fetch. Some people run git pull which does run git fetch, but if you're new to Git, I advise starting with your own git fetch yourself: don't start using git pull until you've learned how to fetch and then either merge or rebase. When you run git fetch—either literally, or indirectly via git pull—your Git calls up the GitHub Git software and connects to the central repo again.

As before, your Git has their Git list out their branch names and hash IDs. This time, though, their main points to some new commit that you don't have. Your Git asks their Git for that hash ID, and that commit's parent(s), and so on, until your Git gets to a hash ID that you do already have. Their Git then packages up and sends over just the new-to-you commits, which your Git adds; and finally, your Git updates your remote-tracking names according to their branch names. So now you have, in your repository, this:²

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- main
         \
          K--L   <-- origin/main

Since you and they have both done work in parallel, you now need to combine the work. This is a job for git merge.
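A fetch can be tried out the same way. In this sketch a local repository named central again stands in for GitHub, and commits K and L are made directly in it to simulate a co-worker's push (all names are assumptions). After git fetch, only the remote-tracking name moves.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# "central" stands in for the GitHub repository (an assumption for this demo).
git init -q central
cd central
git symbolic-ref HEAD refs/heads/main
git config user.name "Coworker"
git config user.email "coworker@example.com"
echo "base" > shared.txt
git add shared.txt
git commit -q -m "H"
cd ..
git clone -q central laptop

# Meanwhile, new commits K and L appear in the central repository.
cd central
echo "their change 1" > theirs.txt
git add theirs.txt
git commit -q -m "K"
echo "their change 2" >> theirs.txt
git add theirs.txt
git commit -q -m "L"
cd ../laptop

before=$(git rev-parse origin/main)
git fetch -q origin
after=$(git rev-parse origin/main)
local_main=$(git rev-parse main)

echo "origin/main before fetch: $before"
echo "origin/main after fetch:  $after"
echo "main (unchanged):         $local_main"
git log --oneline --graph --all
```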

²For posting reasons I'm using just two commits on each side. I'm also using different branch names as I think it's less confusing. Here's a drawing that is closer to your actual situation:

          I--J--K   <-- working_branch
         /
...--G--H
         \
          L--M--N--O--P--...--W   <-- origin/working_branch

Merging works the same way regardless of the number of commits, though, as long as there's at least one on each "side".

Merging

Merging is, as we just said, about combining work.

We know that every commit has a full snapshot of every file, and that if we move along from parent commit to child commit—e.g., from H to I, and then from I to J—we'll see what changed in that commit. But what if we just compare the snapshot in H directly to the snapshot in J? Will that work? What will we get?

It's worth thinking about this for a while, and working through some examples, but in fact it works just fine: we get a summarized recipe from Git that, if applied to the snapshot in H, produces the snapshot in J. That is, the diff output, from:

git diff --find-renames <hash-of-H> <hash-of-J>

will tell us which files need changes, and what the final changes are, to get from H to J, without having to go through the intermediate I version. This works no matter how many commits there are in between.³ So a quick diff from H to J (or in footnote 2, from H to K) will show what you did on your branch. That is, such a change, applied to H, will make (or keep) all of your changes.

The same principle applies with their changes: a diff directly from H to L (or in footnote 2, from H to W) finds a shortcut recipe that will make, or keep, all of their changes.

This is just what git merge does. We run git merge origin/main while we're on dev, using commit J, and Git finds commit L—because origin/main points to L—and then works its way backwards to find the best shared commit, one that is on both branches. That's commit H here: it's on both dev and origin/main, and it's the best one because going further back doesn't help any, but going forward means we don't keep both sets of changes correctly.
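You can ask Git for the merge base yourself with git merge-base. The sketch below builds the H, I-J, K-L shape in one scratch repository; a local branch named other stands in for origin/main (that name, like the file names, is an assumption for the demo). It then runs the two diffs from the merge base to each tip.

```shell
set -e
# Scratch repository; "other" stands in for origin/main (an assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "H"
base=$(git rev-parse HEAD)

git checkout -q -b dev
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "I"
echo "more ours" >> ours.txt
git add ours.txt
git commit -q -m "J"

git checkout -q -b other main
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "K"
echo "more theirs" >> theirs.txt
git add theirs.txt
git commit -q -m "L"

# The best shared commit (merge base) of the two branch tips is H.
merge_base=$(git merge-base dev other)
echo "merge base: $(git log -1 --format=%s "$merge_base")"

# The two shortcut diffs, from the merge base to each tip:
ours_changed=$(git diff --find-renames --name-only "$merge_base" dev)
theirs_changed=$(git diff --find-renames --name-only "$merge_base" other)
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"
```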

So, Git runs the two git diff commands, which gets a list of changes from "both sides" or "both branches". Git can then combine the list of changes:

If we touched some file, and they didn't, Git keeps all of our changes.
If they touched some file, and we didn't, Git keeps all of their changes.
If we and they touched the same file, Git has to work harder: it has to figure out which lines we might both have touched, if any. If any changes overlap,⁴ we will in general see what Git calls a merge conflict. The one exception here is that if we and they both make the exact same changes to the exact same lines, Git will just take one copy of the changes.

In any case, Git then tries to apply the combined change to the file from the merge base (H). Quite often, Git can do the entire merge on its own, with no merge conflicts. If that's the case, Git will go on and make a new commit on its own, which we can draw like this:

          I--J
         /    \
...--G--H      M   <-- dev (HEAD)
         \    /
          K--L   <-- origin/main

I dropped the name main from the drawing for space reasons; it's still there, still pointing to commit H, for this merge, but it's too hard to draw in as plain-text. The new commit M, however, has gone on branch dev, the way new commits always do: HEAD is attached to dev, so dev now points to the new commit.

Commit M points back to commit J, just like every commit points to its parent. What makes commit M special, though, is that it also points back to commit L: the commit we named when we ran git merge origin/main. That tells Git that commit M is a merge, and it brings commits K-L onto branch dev. That is, before the merge, branch dev meant commits up through and including J,⁵ but not K or L. But after the merge, every commit including K and L is on dev.

In other words, by having two parents, commit M introduces more commits to the branch. That's what a merge commit does: it has a single snapshot as usual, and it has metadata as usual, but it has more than one parent so it makes more commits find-able just from the one branch name.
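The two parents of a merge commit are visible with ordinary Git commands. This sketch repeats the same scratch setup (a local branch named other stands in for origin/main; all names are assumptions), performs a conflict-free merge, and then inspects commit M.

```shell
set -e
# Scratch repository; "other" stands in for origin/main (an assumption).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "H"

git checkout -q -b dev
echo "ours" > ours.txt
git add ours.txt
git commit -q -m "J"

git checkout -q -b other main
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m "L"

# Merge their work into dev; the changes touch different files, so no conflict.
git checkout -q dev
git merge -q --no-edit other

# Commit M lists two parents: J (first) and L (second).
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
first_parent=$(git log -1 --format=%s HEAD^1)
second_parent=$(git log -1 --format=%s HEAD^2)
echo "M has $parent_count parents: $first_parent and $second_parent"
git log --oneline --graph
```

After the merge, the working tree on dev contains both ours.txt and theirs.txt.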

Sometimes, though, you get merge conflicts. In this case, git merge stops in the middle, leaving the merge half-done. Your job, as a programmer, is now to finish the merge.

³If you rename files, a step-by-step comparison going from one commit to the next will sometimes work better, given some of the other things that Git does and does not do. It would be nice if there were a way to make git merge do this step-by-step thing. There isn't, though.

⁴The test Git actually uses here is "overlap or abut": if we modify lines 10 through 13 inclusive, for instance, and they modify lines 14–16, our changes "touch at the edge", i.e., abut, and Git declares a merge conflict. The only reason given for "why" is that experience with tens of thousands of merges shows that this is better than not doing so.

⁵Note that commits up through and including H are on all three branches, main, dev, and origin/main. That is, they're on branch origin/main if origin/main is a branch. Is it? That depends on who, or how, you ask.

Handling merge conflicts

When Git stops in the middle of a merge, it generally leaves a mess behind. You have to fix this mess. There are two components to the mess:

Git leaves stuff in the index that tells Git don't commit, the merge is unfinished. This stuff is useful for finishing the merge.
Git leaves, in your working tree files, its best effort at doing the merge. For each conflicted file, there may be conflict markers. I say may be because there are high-level conflicts that we won't cover here. The low-level conflicts do leave conflict markers in the files.

To fix these, you can:

open the conflicted files in any editor you like, and resolve the conflicts manually and write the resolved file back to the working tree, or
use git mergetool to run any merge tool you like.

The git mergetool command uses the extra information that's in the index to locate the conflicted files, and to find the three input files: merge base, "ours" or "LOCAL", and "theirs" or "REMOTE". It then runs the merge tool you choose—Git has no built-in merge tools of its own, but there are a number of free ones you can install, or your OS may provide some—and the merge tool's job is to write the resolved file back to the working tree.

Either way, then, the resolved file ends up in the working tree, as an ordinary file. You can then run git add on it, or—if you use git mergetool—Git will automatically run git add on it. This git add cleans up the index, marking the file as resolved. Git believes the conflicts are resolved, and the working tree file contains the right merge result, regardless of what you did with the working tree file. If you didn't update the working tree file, git mergetool may ask you whether it should run git add: don't do it because the file isn't merged. If you have not merged the file, don't git add the marked-up-with-conflicts file!

If you know that the merge result should be your version of the file, regardless of any changes they made, there is a shortcut way to do that (git checkout --ours or git restore --ours), but be very sure that this is correct before you do it. Look carefully at their changes: run git diff by hand if you need to, to see what they did, before just discarding their changes with --ours here.
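As a rough sketch of that shortcut, the following creates a deliberate conflict in a scratch repository (the branch name other and the file contents are assumptions), looks at their changes first, and only then takes our version of the file wholesale.

```shell
set -e
# Scratch repository with a deliberate conflict; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
printf 'alpha\nbeta\ngamma\n' > file.txt
git add file.txt
git commit -q -m "H"
git checkout -q -b dev
printf 'alpha\nbeta (ours)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "J: our edit"
git checkout -q -b other main
printf 'alpha\nbeta (theirs)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "L: their edit"
git checkout -q dev

# The merge stops with a conflict, so git merge exits non-zero.
git merge --no-edit other || echo "merge stopped: conflicts to resolve"

# Look at what they changed since the merge base before discarding it.
git diff "$(git merge-base HEAD other)" other -- file.txt

# Take our version of the file wholesale, mark it resolved, finish the merge.
git checkout --ours -- file.txt
git add file.txt
git commit -q --no-edit
```

Their edit to file.txt is gone from the merge result, which is exactly why this needs care.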

In any case, once all the conflicts are resolved and git add-ed, you should run:

git merge --continue

or simply:

git commit

(both do the same thing, committing the merge result). That makes the merge commit M, just as Git would have done on its own if there had not been a conflict before.
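Putting the steps together, here is a sketch of a complete manual resolution in a scratch repository (branch name other, file contents, and the chosen resolution are all assumptions). The printf that writes the resolved file stands in for what you would normally do in an editor.

```shell
set -e
# Scratch repository with a deliberate conflict; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
printf 'alpha\nbeta\ngamma\n' > file.txt
git add file.txt
git commit -q -m "H"
git checkout -q -b dev
printf 'alpha\nbeta (ours)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "J: our edit"
git checkout -q -b other main
printf 'alpha\nbeta (theirs)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "L: their edit"
git checkout -q dev

# The merge stops with a conflict, so git merge exits non-zero.
git merge --no-edit other || echo "merge stopped: conflicts to resolve"

# The index knows which files are conflicted; the working tree copy has markers.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted files: $conflicted"
cat file.txt

# Resolve by hand (an editor in real life): keep a deliberately combined line.
printf 'alpha\nbeta (ours and theirs)\ngamma\n' > file.txt
git add file.txt            # marks the file as resolved in the index
git commit -q --no-edit     # finishes the merge, creating merge commit M

git log --oneline --graph
```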

If you decide you want to give up on merging for now, you can use:

git merge --abort

to stop the merge and go back to the state you had before you started the git merge command at all.
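Aborting is safe to practice in a scratch repository. In this sketch (the branch name other and the file contents are assumptions), the branch tip and working tree are the same after git merge --abort as they were before the merge started.

```shell
set -e
# Scratch repository with a deliberate conflict; names are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
printf 'alpha\nbeta\ngamma\n' > file.txt
git add file.txt
git commit -q -m "H"
git checkout -q -b dev
printf 'alpha\nbeta (ours)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "J: our edit"
git checkout -q -b other main
printf 'alpha\nbeta (theirs)\ngamma\n' > file.txt
git add file.txt
git commit -q -m "L: their edit"
git checkout -q dev

before=$(git rev-parse HEAD)
git merge --no-edit other || echo "merge stopped: conflicts to resolve"

# Give up on the merge and return to the pre-merge state.
git merge --abort
after=$(git rev-parse HEAD)

echo "dev before merge: $before"
echo "dev after abort:  $after"
git status --short          # prints nothing: the working tree is clean again
```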