Why does git pull from two branches delete some modifications from the first branch?

Question

I have this scenario:

branchA - with some modifications for 1.txt file
branchB - with some other modifications for 1.txt file
master - with old version of 1.txt file

I am on the master branch. First I pulled branchA and got branchA's modifications to 1.txt. Then I pulled branchB and got its modifications to 1.txt, but that last pull also deleted the branchA modifications, and Git didn't show any conflict. Why is that happening, and how can I prevent it?

What exact commands did you run to do these pulls? I believe the standard way to do what you are describing is to merge the branches into master, not pull. — Gec, Jul 13 '22 at 15:10
@Gec I did it like this, on branch master: 1. git pull origin branchA 2. git pull origin branchB. And correct me if I am wrong: pull == fetch + merge? — Ben.S, Jul 13 '22 at 22:23

score 1 · Answer 1 · answered Jul 14 '22 at 03:47

TL;DR

We can't tell you without more information. But you can find out why for yourself.

Long

As you say in your comment, git pull means run git fetch, then run a second Git command and the second command defaults to git merge. So:

$ git checkout master
[various possible messages]
$ git pull origin branchA
[various possible messages]
$ git pull origin branchB
[more possible messages]

has your Git run git fetch twice, and git merge twice. You can make this more efficient (on the computer, less efficient in terms of what you have to type in) by using git merge directly:

$ git checkout master
[various possible messages]
$ git fetch origin
[various possible messages]
$ git merge origin/branchA
[various possible messages]
$ git merge origin/branchB
[various possible messages]

and the outcome should be the same, given the same starting point, as long as the repository over at origin is not so super-active that the two different git fetch operations would do wildly different things. This version is also easier to describe, so let's run with the longer but more efficient-for-Git variant.

[After the first merge, I] get branchA modifications to 1.txt file.

Note that the files you see in your working tree are simply extracted from some commit(s). What git merge does is a little more complex than that; we'll come back to this in a moment.

then I pulled branchB and get its modifications to 1.txt file, but that last pull also deleted the branchA modifications, and Git didn't show any conflict, why is that happening ...

To explain why, it's necessary to show just how git merge really works—and to get there, we must start with the basic building block of a Git repository, the commit.

Git is not really about files. It's true that commits contain files, but a commit is a sort of all-or-nothing deal: you either have the commit, which means you have all those files in exactly that form, or you don't have the commit and you don't have its files at all. It's also true that we (humans) find commits using the aid of branch names, and that Git organizes commits into things we call branches.¹ But the raison d'être for Git is really the commit.

¹Humans call the branch names "branches", and call the commit-groups "branches", and sometimes call one particular commit "a branch", and may call remote-tracking names "branches". For that matter, we call those brown things that stick out of tree trunks and eventually hold leaves "branches" too.² So you need to watch out with the word "branch": think about whether you mean one specific commit, some group of commits, a branch name, or what, whenever you say or hear "branch". The person who said or wrote it might mean something else!

²Arborists and tree science people—dendrologists—have specific meanings for "trunk", "branch", "twig", and so on.

A commit is ...

Every Git commit has a unique number. This number—expressed in hexadecimal as a hash ID—is how Git actually finds and identifies the commit. Git needs this hash ID to find the commit. If Git didn't give us a better-for-humans alternative, we'd have to memorize all the hash IDs. So of course, Git does give us an alternative. But before we look at it, let's look at what's in a commit.

Each commit consists of two parts:

The commit holds a full snapshot of every file, as a sort of permanent archive. The files in this snapshot are stored in a special Git-only format where they're compressed and, importantly, have their contents de-duplicated. So this means that if your new commit has 999 identical files and one different (edited) file, the new commit literally shares all 999 identical files with the previous commit.

For this to work, committed files can never be changed. As it happens, Git's magic numbering trick for commits already requires that no part of any commit ever be changed, so Git gets this for free.
Meanwhile, separate from the snapshot, every commit holds some metadata, or information about the commit itself. This includes things like the name and email address of the person who made the commit. (The snapshot is actually handled indirectly, via a tree line in this metadata, which allows a new snapshot to completely reuse everything about a previous snapshot. For instance, if you make a bad commit, then use git revert to undo the bad one by adding a correction, the new commit literally uses no space at all to hold the new snapshot, because it's 100% identical to the snapshot from two commits ago. The new commit has new metadata, but re-uses an entire snapshot, rather than just parts of previous ones.)
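To make the de-duplication idea concrete, here is a minimal Python sketch, assuming a toy object store where every object is keyed by the SHA-1 of its bytes. The helpers `put` and `make_commit` are hypothetical: real Git also hashes a type-and-size header, compresses objects, and uses binary tree objects, so these keys are not real Git hash IDs. It shows that editing one of 1000 files adds only three objects, and that a revert back to an earlier snapshot adds only the new commit's metadata.

```python
import hashlib

# Toy object store (hypothetical): key = SHA-1 of the raw bytes.
# Real Git hashes a "type size\0" header too, so these are NOT real hash IDs.
store = {}

def put(content: bytes) -> str:
    key = hashlib.sha1(content).hexdigest()
    store.setdefault(key, content)  # storing identical content again adds nothing
    return key

def make_commit(files: dict, parent, message: str) -> str:
    # Snapshot: each path maps to the key of its (shared) content.
    tree = {path: put(data) for path, data in files.items()}
    tree_key = put(repr(sorted(tree.items())).encode())
    # Metadata: which snapshot, which parent, and why.
    meta = f"tree {tree_key}\nparent {parent}\n\n{message}\n"
    return put(meta.encode())

files = {f"f{i}.txt": f"line {i}\n".encode() for i in range(1000)}
c1 = make_commit(files, None, "initial")

size = len(store)
files["f0.txt"] = b"edited\n"              # edit one file out of 1000
c2 = make_commit(files, c1, "edit f0")
added_by_edit = len(store) - size           # new file + new tree + new commit

size = len(store)
files["f0.txt"] = b"line 0\n"              # undo the edit
c3 = make_commit(files, c2, "revert f0")
added_by_revert = len(store) - size         # tree from c1 is reused: commit only

print(added_by_edit, added_by_revert)       # 3 1
```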

The reason we care about the snapshot is obvious enough: those are the files that Git will "un-archive" for us when we ask Git to extract that snapshot. The reason we care about the metadata is that it defines what git log shows us: who made the commit, when, and why, i.e., their log message. But the metadata also include a list of previous commit hash IDs, and this is what forms commits into the chains that we call "branches".

These commit chains are crucial to understanding how git merge works. Let's take a brief look at the mechanism.

Most Git commits store exactly one previous-commit hash ID in their metadata. Git calls this one saved hash ID the parent of the commit, and we say that the commit points to its parent. We can draw this, if we use letters to stand in for those big ugly hash IDs, by drawing the latest commit H with an arrow coming out of it, pointing backwards:

            <-H

Here H stands for "hash". Because this "arrow" points to the previous commit, it's now time to draw in that previous commit. Let's pick the previous letter G for that:

        <-G <-H

Since G is a commit, it has a snapshot and metadata, and its metadata holds one commit hash ID. Commit G points to some still-earlier commit:

... <-F <-G <-H

This kind of backwards-pointing linkage lets Git work its way from the last commit in this chain, all the way back to the very first commit ever. That commit, being the first one, has an empty list of previous commits, because there isn't any previous commit to get to. So that's where git log has to stop.
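The backwards walk can be modelled in a few lines of Python. This is a toy sketch: single letters stand in for hash IDs, and a dict named `parents` (hypothetical) plays the role of each commit's stored parent list. Here F is pretended to be the root commit, so its list is empty and the walk stops there.

```python
# Each commit's metadata lists its parent hash IDs; letters stand in for them.
parents = {
    "F": [],      # pretend F is the very first commit: no parents
    "G": ["F"],
    "H": ["G"],
}

def walk_back(start):
    """Follow parent links from start until reaching a commit with no parent."""
    chain = []
    commit = start
    while commit is not None:
        chain.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None   # first parent only, for a linear chain
    return chain

print(walk_back("H"))  # ['H', 'G', 'F']
```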

Branch names find commits

I noted earlier that Git needs the actual hash ID, whatever that is, for commit H to find commit H in its object database. We could just memorize it, and type it in, but who wants to memorize and retype 30cc8d0f147546d4dd77bf497f4dec51e7265bd8 for instance? So Git gives us—humans—an alternative that is much easier for us. Git's references—branch names, tag names, remote-tracking names, and all other kinds of names that Git lumps under this general term "reference"—let us type in a nice simple name like master or branchA.

The name itself simply holds the hash ID of the last commit in the chain. Whatever hash ID the name holds, that hash ID is the last commit in the chain. So when we have:

...--F--G--H   <-- master

the name master means "commit H", anywhere we use the name where Git needs a hash ID.

We can, at any time, create a new branch name. We just need to make this name point to some existing commit. For instance, if we create a new br1 branch, pointing to H, we get:

...--G--H   <-- br1, master

To remember which branch name we're using, Git attaches the special name HEAD to one of the branch names. We'll skip most of this and move on to remote-tracking names.
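A branch name is only a pointer, which a small Python sketch can show. Everything here (`branches`, `head`, `create_branch`, `commit_on_head`) is a hypothetical model, not Git's API: creating a branch copies no commits, and making a new commit moves only the name that HEAD is attached to.

```python
parents = {"F": [], "G": ["F"], "H": ["G"]}
branches = {"master": "H"}   # name -> hash ID of the last commit in its chain
head = "master"              # HEAD is attached to this branch name

def create_branch(name, commit):
    branches[name] = commit  # just a new label; no commits are copied

def commit_on_head(new_id):
    tip = branches[head]
    parents[new_id] = [tip]  # the new commit points back to the current tip
    branches[head] = new_id  # only the HEAD-attached name moves forward

create_branch("br1", branches["master"])   # ...--G--H  <-- br1, master
head = "br1"                               # like: git switch br1
commit_on_head("I")
print(branches)   # {'master': 'H', 'br1': 'I'}
```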

`git fetch` and remote-tracking names

When you run git fetch, you are having your Git software, which works with your Git repository (your clone, on your laptop for instance), call up some other Git software, which is working with some other Git repository: a different clone.

That other Git software and repository, working together, have their own branch names. These point to commits they have in their repository.

You may or may not have all the same commits. Every commit is numbered, after all, with a unique number. If you have a commit that has the same number, that means you have the same commit. If you have commits that they don't, you'll have numbers that are just missing from their repository entirely. The converse is true if they have commits you lack.

So, your git fetch has them list out their branch names and other references, and the hash IDs that go with these. Since the branch names define the latest commits, any of their branch names that have a commit that you don't will show up here.

For every commit they tell you about that they have and you don't, your Git will ask their Git to send that commit to you.³ This also obliges their Git to offer the parent(s) of that commit, which your Git will check to see if you have. If not, they must send the parents, and offer you the parents' parents, and so on. In this way, you get from them all of their new commits that you don't already have.
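The "offer a commit, then its parents" loop amounts to a graph walk that stops at commits you already have. Below is a simplified Python sketch that ignores Git's real have/want negotiation; `commits_to_send` is a hypothetical name. It relies on the fact that, in a normal (non-shallow) clone, having a commit implies having all of its ancestors, so the walk can stop as soon as it reaches one you have. The example graph is the one drawn in the "Your own setup" section.

```python
def commits_to_send(their_parents, their_tips, have):
    """Walk back from their branch tips, stopping at commits we already have."""
    to_send = set()
    stack = list(their_tips)
    while stack:
        c = stack.pop()
        if c in have or c in to_send:
            continue                    # we have it (and therefore its ancestors)
        to_send.add(c)
        stack.extend(their_parents[c])  # they must offer the parents too
    return to_send

# Their repository: F, G, H are shared with us; I, J, K, L are new to us.
their_parents = {"F": [], "G": ["F"], "H": ["G"],
                 "I": ["H"], "J": ["H"], "K": ["G"], "L": ["K"]}
their_tips = {"branchA": "I", "master": "J", "branchB": "L"}
have = {"F", "G", "H"}

print(sorted(commits_to_send(their_parents, their_tips.values(), have)))
# ['I', 'J', 'K', 'L']
```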

Each git fetch, in other words, just adds to your collective. You now have their commits in addition to any of your own that you've never given out. (To send commits that you have, that they lack, you use git push. This means push and fetch are as close as Git gets to opposites. They're not exactly opposites, though.)

There's one problem: they have branch names for remembering their latest commits. Your Git software can't just rip your branch names off your commits to make them remember their commits. What if you've added a new master-branch commit that they don't have yet? Then you have:

          I   <-- master (HEAD)
         /
...--G--H

(see how your master now points to your latest commit, which is later than H?). Meanwhile, they have some commit you are just now hearing of, commit J, that their master records:

...--G--H
         \
          J   <-- [their] master

(See how I and J are both later than H? Their master records the now-shared J commit.)

In your repository, you've now added commit J so you have:

          I   <-- master (HEAD)
         /
...--G--H
         \
          J   <-- [their] master

We'd like to have our Git software set up some name in our repository to remember where their master pointed during this git fetch. To achieve that, we have our Git turn their branch names into our remote-tracking names:

          I   <-- master (HEAD)
         /
...--G--H
         \
          J   <-- origin/master

This origin-prefixed name, origin/master, is our Git's way of remembering their Git's branch names.
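Continuing the same toy model, here is a sketch of that renaming step; `update_remote_tracking` is a hypothetical name. Real Git does this through the remote's fetch refspec (by default `+refs/heads/*:refs/remotes/origin/*`), but the effect is as shown: their names get an `origin/` prefix, and your own branch names are left alone.

```python
def update_remote_tracking(local_refs, remote, their_branches):
    """Record each of their branch names as remote/name; leave ours alone."""
    for name, commit in their_branches.items():
        local_refs[f"{remote}/{name}"] = commit
    return local_refs

local_refs = {"master": "I"}          # your master: your new commit I
their_branches = {"master": "J"}      # their master: commit J
update_remote_tracking(local_refs, "origin", their_branches)
print(local_refs)   # {'master': 'I', 'origin/master': 'J'}
```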

³This includes sending all of its files—but, due to the magic of Git's de-duplication and the way commits work, their Git and your Git can usually very quickly figure out which files you already have, and they can avoid sending you duplicates. So, even though every commit has every file, and you get every new commit, you only get new-to-you files, just as you only get new-to-you commits. Even then, they're magically compressed against files that their Git knows that you have. This magic works poorly for certain cases of special clones, such as shallow clones, but we won't go into details here.

Your own setup (it's different and not something I can predict)

Let's suppose that your master is on commit H, and their master was on commit H before (which is why your own is also on H now):

...--G--H   <-- master (HEAD), origin/master

You now run git fetch origin. They've gone ahead and created some new commits and have some new branch names, so now you get:

          I   <-- origin/branchA
         /
...--G--H   <-- master (HEAD)
      \  \
       \  J   <-- origin/master
        \
         K--L   <-- origin/branchB

Here, things get a little speculative. I can only guess what the graph looks like. You can see it: use git log --graph (adding --oneline keeps it compact, and --all includes the remote-tracking names), or one of the many answers from Pretty Git branch graphs, to see what your graph looks like after git fetch, and before any git merge.

Note that to see this, you must use a separate git fetch, then some git log command or similar, before you run git merge. If you run git pull origin branchA, and then look, it's too late!

Using `git merge`

Merging is about combining work. In general, we want Git to combine work we've done since some point with work someone else has done since that same point. But sometimes there's nothing to combine. In these cases, Git can cheat.

Let's go back to our ...--G--H sequence:

...--G--H   <-- master (HEAD)

Let's create two branch names, br1 and br2, now:

...--G--H   <-- br1, br2, master (HEAD)

Now let's git switch br1 and make two new commits:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- br2, master

Then we'll git switch br2 and make two more new commits:

          I--J   <-- br1
         /
...--G--H   <-- master
         \
          K--L   <-- br2 (HEAD)

We now switch back to master, or maybe br1:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- master
         \
          K--L   <-- br2

... and run git merge and give it some other name (e.g., br2). We're asking Git to combine work we did on one branch with work we did on another branch. What does this actually mean?

The first problem with any such request is that commits don't store changes. They only store snapshots. Commit J has a full snapshot of all files, and commit L has a full snapshot of all files. Commit H has a full snapshot of all files, too.

We can't just compare what's in J to what's in L. Suppose we started out at commit H with a file with 100 lines. Suppose that in commits I-J we added a few lines near the top, and in commits K-L we added a few lines near the bottom. Comparing J vs L will tell us to delete the lines added near the top, and instead add the lines near the bottom.
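Python's standard `difflib` (not the diff Git uses, but close enough for illustration) shows the problem. The file contents in this sketch are made up: `snap_j` has an extra line at the top and `snap_l` has an extra line at the bottom, both relative to the same original.

```python
import difflib

original = [f"line {n}" for n in range(1, 6)]   # stand-in for the file in H
snap_j = ["added near top"] + original          # what I-J did
snap_l = original + ["added near bottom"]       # what K-L did

# Compare J directly against L, with no merge base involved.
sm = difflib.SequenceMatcher(a=snap_j, b=snap_l, autojunk=False)
edits = [(tag, snap_j[i1:i2], snap_l[j1:j2])
         for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
print(edits)
# [('delete', ['added near top'], []), ('insert', [], ['added near bottom'])]
```

Read naively, that two-way diff says "throw away the top addition", which is exactly the wrong conclusion for a merge.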

But hold on: our word-based description in the last paragraph above has the key clue: Suppose we started out at commit H ... That is, in fact, exactly what we did. But how should Git know this?

Well, take a look at the graph. If we peel the distracting labels off it, we have:

          I--J
         /
...--G--H
         \
          K--L

It's almost blindingly obvious here: we were working along "in common", on the middle line, and then we "branched off" in two different directions, to make the top and bottom lines. Git merely needs to find this shared starting point. From there, it can determine what we changed, on each branch.

But suppose we're "on" master—that is, HEAD is attached to master, so that H is our current commit—and we run git merge br1 to merge commit J. (Note that merge works on commits, really; we're just using names like br1 to find the commits.) Git needs to find our shared starting point. The algorithm Git uses for this takes the commit graph as its input, and comes up with commit H ... which is the commit we're using right now.
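A minimal Python sketch of merge-base finding, using the toy graph from the drawings above. The helper names are hypothetical and real Git uses a more efficient graph walk, but the result is the same idea: the "best" common ancestors, meaning common ancestors that are not themselves ancestors of another common ancestor. That set is also what `git merge-base --all` reports.

```python
def ancestors(parents, commit):
    """The commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that no other common ancestor descends from."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c != d and c in ancestors(parents, d) for d in common)}

# ...--G--H with I--J (br1) and K--L (br2) both growing from H.
parents = {"F": [], "G": ["F"], "H": ["G"],
           "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}

print(merge_bases(parents, "H", "J"))  # {'H'}: master's own current commit
print(merge_bases(parents, "J", "L"))  # {'H'}
```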

The general merge algorithm says:

find the merge base;
run two git diff operations, from merge base to each tip commit;
combine the changes and apply to the snapshot from the merge base;
use the result as the snapshot to make a new merge commit.
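Here is a deliberately simplified Python sketch of the diff, combine, and apply steps for a single file, using `difflib` in place of Git's own diff. The function names are hypothetical. It only reports a conflict when the two sides change overlapping base lines (or insert at the same spot); real Git is stricter, also treating changes that merely touch adjacent lines as a conflict, and it writes conflict markers instead of raising an error.

```python
import difflib

def changes(base, side):
    """The diff base -> side as hunks: replace base[i1:i2] with `lines`."""
    sm = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, tuple(side[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def three_way_merge(base, ours, theirs):
    """Combine both diffs and apply the result to the merge-base snapshot."""
    # A set, so a change made identically on both sides is applied only once.
    combined = sorted(set(changes(base, ours)) | set(changes(base, theirs)))
    result, pos, prev = [], 0, None
    for i1, i2, lines in combined:
        # Two different hunks on the same base lines: a merge conflict.
        if prev is not None and (i1 < prev[1] or i1 == prev[0]):
            raise ValueError(f"merge conflict at base line {i1 + 1}")
        result.extend(base[pos:i1])  # unchanged base lines
        result.extend(lines)         # this hunk's new lines
        pos, prev = i2, (i1, i2)
    result.extend(base[pos:])
    return result

base = [f"line {n}" for n in range(1, 6)]   # snapshot in H
ours = ["added near top"] + base            # J
theirs = base + ["added near bottom"]       # L
print(three_way_merge(base, ours, theirs))
```

Both additions survive, because each is measured against H rather than against the other side.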

One of our two git diff operations here will be:

git diff --find-renames <hash-of-H> <hash-of-H>

The output from this git diff will be empty, because the snapshot in H exactly matches the snapshot in H. There's no other possibility here!

(The other diff will compare H vs J, and thus see what work we did in br1, just like we want it to.)

The result of the third step (the "combining") will be the set of changes from H to J, of course, because zero plus something is just the something. So applying those same changes to H gets us the snapshot that's already there in J. That means the merge result snapshot will exactly match the snapshot already in commit J.

If you force Git to make a real merge, Git will do that, and you'll get:

          I--J   <-- br1
         /    \
...--G--H------M   <-- master (HEAD)
         \
          K--L   <-- br2

where M is our new merge commit. (A merge commit is a commit with two or more parents; here, the two are H and then J, in that order, though this particular drawing doesn't show the order.)

But if you don't force it, Git will take a cheap short-cut and not make any new commit at all. Git calls this a fast-forward merge, but there's no actual merging involved. Git just does this:

          I--J   <-- br1, master (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

That is, Git makes the name master point to commit J, just as the name br1 points to J.
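The choice between a fast-forward and a real merge can be sketched in Python too, using the same toy graph; `merge_kind` is a hypothetical helper. The condition is the one described above: if the current commit is an ancestor of the commit being merged (so the merge base is the current commit), Git can just move the branch name, unless you force a real merge with `git merge --no-ff`.

```python
def ancestors(parents, commit):
    """The commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_kind(parents, current, other):
    """What `git merge other` would do, ignoring --no-ff and --ff-only."""
    if other in ancestors(parents, current):
        return "already up to date"   # everything in `other` is already here
    if current in ancestors(parents, other):
        return "fast-forward"         # merge base == current: just move the name
    return "real merge"               # merge base is some third commit

parents = {"F": [], "G": ["F"], "H": ["G"],
           "I": ["H"], "J": ["I"], "K": ["H"], "L": ["K"]}

print(merge_kind(parents, "H", "J"))  # master at H, merging br1: fast-forward
print(merge_kind(parents, "J", "L"))  # master now at J, merging br2: real merge
```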

Having done this—i.e., not merged anything at all, just moved the branch name—we can now run git merge br2. This time, Git can't cheat: the merge base of J and L (the common starting point) is commit H, which is neither our current commit J nor their commit L. So this time Git has to run the two diffs:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

Git can then combine the two diffs—or at least, try to combine them; there may be a merge conflict here—and if all goes well, Git can make a new commit M, whose snapshot is the result of applying the combined diffs to the snapshot from the merge base commit H:

          I--J   <-- br1
         /    \
...--G--H      M   <-- master (HEAD)
         \    /
          K--L   <-- br2

Your experience

but that [second merge] also deleted the branchA modifications, and Git didn't show any conflict ...

For this to happen, Git must have found the merge base somewhere, done the two git diffs, found that one side didn't change the relevant part of 1.txt, and found that the other side did change it, in such a way that the modification you expected to retain got removed.

How, precisely, did that happen? Well, that depends on which commit was the merge base commit in each of the two git merge operations you had Git run, and what diffs Git found when doing the merging.
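To see how a deletion can win without any conflict, here is one hypothetical scenario in Python; it is not necessarily what happened in your repository. Assume branchB was created from a commit that already contained branchA's new line, and a later commit on branchB removed that line. Then the line is in the merge base, your side (master, fast-forwarded to branchA) left it alone, and their side deleted it. The per-region rule below is a simplification of a three-way merge: the file is pre-split into aligned regions, and an empty string stands for a deleted line.

```python
def merge_region(base, ours, theirs):
    """Three-way rule for one aligned region of a file."""
    if ours == theirs:
        return ours          # same on both sides (or unchanged on both)
    if ours == base:
        return theirs        # only their side changed: take theirs
    if theirs == base:
        return ours          # only our side changed: take ours
    raise ValueError("conflict: both sides changed this region differently")

# Regions of 1.txt (hypothetical contents); "" marks a deleted line.
base   = ["intro", "branchA line", "old ending"]       # merge-base commit
ours   = ["intro", "branchA line", "old ending"]       # master after branchA
theirs = ["intro", "",             "branchB ending"]   # tip of branchB

merged = [merge_region(b, o, t) for b, o, t in zip(base, ours, theirs)]
print(merged)   # ['intro', '', 'branchB ending']: the branchA line is gone
```

Every region has at most one side that differs from the base, so the rule never reports a conflict, and the deletion is applied silently.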

To see what merge base(s) Git finds, use the git merge-base command. It takes two commits (names or hash IDs) and spits out the hash ID(s) of the merge base(s):

git merge-base --all hash1 hash2

for instance. The --all makes it list all the bases, if there are multiple merge bases; this particular case is somewhat rare, but if it happens, you have a much more complicated merge case, so it's worth checking. The underlying problem is finding the Lowest Common Ancestor(s) in a DAG.

Because you've already done both merges, this is a little tricky. If both merges were true merges, git log --graph will show each of the two merge commits and, for each merge, both of its parents. That will let you run git merge-base --all more easily. But if at least one was a fast-forward merge, there's no trace of this merge left in the commit graph. In this case, you'll need to use the reflogs (for HEAD and/or for the master branch) to observe the fast-forward operation and obtain the implied previous-branch-tip / merge base from there.
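A fast-forward leaves no merge commit, but the branch name's movement is still recorded in its reflog. The Python sketch below models that with a hypothetical `reflog` list of (old, new, reason) entries; the reason strings are made up for illustration and are not Git's exact reflog messages. In a real repository, `git reflog master` lists these entries, and `master@{1}` names the value master had before its most recent update.

```python
reflog = []                  # hypothetical log of (old, new, reason) updates
branches = {"master": "H"}

def move_branch(name, new, reason):
    reflog.append((branches[name], new, reason))   # remember where it was
    branches[name] = new

move_branch("master", "I", "fast-forward to origin/branchA")  # no merge commit
move_branch("master", "M", "merge origin/branchB")            # real merge

# The graph alone no longer shows where master was before the first merge,
# but the reflog does: it is the "old" value recorded in that entry.
old, new, reason = reflog[0]
print(old, new)   # H I
```

With the pre-merge tip recovered this way, you can run git merge-base --all on each pair of commits that took part in each merge.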

You can think of this as solving a murder mystery: who killed the line of code? Was it Mr BranchA, in the library, with the rope? Or was it Ms BranchB, in the conservatory, with the candlestick? You must trace all the players' movements back from the time everyone was alive and well. Sometimes it's obvious, when Mr BranchA has the collection of antique cannons and the murder victim was shot with a cannonball from an 1800s-era howitzer. But you'll have to look at your particular situation: there's no single generic answer.

wow thanks a lot. I didn't finish reading it but I definitely will — Ben.S, Jul 14 '22 at 10:35