
How can we merge two branches A and B when:
In branch A we have a file named file1.py:

print('file1')
print('branch A')

In branch B we have a file named file1.py:

print('file1')
print('branch B')

And a file named file2.py:

print('file2')

The result I would like to have is a branch B like this:
file1.py:

print('file1')
print('branch A')

file2.py:

print('file2')

It's like a merge A -> B without touching files that exist in B but do not exist in A.

  • `git checkout B` then `git merge A` should do. If there is any merge conflict just solve it by keeping the `A branch` version. – Deividgp Feb 04 '22 at 10:45
  • I just tried that, and the result contains only file1.py; file2.py is deleted without any merge conflict. – Yoan Bousquet Feb 04 '22 at 10:52
  • So file2.py gets deleted when you merge A into B? That might be because you had file2.py in A. I recommend checking this link: https://community.atlassian.com/t5/Sourcetree-questions/Git-merge-deleting-files-while-merging/qaq-p/1115985 – Deividgp Feb 04 '22 at 10:58

2 Answers


The key item you're missing here is that git merge does not just combine branch-A and branch-B. In fact, before we can define combine, or even branch, we have to step back and consider what Git really does.

You might think that Git stores files in branches, but that's not right, and it still leaves you with the problem of defining the word branch. What, exactly, is a "branch"? Does Git store branches? It's clear from your question wording that you don't really know what a branch is—but that's OK, a lot of Git users don't! They have all kinds of trouble with Git too. We need a solid foundation before we can build a house or skyscraper, so let's start with commits.

Git stores commits

Git's reason for existence, and its basic unit of storage when you're working with Git, is the commit. So you need to know what a commit is and does for you. Each commit:

  • Is numbered: a commit has a unique global (or universal) ID (a GUID or UUID), which Git calls a hash ID or object ID (abbreviated OID). When you make a new commit, it gets a new, unique ID, unique across every Git repository. That hash ID can never be used for any other commit. That's why commit hash IDs are so big and ugly. In fact, these numbers are impossible for humans to work with (other than by cut-and-paste or similar), so we don't normally use them directly.

  • Is read-only: once you make a commit, it is frozen for all time. (The numbering trick—the hashing scheme—that Git uses depends on this.)

  • Contains two things: a full snapshot of every file, and some metadata, or information about the commit itself, such as your name and email address and a date-and-time-stamp.
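
To make the "numbered" and "read-only" points a bit more concrete, here is a minimal Python sketch of content addressing. The `blob_id` function follows the way Git hashes a file's contents in a SHA-1 repository (a `blob <size>\0` header followed by the bytes); commits are hashed in the same general way over their own content. The function name and the sample file contents are only illustrative.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the object ID Git (in SHA-1 mode) assigns to a file's contents."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


a = blob_id(b"print('file1')\n")
b = blob_id(b"print('file1')\n")                      # same bytes again
c = blob_id(b"print('file1')\nprint('branch A')\n")   # one line added

print(a == b)        # identical content always gets the same ID
print(a == c)        # changed content gets a different ID
print(blob_id(b""))  # the ID of an empty file
```

Since the ID is computed from the content, changing even one byte would produce a different ID, which is why a stored object can never be altered in place.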

The snapshots in the commit are themselves read-only, and the files are compressed, Git-ified, and—important for many reasons, including to keep the repository size under control—de-duplicated. So when you make a new snapshot—a new commit—that mostly re-uses the files from an earlier commit, it hardly takes any space for those new files, as most of them are the old files. But this does mean that these files are entirely useless for getting new work done, because to do new work, you need:

  • files that every program, not just Git, can read; and
  • files that programs can modify or replace.

So you won't actually do any work with the files that are stored inside the commits. We'll come back to this in a moment.

Meanwhile, the metadata in each commit include more than just your name and email address. Git adds, for its own purposes, a list of previous commit hash IDs into each commit's metadata. This list usually has just one entry. That one entry is the commit's parent.

The parent of a commit is how Git stores history. The history in a repository is simply the set of commits in the repository. We start at the end, with the latest commit, which has some big ugly hash ID: let's call that commit H (for Hash), although real hash IDs are things like 5d01301f2b865aa8dba1654d3f447ce9d21db0b5 (you can see why humans don't use these things!). Let's draw commit H as a letter with an arrow sticking out of it, pointing backwards:

            <-H

That's a representation of the last commit. Commit H stores, in its metadata, the raw hash ID of its parent: the commit that comes just before H. Let's call that commit G and draw it in:

        <-G <-H

Commit G, of course, has a snapshot and metadata too, and its metadata contains the raw hash ID of still-earlier commit F, which has a snapshot and metadata, and so on:

... <-F <-G <-H

This is our chain of commits. It ends at our latest commit H. We must give Git the hash ID H so that Git can find commit H, but once we do that, Git will use the snapshot in H for the snapshot and the metadata in H for the parent, and can find commit G on its own. Git can then use the snapshot and metadata and work its way back to F on its own, and from there, work back to E, D, and so on—all the way back to the very first commit (presumably commit A).

What will make Git stop going backwards is that A will have an empty list of parent commit hash IDs, so that A simply doesn't point backwards at all. So the first commit ever in any given repository is slightly special: it doesn't point backwards to an earlier commit, meaning there is no earlier commit.
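
The backwards walk can be sketched in a few lines of Python. This is only a toy model: the one-letter names stand in for real hash IDs, and the `parents` table and `history` function are invented for illustration.

```python
import string

# Toy repository: commits A..H, each pointing back to the one before it.
letters = string.ascii_uppercase[:8]  # "ABCDEFGH"
parents = {c: ([letters[i - 1]] if i else []) for i, c in enumerate(letters)}


def history(tip: str) -> list:
    """Follow parent links from tip until a commit with no parents is found."""
    chain = []
    current = tip
    while True:
        chain.append(current)
        if not parents[current]:  # empty parent list: the root commit
            break
        current = parents[current][0]
    return chain


print(history("H"))
```

The only input is the ID of the last commit; everything else is found by following the stored parent IDs.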

(Note that when we clone a repository, we copy all the commits. Since they're exact copies, they have the same hash ID, and our Git software can easily tell, by comparing hash IDs, that our commit H is the same commit we got when we cloned their commit H: the hash IDs will match. This is how two Git repositories share commits later: by comparing hash IDs. They need not look at anything else, because the hash IDs are GUIDs.)

Branch names find commits

Now, as humans, we're bad at hash IDs. We could write them down, or save them in files, or something, but we have a computer: why not have the computer save the latest hash ID H somewhere? And that's the first thing a branch name is and does for us:

...--F--G--H   <-- main

Here, the branch name main holds just the one hash ID H. That's our latest commit, so it's the one we want to find quickly, and we can: we just tell Git "get me main" and that means "get me commit H".

We can create new branch names any time we like. Note that our branch names, in our repository, are our names—they're not anyone else's branch names!—and we can create and destroy branch names whenever we feel like doing that, or change them to any spelling we like, as long as they meet Git's "branch name" qualities. But I like to use the same branch names—or at least, mostly the same—in my repository as there are in the repository I cloned to make my repository. So we may try not to just create random profusions of names. Git won't care—Git only cares about the commits—but we might confuse ourselves if we go overboard.

Anyway, given:

...--G--H   <-- main

let's create two new branch names, br-A and br-B. In Git, a branch name must point to some particular commit. The usual choice is "the current commit", which—since we're initially using our branch name main—is commit H, the latest main commit. So now we have:

...--G--H   <-- br-A, br-B, main

We now need a way, in our drawing, to show which name we're using. Let's attach the special name HEAD, written in all uppercase, to just one of these three branch names, like this:

...--G--H   <-- br-A, br-B, main (HEAD)

This indicates that we are "on" branch main—running git status, we'll see on branch main—and since main points to H, we're using commit H.
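
As a rough mental model (not how Git stores things on disk), branch names behave like a small lookup table from name to hash ID, and HEAD holds the name of one entry. The variable and function names below are made up for this sketch.

```python
# Toy model: every branch name maps to exactly one commit ID.
branches = {"main": "H", "br-A": "H", "br-B": "H"}

# HEAD is attached to a branch name, not directly to a commit.
head = "main"


def current_commit() -> str:
    """Resolve HEAD to a branch name, then the branch name to a commit."""
    return branches[head]


print(f"On branch {head}")
print(current_commit())
```

Three names can point to the same commit without any trouble, because a name is just a pointer.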

A short sidebar on "using a commit"

To use some particular branch name, we run either git checkout (which works in every Git version) or git switch (added in Git 2.23). Git now looks up the branch name, finds the commit it specifies, and tries to switch to that commit and branch. If that's possible, we are now "on the branch" and we see the files from that commit.

To make that happen, Git literally copies the files out of the commit. Commit H contains a full snapshot of every file—just like any commit—but those files are in that special, Git-ified, de-duplicated form. So Git expands the files into ordinary everyday files. These ordinary copies go into ordinary folders—Git's files aren't really in folders (this part gets complicated but we can mostly just ignore it)—and now we have ordinary files to do ordinary work with.

The files you work with, in your working tree, are not in Git. They were copied out of Git. When you make new commits, you'll store new (de-duplicated) Git-ified copies into Git. These won't be the files in your working tree: they'll be Git-ified, de-duplicated copies of the files in your working tree. So now, at this point, you're free to do anything at all with these files. They're not Git's, they're yours.

All this extra copying is why you have to run git add so often. We'll skip over the details, although they are important: it's the presence of a Git-ified, de-duplicated copy in Git's index or staging area (two words for the same thing) that makes a file tracked and thus go into the next commit.
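
One way to picture the three copies is as three separate dictionaries. This is a deliberately simplified sketch: the names `head_snapshot`, `index`, `working_tree`, `add` and `is_tracked` are invented here, and real `git add` handles more cases (deleted files, for instance).

```python
# Toy model of the three places a file's content can live.
head_snapshot = {"file1.py": "print('file1')\n"}  # frozen inside the commit
index = dict(head_snapshot)                       # the proposed next snapshot
working_tree = dict(head_snapshot)                # ordinary, editable files

# Edit one file and create another, in the working tree only.
working_tree["file1.py"] += "print('branch A')\n"
working_tree["file2.py"] = "print('file2')\n"


def add(path: str) -> None:
    """Like `git add path`: copy the working-tree version into the index."""
    index[path] = working_tree[path]


def is_tracked(path: str) -> bool:
    """In this model a file is tracked exactly when the index has a copy."""
    return path in index


before = is_tracked("file2.py")
add("file2.py")
after = is_tracked("file2.py")
print(before, after)
```

Note that the edit to file1.py has not reached the index yet: until it is added too, the next commit would still contain the old version.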

Anyway, if you check out some other commit, Git will have to remove, from your working tree, the files from this commit, and put in instead the files from the other commit. We'll see this in action in a moment. For now, let's just run git checkout br-A to get "on" branch br-A:

    ...--G--H   <-- br-A (HEAD), br-B, main

We're still using commit H, but now we're using it because of the name br-A. Since we didn't actually switch commits, Git does not do the remove-and-replace-files thing, this time.

Making a new commit

To make a new commit, we:

  • fuss with the files in our working tree;
  • run git add and/or git rm if/as we like; and
  • run git commit.

The git add and git rm commands manipulate Git's index or staging area, which we mentioned in passing above. This holds the Git-ified copies of files that will go into the next commit, and git add means make the staging copy match the working tree copy. When we run git commit, Git:

  • Gathers all the metadata it needs for the new commit: your name and email settings from user.name and user.email, the current date-and-time, and so on. This includes the current commit hash ID, i.e., the actual hash ID of commit H.

  • Freezes into a permanent snapshot the (pre-de-duplicated, Git-ified copies of) files that are in Git's index/staging-area. This will be the snapshot for the new commit.

  • Writes out the combined metadata and snapshot, forming a new commit. The new commit gets a new, random-looking (but not really random) hash ID. We cannot predict what it will be unless we know exactly what files you'll be snapshotting and exactly what time you'll run git commit. But we'll just call the new commit "commit I" for short.

When we look at what happens with the commits, we get this:

              I   <--- br-A (HEAD)
             /
    ...--G--H   <-- br-B, main

Here's the sneaky trick Git pulled: As soon as Git got the new commit's hash ID, Git wrote that hash ID into the name br-A. So now br-A, the name that is, selects new commit I. Commit H is not changed at all, but new commit I points back to existing commit H. The other branch names don't move either. The special name HEAD remains attached to the name br-A, so we're still on branch br-A. But the last commit on br-A is now commit I.
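
The same trick, as a toy Python model. The `commit` function and the tables are hypothetical; the point is only that the new commit records the old tip as its parent and that exactly one name moves.

```python
parents = {"H": ["G"]}  # existing history, abridged
branches = {"main": "H", "br-A": "H", "br-B": "H"}
head = "br-A"           # HEAD is attached to br-A


def commit(new_id: str) -> None:
    """Record new_id with the current tip as its parent, then move the
    branch name HEAD is attached to. No other name is touched."""
    parents[new_id] = [branches[head]]
    branches[head] = new_id


commit("I")
print(branches)
print(parents["I"])
```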

If we repeat this for another new commit, we'll get a new commit J:

              I--J   <--- br-A (HEAD)
             /
    ...--G--H   <-- br-B, main

Note that the files in the snapshot in J probably don't match the files in the snapshot in H. That's how we want things to work.

We're now ready to start working on branch br-B though, so now we switch to it:

git switch br-B

which results in this:

              I--J   <--- br-A
             /
    ...--G--H   <-- br-B (HEAD), main

The name br-A continues to point to commit J, but now we're using the name br-B, which points to commit H. We've changed commits, so Git pulls out all the commit-J files and puts in, instead, all the commit-H files.
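
The remove-and-replace step can be pictured as a comparison of two snapshots. This is only a sketch: `switch_plan` and the file contents (including `extra.py`) are made up for illustration, and real Git also checks for uncommitted work before it touches anything.

```python
def switch_plan(old: dict, new: dict):
    """Return (paths to remove, paths to write) when moving the working
    tree from the snapshot `old` to the snapshot `new`."""
    to_remove = sorted(p for p in old if p not in new)
    to_write = sorted(p for p in new if old.get(p) != new[p])
    return to_remove, to_write


# Hypothetical snapshots for commits J and H.
snapshot_J = {
    "file1.py": "print('file1')\nprint('branch A')\n",
    "file2.py": "print('file2')\n",
    "extra.py": "print('only in J')\n",
}
snapshot_H = {
    "file1.py": "print('file1')\n",
    "file2.py": "print('file2')\n",
}

to_remove, to_write = switch_plan(snapshot_J, snapshot_H)
print("remove:", to_remove)
print("write: ", to_write)
```

Files that are identical in both snapshots (file2.py here) need no work at all.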

We now do our work as usual and run git add and/or git rm as usual. Then we run git commit and Git makes a new snapshot and metadata. We'll call this new commit K—the next letter—although in reality it has some big ugly random-looking hash ID, and we'll draw it in:

              I--J   <--- br-A
             /
    ...--G--H   <-- main
             \
              K   <-- br-B (HEAD)

When we make yet another commit L, we get:

              I--J   <--- br-A
             /
    ...--G--H   <-- main
             \
              K--L   <-- br-B (HEAD)

Note that commits up through H are on all three branches right now. Commits I-J are only on br-A right now, and commits K-L are only on br-B right now. I say right now because these branch names move. The set of commits that is "on" some branch is the series of commits that ends at the commit to which the name points, so here, main ends at commit H right now, and therefore that's the last commit on main.
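
"On a branch" is therefore a reachability question. Here is a small sketch under the same toy assumptions as the drawings (one-letter IDs, history before G left out); `commits_on` is a made-up helper name.

```python
parents = {
    "G": [],            # earlier history omitted in this sketch
    "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}
branches = {"main": "H", "br-A": "J", "br-B": "L"}


def commits_on(branch: str) -> set:
    """All commits reachable from the branch tip by following parent links."""
    seen = set()
    stack = [branches[branch]]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


for name in branches:
    print(name, sorted(commits_on(name)))
```

Move a name and the answer changes, even though no commit has changed.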

Git calls that last commit a tip commit, as in the "tip of the branch". That's a useful term because it's quite specific. We know exactly what a commit is now, and if we pick one out, we know what it has in it.

You now have a pretty good idea of what "branch" means

As we can now see, the word branch is not very well defined. Sometimes it means a branch name like main or br-A or br-B. Sometimes it means the final commit on the branch. And sometimes it means some or all commits findable by working backwards from the tip commit. People will say branch and mean any of these things, or perhaps even more things (Git has remote-tracking names, which some call remote-tracking branch names and therefore include in the word "branch"). Try to avoid the word branch because it's so fuzzy, but remember that people will say branch and you're supposed to figure out what they mean—if they even know what they mean!

At last, we can look at merges

Let's take this setup again, though this time I'll leave out the name main because it's in the way:

          I--J   <-- br-A (HEAD)
         /
...--G--H
         \
          K--L   <-- br-B

If you run git merge br-B, Git must now locate three commits. These three commits are:

  • the current or HEAD commit, which is really easy to find: that's where we are now;
  • the commit you named on the command line: br-B is a name for commit L, so commit L is the "other" commit for the merge; and
  • the merge base commit.

This last commit, the merge base, is the first complicated part of a merge. Git finds the merge base on its own for you, so that you don't have to (this was a really important feature back in the early days of Git, when some version control systems didn't do that), and we won't go into details about how Git finds this merge base. We'll just describe the merge base a little bit vaguely this way: The merge base of two commits is the best shared commit that's on both branches.

Here, we're using the word branch to mean "commits as found by starting at the two tip commits and working backwards". We must find some commit(s) that are on both branches. Commit L doesn't count because it's only on br-B, and commit J doesn't count because it's only on br-A. Commits I and K don't count for the same reasons. But commit H is on both branches. In fact, it's easy to see from this drawing that commits H and all earlier commits are on both branches.

The best such commit is, loosely speaking, the "latest" one "closest to the two tip commits". That's commit H. So commit H is the merge base.
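
For the curious, the loose definition above can be turned into a naive Python sketch. This is not how Git actually computes it; it is just "shared commits that are not behind any other shared commit", with made-up helper names and the toy graph from the drawing.

```python
parents = {
    "G": [],            # earlier history omitted in this sketch
    "H": ["G"],
    "I": ["H"], "J": ["I"],
    "K": ["H"], "L": ["K"],
}


def ancestors(commit: str) -> set:
    """The commit itself plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a: str, b: str) -> list:
    """Shared commits that are not ancestors of any other shared commit."""
    shared = ancestors(a) & ancestors(b)
    return sorted(c for c in shared
                  if not any(c != d and c in ancestors(d) for d in shared))


print(merge_bases("J", "L"))
```

The function returns a list because more complicated graphs can have more than one "best" shared commit; in this simple graph there is exactly one.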

A merge is about combining work

Git now performs the merge by combining work.

We know that each commit holds a full snapshot of every file. So, in commit H, we have a full snapshot of every file as of the form it had at the time you (or whoever) made commit H. Commit I also has a snapshot of every file, and commit J has a snapshot of every file.

Git can figure out what you changed on br-A by comparing the files in H with the files in J.[1] Whatever is different, those are your changes on branch br-A.

Meanwhile, Git can do the same thing with H-vs-L. Whatever is changed from H to L, that's what they (whoever they are) did on br-B.

What git merge will do, having done these two comparisons, is combine the changes. In your case, the changes on one side may touch one file, file1.py. The changes on the other side may also touch that same file. But one of the two "sides" of the merge also removed file2.py. If that was "them", on br-B, and if you didn't touch file2.py at all, Git will combine your changes to file1.py with their removal of file2.py and retain the removal of file2.py.

When you run:

git merge br-B

Git will find all the changes and combine them, and then attempt to apply the combined changes to the snapshot in commit H. That keeps your changes and adds theirs, or, equivalently, keeps their changes and adds yours. Where your changes and theirs do not conflict, Git will assume that this is the right result.
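
Here is a rough sketch of that combining rule, in Python, working on whole files only. Real Git also merges changes within a file line by line, which this sketch does not attempt. The `merge_snapshots` function and the snapshot contents are assumptions chosen to match the scenario described above (you changed file1.py, they removed file2.py).

```python
def merge_snapshots(base: dict, ours: dict, theirs: dict):
    """Combine whole-file changes from base->ours and base->theirs.

    Snapshots map path -> content; a missing path means "no such file".
    Returns (merged snapshot, list of conflicting paths).
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (possibly both deleted it)
            result = o
        elif o == b:      # only they changed this path: take theirs
            result = t
        elif t == b:      # only we changed this path: keep ours
            result = o
        else:             # both changed it, in different ways
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


base_H = {"file1.py": "print('file1')\n", "file2.py": "print('file2')\n"}
ours_J = {"file1.py": "print('file1')\nprint('branch A')\n",
          "file2.py": "print('file2')\n"}
theirs_L = {"file1.py": "print('file1')\n"}  # file2.py removed on br-B

merged, conflicts = merge_snapshots(base_H, ours_J, theirs_L)
print(sorted(merged))
print(conflicts)
```

From Git's point of view the deletion is simply a change made on one side that the other side did not contradict, so it goes into the result without any conflict.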

If this is not the right result, you must correct it. You have a lot of options for doing this, including running git merge --no-commit. This forces git merge to stop with the merge incomplete, even if there are no conflicts at all.

If Git is able to finish the merge on its own, and you don't stop it, Git will go on to make a new merge commit, like this:

          I--J
         /    \
...--G--H      M   <-- br-A (HEAD)
         \    /
          K--L   <-- br-B

New commit M, the merge, extends the current branch br-A in the usual way. New commit M has a snapshot as usual—that's the snapshot Git built by applying the combined changes to the snapshot in commit H—and metadata as usual as well. The only thing that is special about merge commit M is that instead of listing one parent, commit J, it lists two: commit J first, and then commit L second.

If you do choose to use git merge --no-commit, Git will always stop before making M. You may now make any changes you like and—if needed—run git add. For instance, if Git chose to delete file2.py, you can use git restore to extract it from commit J. Be aware that if you do do this, you need to make a remark about it in your commit message, so that someone else, later, doesn't wonder why Git kept the file, against Git's normal rules. In any case, when you're done re-arranging the snapshot to go into the final merge, use git merge --continue or git commit to make new commit M. Git will make it as a merge commit, with the two parents instead of one.

Alternatively, you can let Git go ahead and make merge M on its own, and then restore the missing file from commit J and make a new commit N:

          I--J
         /    \
...--G--H      M--N   <-- br-A (HEAD)
         \    /
          K--L   <-- br-B

The file file2.py will be present in J, absent in M, and present again in N.
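
In snapshot terms, the fix-up commit is just "M's files plus one file taken from J". A toy sketch follows; the `restore` helper is hypothetical and only mimics, very roughly, the effect of `git restore --source=<commit> <path>` followed by `git add`.

```python
# Hypothetical snapshots for commits J and M.
snapshot_J = {
    "file1.py": "print('file1')\nprint('branch A')\n",
    "file2.py": "print('file2')\n",
}
snapshot_M = {  # the merge result: file2.py was removed
    "file1.py": "print('file1')\nprint('branch A')\n",
}


def restore(target: dict, source: dict, path: str) -> dict:
    """Return a copy of `target` with `path` taken from `source`."""
    fixed = dict(target)
    fixed[path] = source[path]
    return fixed


snapshot_N = restore(snapshot_M, snapshot_J, "file2.py")

for name, snap in [("J", snapshot_J), ("M", snapshot_M), ("N", snapshot_N)]:
    print(name, "file2.py" in snap)
```

Commit M itself stays exactly as it was; the file comes back only in the new commit N.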

Last, if you prefer that file2.py not go missing at all, you can check out branch br-B and, before you start the merge, add a correcting commit that puts file2.py back:

          I--J   <-- br-A
         /
...--G--H
         \
          K--L--L2   <-- br-B (HEAD)

Now when you switch back to br-A and merge, Git will compare H vs J to see what you changed, and H vs L2 to see what they changed. Since you put file2.py back, there's no change to file2.py in the H-vs-L2 comparison. The combined changes will therefore not remove file2.py.
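
The effect of the correcting commit shows up directly in the base-versus-tip comparison. This sketch uses an invented `changes` helper and hypothetical snapshot contents; it compares whole files only.

```python
def changes(base: dict, tip: dict) -> dict:
    """Map each path that differs between two snapshots to a status."""
    result = {}
    for path in sorted(set(base) | set(tip)):
        if path not in tip:
            result[path] = "deleted"
        elif path not in base:
            result[path] = "added"
        elif base[path] != tip[path]:
            result[path] = "modified"
    return result


snapshot_H = {"file1.py": "print('file1')\n", "file2.py": "print('file2')\n"}
snapshot_L = {"file1.py": "print('file1')\n"}      # file2.py removed on br-B
snapshot_L2 = dict(snapshot_L)
snapshot_L2["file2.py"] = snapshot_H["file2.py"]   # correcting commit

print(changes(snapshot_H, snapshot_L))
print(changes(snapshot_H, snapshot_L2))
```

With L as the tip, the comparison reports a deletion; with L2 as the tip, file2.py does not appear in the comparison at all, so there is nothing for the merge to carry over.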


[1] Git could compare H vs I, and then I vs J, but in fact it just jumps straight to the end and compares H directly to J. A commit-by-commit comparison might help in certain cases, e.g., to handle file renames better. Git should probably have this as an option, but it doesn't.

torek

Could you please check this post: https://stackoverflow.com/a/7292109/18081892

It also depends on the sequence of operations you performed in your repo. If file2.py was removed on branch A and you then merge branch A, I think this is the normal behaviour.

niavlys