Moving a file from one repo to another and changing it in parallel

Question

I have two repositories: A and B. The file doc.txt is located in repository A, on the A_master branch. The current branch in B is B_master.

I created a branch based on A_master in repository A and called it A_feature. I also created a branch based on B_master in repository B, called B_feature. I committed the deletion of doc.txt in A_feature. I then committed the addition of the same doc.txt in B_feature. Thus, I moved doc.txt from A repository to B repository in the *_feature branches. Afterwards, someone changes doc.txt in A_master. I am going to merge A_feature and B_feature.

Will I lose doc.txt changes made in A_master? Why?
Will I see conflicts in *_feature pull requests? Why?

Edit 1

... I am going to merge A_feature in A_master and B_feature in B_master.

torek · Answer 1 · 2021-12-07T21:00:25.823

A Git repository is, at its heart, a collection of commits. It's not about files (though commits hold files). It's not about branches (though branch names help us, and Git, find commits). It's really just a collection of commits.

A commit, in Git:

Is read-only. No commit can ever be changed! Not one single bit can be changed.
Is numbered, with a big ugly hash ID expressed in hexadecimal. The number is unique to that one commit: no other commit, anywhere, in any Git repository, can have that number. Any other Git repository that does use that number, uses it to hold a copy of that commit.
Contains a snapshot of all of your files (in a special, compressed, Git-only, de-duplicated format) and some metadata.

A branch name in Git simply holds the hash ID—the commit number—of one particular commit: the latest one for that branch. That's all it is, really: a name for one commit. As you make new commits in that branch, Git automatically replaces the stored hash ID with the latest one.

(The metadata within each commit joins them together so that, starting from the latest, Git can work backwards to every previous commit. So holding the hash ID of the latest commit is sufficient.)
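To watch a branch name move, here is a minimal sketch in a throwaway repository. The temporary directory, the file contents, and the "Example" identity are all made up for illustration; only the Git commands themselves are real.

```shell
set -e
# Throwaway repository; directory, file contents, and identity are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch "master"
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > doc.txt
git add doc.txt
git commit -q -m "first commit"
before=$(git rev-parse master)    # hash ID currently stored in the name

echo "second" >> doc.txt
git commit -q -a -m "second commit"
after=$(git rev-parse master)     # the name now holds the new commit's hash ID

echo "master was $before"
echo "master is  $after"
```

The two printed hash IDs differ: the name master was updated, while the first commit itself was not touched.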

Checking out a commit causes Git to extract all the saved files that are in that commit. The saved files in the commit cannot be changed, and only Git itself is able to read them, so we have to extract them to use them. Once extracted from Git, those files are not in Git any more. That is, the files you see and work with may have come out of a Git repository and a commit, but now they're just ordinary files.

Now that you know the above, you can see where there is an error in your description:

[I] moved doc.txt from A repository to B repository ...

It's literally impossible to move a file from one repository to another:

Repositories don't hold files; they hold commits.
Commits can't be changed. "Moving" a file implies that it's gone from one location, and now appears in another. So this would require changing the files inside some commits, and that's not possible.

You can copy a file that you've extracted from some A-repository commit into the working tree for your B repository, use git add to prepare it to go into a new commit in B, and run git commit in B to add a new commit to B in which the file exists.

You can remove the file from your working tree in A and add the removal (git add the removed file, or use git rm to do the whole thing in one shot) and then make a new commit in A, to add a new commit to A in which the file doesn't exist. The file continues to exist in previous commits in A.
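The two steps above can be sketched as follows, assuming two scratch repositories named A and B side by side in a temporary directory. The file contents, the README file in B, and the "Example" identity are made up for illustration.

```shell
set -e
# Scratch area; repository names follow the question, everything else is hypothetical.
work=$(mktemp -d)
cd "$work"
for r in A B; do
    git init -q "$r"
    git -C "$r" symbolic-ref HEAD refs/heads/master
    git -C "$r" config user.name "Example"
    git -C "$r" config user.email "example@example.com"
done

# Repository A starts out with doc.txt; B starts out without it.
echo "documentation" > A/doc.txt
git -C A add doc.txt
git -C A commit -q -m "add doc.txt"
echo "readme" > B/README
git -C B add README
git -C B commit -q -m "initial commit in B"

# Step 1: copy the extracted file into B's working tree and commit it there.
cp A/doc.txt B/doc.txt
git -C B add doc.txt
git -C B commit -q -m "add doc.txt (copied from A)"

# Step 2: remove the file in A and commit the removal.
git -C A rm -q doc.txt
git -C A commit -q -m "remove doc.txt"

# The earlier commit in A is unchanged and still holds the file.
old_copy=$(git -C A show HEAD~1:doc.txt)
echo "A's previous commit still has: $old_copy"
```

Nothing was moved: B gained a new commit that has the file, A gained a new commit that lacks it, and every older commit is exactly as it was.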

Afterwards, [I made and committed] changes [to] doc.txt in A_master.

This implies you copied doc.txt into the working tree for B, rather than "moving" (copying-and-then-removing) doc.txt. The new additional commits you made in repository A hold the updated versions of doc.txt. The previously-existing commits continue to hold the older versions.

I am going to merge A_feature and B_feature ...

This may be difficult: git merge operates on the commits in one repository. You have two different repositories, A and B. If they contain the same starting commits—remember, Git is all about the commits, as found by their commit numbers—you may be able to load the currently-private-to-A commits into repository B, or the B commits into A, and then you may be able to run git merge on these commits.

Note that while git merge takes a branch name:

git checkout br1       # or git switch br1
git merge br2

these operations are fundamentally about the commits in the repository. The merge operation, git merge br2, uses the name br2 to find the most recent commit for that branch. It then uses the commit metadata from the current commit, and the designated commit, and any predecessor commits as needed, to locate the common starting point commit—the merge base—from which the two branch-tips are descended.

If the commits aren't in the same repository, it's impossible to merge them in the first place.
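You can ask Git which commit it would use as the merge base with git merge-base. A minimal sketch, reusing the names br1 and br2 from above in a throwaway repository (file names and identity are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/br1
git config user.name "Example"
git config user.email "example@example.com"

# The shared starting commit.
echo "base" > file.txt
git add file.txt
git commit -q -m "shared starting commit"
base=$(git rev-parse HEAD)

git branch br2                  # br2 starts at the same commit as br1

echo "on br1" > one.txt
git add one.txt
git commit -q -m "commit only on br1"

git checkout -q br2
echo "on br2" > two.txt
git add two.txt
git commit -q -m "commit only on br2"

# The commit from which both branch tips are descended.
found=$(git merge-base br1 br2)
echo "merge base: $found"
```

Here the merge base is the shared starting commit. Two repositories that share no commits have no such commit to find.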

Edit per "edit 1"

... I am going to merge A_feature in A_master and B_feature in B_master.

Let me expand on my own parenthetical comment now:

(The metadata within each commit joins them together so that, starting from the latest, Git can work backwards to every previous commit. So holding the hash ID of the latest commit is sufficient.)

More specifically, the metadata in a commit includes the raw hash ID of its immediate predecessor commit. We therefore say that a commit points to its parent, which we can draw this way:

... <-F <-G <-H   <--somebranch

The branch name somebranch here serves to hold the hash ID H of the last commit in this chain. Commit H then holds both a snapshot (all files, with each file compressed and de-duplicated against any other copy of the file in this or any other commit) and metadata; the metadata in H holds the hash ID of earlier commit G. Commit G, being a commit, holds a snapshot and metadata, and its metadata holds the hash ID of earlier commit F, and so on.
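The backward-pointing links are visible in the raw commit data. A small sketch that builds a three-commit chain in a throwaway repository (the file name and identity are hypothetical; the commit messages F, G, H only mirror the drawing):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/somebranch
git config user.name "Example"
git config user.email "example@example.com"

# Build a short chain of three commits.
for name in F G H; do
    echo "$name" > file.txt
    git add file.txt
    git commit -q -m "commit $name"
done

# Raw contents of the last commit: a tree (the snapshot), a parent, and other metadata.
git cat-file -p somebranch

# The "parent" line holds the hash ID of the previous commit.
parent_of_h=$(git cat-file -p somebranch | sed -n 's/^parent //p')
g=$(git rev-parse somebranch~1)

# Walking backwards from the branch tip reaches every commit in the chain.
count=$(git rev-list --count somebranch)
echo "commits reachable from somebranch: $count"
```

The very first commit has no parent line at all, which is how Git knows where the chain ends.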

When you git checkout or git switch to some branch by name, you check out the commit to which the branch name points. For instance, if you have:

...--F--G--H   <-- master

and you run:

git switch master

Git will extract, from commit H, the snapshot of all files.

When you update some files and git add and/or use git rm, then run git commit, Git will add a new commit using the updated-and-added and/or removed files. This new commit has a full snapshot (based on what you git add-ed, plus any files you didn't change or remove). It points backwards to what was the current commit:

...--F--G--H   <-- does anything point here now? (commit I does)
            \
             I   <-- how about here?

The tricky bit is that whatever branch name is your current branch, as per git checkout or git switch, Git now writes I's hash ID into that branch name:

...--F--G--H--I   <-- master

Side note: Git makes this new commit's snapshot from whatever is in Git's index or staging area at this point. (Index and staging-area are two terms for a single Git thing.) Using git add modifies the index / staging-area, so as to prepare for the next commit. The files you see and work with in your working tree are there for you, not for Git itself: Git works instead with the files stored in its index. The git add command is a way of saying to Git: Make the index copy of some file(s) match the working tree copy of those files.
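To see that the commit comes from the index rather than from the working tree, here is a minimal sketch in a throwaway repository (file contents and identity are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "version 1" > doc.txt
git add doc.txt
git commit -q -m "initial"

echo "version 2" > doc.txt
git add doc.txt               # index copy now matches the working tree copy
echo "version 3" > doc.txt    # working tree changes again; the index is untouched

git commit -q -m "commit what is staged"

committed=$(git show HEAD:doc.txt)   # what went into the snapshot
in_tree=$(cat doc.txt)               # what is still in the working tree
echo "committed: $committed"
echo "working tree: $in_tree"
```

The snapshot holds the copy that was staged with git add; the later working tree edit stays uncommitted until it is added too.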

Why this matters

In repo A you now have two branch names:

...--F--G--H   <-- master, A_feature

You pick one of them to be the current branch with git checkout A_feature. To remember which one is the current branch, we add the special name HEAD to our drawing:

...--F--G--H   <-- master, A_feature (HEAD)

Now you make change(s) to some file(s), git add if needed (git rm makes the change to both your working tree and Git's index, so that no separate git add is needed), and commit:

...--F--G--H   <-- master
            \
             I   <-- A_feature (HEAD)

The change you made was to remove doc.txt, so the snapshot in new commit I has one fewer files in it than the snapshot in commit H.

As you make more changes and commit them, you get more commits:

...--F--G--H   <-- master
            \
             I--J   <-- A_feature (HEAD)

You mention that someone else who has write access to this repository (whoever that might be, and however that might occur) now does a git checkout master:

...--F--G--H   <-- master (HEAD)
            \
             I--J   <-- A_feature

They now modify doc.txt, use git add, and run git commit:

             K   <-- master (HEAD)
            /
...--F--G--H
            \
             I--J   <-- A_feature

Commit K has the same files as commit H except that its copy of doc.txt is different.

If they make another commit, we get:

             K--L   <-- master (HEAD)
            /
...--F--G--H
            \
             I--J   <-- A_feature

I am going to merge A_feature in A_master and B_feature in B_master.

So you will now take this repository, with HEAD attached to master like this, and run:

git merge A_feature

The merge operation, in Git, finds two commits to start with:

your current commit L (via HEAD and then master);
the other commit J (via the argument A_feature).

It then uses the graph that we've been drawing to find the best shared commit that's on both branches. In this drawing, that's commit H.

Now merge does its real work:

The merge has to compare the snapshot in H to that in L to see what you changed on the current branch. According to your description, what changed is, or includes, data within the file named doc.txt.
The merge has to compare the snapshot in H to that in J to see what they (whoever they are; it's actually you) changed on the other branch. Per your description, the change is, or includes, the deletion of the file named doc.txt.
The merge operation must now combine the changes.

The usual rules for combining changes within one file are simple and are based purely on text lines. But in this case, you did not change any lines in the H-to-J diffs. Instead, you deleted the entire file. This is a "high level" or "tree level" operation. Meanwhile, they did change some lines in the same file you deleted.

Git is unable to combine these two changes. It has no rule for resolving this (not even with -X ours or -X theirs). You will get a merge conflict. When this happens, Git leaves its index / staging-area in an expanded "conflicted" state. Git stops the merge in the middle and exits the git merge command with a failure status, signifying that something went wrong.
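This situation is easy to reproduce in a throwaway repository. The sketch below uses one commit per branch instead of two, and the file contents and identity are made up, but the shape of the graph matches the drawing above.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Commit H: the shared starting point, containing doc.txt.
printf 'line 1\nline 2\n' > doc.txt
git add doc.txt
git commit -q -m "H: add doc.txt"

# A_feature: delete the file.
git checkout -q -b A_feature
git rm -q doc.txt
git commit -q -m "I: remove doc.txt"

# master: change the file.
git checkout -q master
printf 'line 1\nline 2 changed\n' > doc.txt
git commit -q -a -m "K: edit doc.txt"

# The merge stops in the middle and exits with a failure status.
if git merge A_feature; then
    merge_status=0
else
    merge_status=$?
fi
echo "git merge exit status: $merge_status"

# "UD" marks a path that is unmerged because the other side deleted it.
state=$(git status --porcelain -- doc.txt)
echo "$state"
```

Git reports a modify/delete conflict and leaves the modified version of doc.txt in the working tree for you to inspect.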

Your job is now to fix what went wrong, updating Git's index / staging-area. You may use the files left in the working tree for this purpose, if you like: Git tries to leave something useful here for you to work with. But as always for any commit, what really matters to Git are the copies of files that are in its index.

(Side note: to see more directly what's in Git's index, use git ls-files --stage. This produces a huge amount of output in a big repository. The git status command is a more useful way to see what's in Git's index, in a more compact form: Git tells you what's there by comparing what's there to what's in the HEAD commit, and then also by comparing what's there to what's in your working tree. Only things that are different get mentioned here. That way, if you have nine thousand files, but only changed three of them, you only have to look at three file names, not all 9000.)

As always, once you have the correct file(s) ready, you must use git add to have Git update its index. Adding a conflicted file back "collapses down" the expanded index entries for that file, resolving the merge conflict for that file. Or, if the correct way to resolve the problem is to remove the file, you can use git rm to do this. When you have resolved all the conflicts, you may run git merge again to finish the merge:

git merge --continue

Or, for historical reasons, you can run git commit to finish the merge:

git commit

Git will notice that you've finished resolving the conflicts but are still in the middle of the merge, and will finish the merge either way. (Using git merge --continue currently literally runs git commit, but first makes sure there is a merge that is ready to finish. It's therefore somewhat better to use git merge --continue, but the old way will be supported for a long time to come, probably forever.)
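Here is a sketch of one possible resolution, assuming you decide to keep the edited doc.txt. The repository, file contents, and identity are made up, and GIT_EDITOR=true is only there so that the sketch does not stop to open an editor for the merge message.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Same shape as before: one side deletes doc.txt, the other edits it.
printf 'line 1\nline 2\n' > doc.txt
git add doc.txt
git commit -q -m "H: add doc.txt"
git checkout -q -b A_feature
git rm -q doc.txt
git commit -q -m "remove doc.txt"
git checkout -q master
printf 'line 1\nline 2 changed\n' > doc.txt
git commit -q -a -m "edit doc.txt"

ours=$(git rev-parse master)
theirs=$(git rev-parse A_feature)

git merge A_feature || true      # expected to stop with a conflict

# Resolve by keeping the edited file (use "git rm doc.txt" to accept the deletion instead).
git add doc.txt
GIT_EDITOR=true git merge --continue

first=$(git rev-parse HEAD^1)    # first parent of the new merge commit
second=$(git rev-parse HEAD^2)   # second parent of the new merge commit
kept=$(git show HEAD:doc.txt)
```

Which resolution is right depends on what you want: if the file really should leave this repository, the edit has to be carried over to wherever the file now lives by hand.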

The final result of the merge

Had you not gotten a conflict, Git would have made a new merge commit on its own. Since you did get a conflict, you had to resolve it, then finish the merge yourself. In either case, Git is now ready to make a new commit, in the usual way—mostly. The new commit has the current commit as one of its two parents, but instead of just the one parent, it also has the other commit as its second parent.

The new merge commit M still has a single snapshot, just like any commit. This contains (compressed and de-duplicated as usual) a full copy of every file, exactly as you arranged these files in Git's index / staging-area. But the two parents of M are J and L. Having written out commit M, Git then stores the new commit's hash ID into the branch name as usual, so our picture now looks like this:

             K--L
            /    \
...--F--G--H      M   <-- master (HEAD)
            \    /
             I--J   <-- A_feature

The merge operation is now complete. The snapshot for M is whatever you put in it (because git merge stopped with a conflict, which gave you an opportunity to put any files you like into it).¹ The first parent of M is L, which is the commit that was the HEAD commit when you started; now the HEAD commit is commit M, of course. The second parent of M is J, the commit you named on your git merge command.

¹Note that if you're going to make an unconflicted merge, git merge other will make that commit on its own, and the files in that snapshot are the result of the automatic merge that Git made between the two branch tip commits based on the merge base. However, you can run git merge --no-commit: this places the merge result into the index / staging-area as usual, but then stops even though there was no merge conflict. You may now complete the merge later with git commit or git merge --continue, as if there had been a conflict—but you can also modify what's in the staging area as usual.

This gives you the opportunity to create an evil merge, that is, a merge commit whose snapshot contains changes that appear in neither parent. You should not abuse this ability.
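A minimal sketch of git merge --no-commit on a merge that has no conflicts. The branch name other, the file names a.txt and b.txt, and the identity are made up for illustration.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

# Merge base: two files.
echo "a" > a.txt
echo "b" > b.txt
git add a.txt b.txt
git commit -q -m "base"

# Each branch changes a different file, so nothing conflicts.
git checkout -q -b other
echo "changed on other" > b.txt
git commit -q -a -m "change b.txt"
git checkout -q master
echo "changed on master" > a.txt
git commit -q -a -m "change a.txt"

# Stop before committing even though the automatic merge succeeded.
git merge --no-commit other

# MERGE_HEAD exists while a merge is in progress.
in_progress=no
if git rev-parse -q --verify MERGE_HEAD >/dev/null; then
    in_progress=yes
fi

# The index could be edited here before committing; this sketch leaves it alone.
git commit -q --no-edit

merged_a=$(git show HEAD:a.txt)
merged_b=$(git show HEAD:b.txt)
second_parent=$(git rev-parse HEAD^2)
```

Because the index is left untouched before the commit, the result here is the same merge commit that a plain git merge other would have made.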

What happens in repo B

I leave this as an exercise. Draw the graph with the various names pointing to the various commits. Add new commits, making note of what's different in the various snapshots you make. Then think about how git merge will run the two git diff commands: find the merge base commit, and see what changed since that snapshot, in each of the two branch-tip commits. Consider how Git will try to combine those changes. Are there conflicts? Are they whole-file / tree-level / high-level conflicts?

Thanks for your feedback and great answer. I have corrected my question so that it would fit my requirements: ... I am going to merge A_feature in A_master and B_feature in B_master. — A. Medvedev, Dec 07 '21 at 15:46

score 0 · Answer 2 · answered Dec 06 '21 at 14:42

If I understand your problem correctly, branches A_feature and B_feature are in different repositories so they can't be merged with each other. So this scenario can't happen.

Joshua Zeltser

Thanks for your comment! I fixed my question. I am going to merge A_feature in A_master and B_feature in B_master. – A. Medvedev Dec 07 '21 at 14:52
