
I am somewhat new to git. I've been using it for a number of months, and I'm comfortable doing most of the basic tasks, so I think it's time to take on some more complicated ones. At my work, we have a few people updating older code; this involves both actual code changes and restructuring the directories to be more modular. My question is: can these two things be done in parallel branches and then merged or rebased? My intuition says no, because a directory restructure is a rename, and git renames by adding a new file and deleting the old one (at least, this is how I understand it). But I wanted to be sure.
Here's the scenario: parent-branch looks like:

├── a.txt
├── b.txt
├── c.txt

Then we create two branches, say branchA and branchB. In branchB we modify the structure:

├── lib
│   ├── a.txt
│   └── b.txt
└── test
    └── c.txt

Then in branchA we update a.txt, b.txt, and c.txt.

Is there some way to merge the changes done in branchA with the new structure in branchB? Rebase comes to mind; however, I don't think lib/a.txt is actually connected to a.txt after a git mv...

Jameson

jmerkow

1 Answer


First, a short note: you can always try a merge, then back it out, to see what it does:

$ git checkout master
Switched to branch 'master'
$ git status

(Make sure it's clean: backing out of a failed merge when there are uncommitted changes is not fun.)

$ git merge feature

If the merge fails:

$ git merge --abort

If the automatic merge succeeds, but you don't want to keep it just yet:

$ git reset --hard HEAD^

(Remember that HEAD^ is the first parent of the current commit, and the first parent of a merge is "what was there before the merge". Thus, if the merge worked, HEAD^ is the commit just before the merge.)
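As a minimal sketch of the try-then-back-out sequence above, the following script builds a throwaway repository in which master and feature both edit the same line, attempts the merge, and backs it out either way. The repository, the file name file.txt, and the commit messages are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any git version
git config user.name "Demo" && git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt && git commit -q -m "base"

# feature and master both change the same line, so the merge will conflict.
git checkout -q -b feature
echo "feature edit" > file.txt
git commit -q -am "feature: edit file.txt"

git checkout -q master
echo "master edit" > file.txt
git commit -q -am "master: edit file.txt"

before=$(git rev-parse HEAD)
if git merge --no-edit feature >/dev/null 2>&1; then
    git reset -q --hard HEAD^      # merge succeeded: drop the new merge commit
else
    git merge --abort              # merge stopped on a conflict: back out
fi
after=$(git rev-parse HEAD)
state=$(git status --porcelain)

echo "HEAD before: $before"
echo "HEAD after:  $after"
```

Either branch of the if leaves master where it started, with a clean work tree, which is the point of the trial run.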


Here's a simple recipe for finding out what renames git merge will automatically detect.

  1. Make sure diff.renamelimit [1] is 0 and diff.renames is true:

    $ git config --get diff.renamelimit
    0
    $ git config --get diff.renames
    true
    

    If these are not already set this way, set them. (This affects the diff step below.)

  2. Choose which branch you're merging-into, and which you're merging-from. That is, you are going to do something like git checkout master; git merge feature soon; we need to know the two names here. Find the merge base between them:

    $ into=master from=feature
    $ base=$(git merge-base $into $from); echo $base
    

    You should see some 40-character SHA-1, like ae47361... or whatever here. (Feel free to type out master and feature instead of $into and $from everywhere here. I am using the variables so that this is a "recipe" instead of an "example".)

  3. Compare the merge base against both $into and $from to see which files are detected as "renames":

    $ git diff --name-status $base $into
    R100    fileB   fileB.renamed
    $ git diff --name-status $base $from
    R100    fileC   fileD
    

(You might want to save the output of these two diffs to files and peruse them later. Side note: you can get the effect of the second diff with the special syntax git diff master...feature: the three dots here mean "find the merge base and compare it against feature".)

The two output sections have a list of files Added, Deleted, Modified, Renamed, and so on (this example has just the two renames, with 100% matches).

Since $into is master, the first list is what git thinks has already happened in master. (These are the changes git "wants to keep", when you merge-in feature.)

Meanwhile, $from is feature, so the second list is what git thinks happened in feature. (These are the changes git wants to "now add to master", when you do the merge.)
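The recipe can be sketched end to end on a throwaway repository. This example assumes a layout like the one in the question (a.txt, b.txt, c.txt moved into lib/ and test/ on feature, with a content edit on master); the file contents are invented, and -M is passed explicitly so the result does not depend on the local diff.renames setting.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

# Base commit: three files with distinct contents.
for f in a b c; do
    printf 'contents of %s\nsecond line of %s\n' "$f" "$f" > "$f.txt"
done
git add . && git commit -q -m "base"

# feature: restructure only (pure renames, contents untouched).
git checkout -q -b feature
mkdir lib test
git mv a.txt b.txt lib/
git mv c.txt test/
git commit -q -m "restructure"

# master: content change only.
git checkout -q master
echo "edited on master" >> a.txt
git commit -q -am "edit a.txt"

# The recipe itself: merge base, then one diff per side.
into=master from=feature
base=$(git merge-base "$into" "$from")
into_changes=$(git diff -M --name-status "$base" "$into")
from_changes=$(git diff -M --name-status "$base" "$from")

printf 'changes on %s:\n%s\n' "$into" "$into_changes"
printf 'changes on %s:\n%s\n' "$from" "$from_changes"
```

Because feature only moved the files, its list should consist of R100 entries, while the master list shows a.txt as modified.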

At this point, you have to do a bunch of work:

  • Files marked R, git will detect as renamed.
  • If the two R lists are the same in both branches, you may be all good (but read on anyway). If there are Rs in the first list that are not in the second ... well, see below.
  • When you run git checkout master; git merge feature (or git checkout $into; git merge $from) git will do the renames shown in the second list, in order to "add those changes" to master.
  • In any case, compare this with the files you want git to detect as renamed. Look for D and A entries that you wanted to have show up as R entries: these occur when, in one of the branches, you not only renamed the file, but also changed the contents so much that git no longer detects the rename.

If the second list does not show everything you want to see, you're going to have to help git out. See even longer description below.
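One way to spot D and A entries that "should" have been an R is to rerun the diff with a lower -M threshold and see which pairs appear. The sketch below is an illustration only: it assumes a ten-line a.txt that feature both moves to lib/a.txt and rewrites heavily, so that the two versions share roughly 30% of their content.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

# Base: a ten-line file.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "original line $i of a.txt"
done > a.txt
git add a.txt && git commit -q -m "base"

# feature: move the file AND rewrite seven of its ten lines.
git checkout -q -b feature
mkdir lib
git mv a.txt lib/a.txt
{
    for i in 1 2 3; do echo "original line $i of a.txt"; done
    for i in 4 5 6 7 8 9 10; do echo "rewritten line $i of a.txt"; done
} > lib/a.txt
git commit -q -am "move and rewrite a.txt"

base=$(git merge-base master feature)
strict=$(git diff -M50% --name-status "$base" feature)   # the default threshold
loose=$(git diff -M10% --name-status "$base" feature)    # a more permissive one

printf 'with -M50%%:\n%s\n' "$strict"
printf 'with -M10%%:\n%s\n' "$loose"
```

At 50% the pair shows up as a delete plus an add; at 10% it shows up as a single R entry with its similarity score, which tells you which files a default merge will fail to connect.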

If the first list has a rename that's not in the second, this may be entirely harmless, or it may cause an "unnecessary" merge conflict and a missed chance for a real merge. Git is going to assume that you intend to keep this rename, and also look at what happened in the merge-from branch ($from, or feature in this case). If the original file was modified there, git will attempt to bring the changes from there into the renamed file. That is probably what you want. If the original file was not modified there, git has nothing to bring in and will leave the file alone. That's also probably what you want. The "bad" case is, again, an undetected rename: git thinks the original file was deleted in branch feature, and a new file with some other name was created.

In this "bad" case, git will give you a merge conflict. For instance, it might say:

CONFLICT (rename/delete): newname deleted in feature and renamed in HEAD.
Version HEAD of newname left in tree.
Automatic merge failed; fix conflicts and then commit the result.

The problem here is not that git has retained the file under its new name in master (we probably want that); it's that git may have missed the chance to merge the changes made in branch feature.

Worse—and this might be classifiable as a bug—if the new name occurs in the merge-from branch feature, but git thinks it's a new file there, git leaves us with only the merge-into version of the file in the work tree. The message emitted is the same. Here, I made a few more changes in master to rename fileB to fileE, and on feature, made sure that git would not detect the change as a rename:

$ git diff --name-status $base master
R100    fileB   fileE
$ git diff --name-status $base feature
D       fileB
R100    fileC   fileD
A       fileE
$ git checkout master; git merge feature
CONFLICT (rename/delete): fileE deleted in feature and renamed in HEAD.
Version HEAD of fileE left in tree.
Automatic merge failed; fix conflicts and then commit the result.

Note the potentially misleading message, fileE deleted in feature. Git is printing the new name (the master version of the name); that's the name it believes you "want" to see. But it is file fileB that was "deleted" in feature, replaced by an entirely new fileE.

(git-imerge, mentioned below, may be able to handle this particular case.)


[1] There's also a merge.renameLimit (spelled with lowercase limit in the source, but these configuration variables are case-insensitive) that you can set separately. Setting these to 0 tells git to use "a suitable default", which has changed over the years as CPUs have gotten faster. If a separate merge rename limit is not set, git uses the diff rename limit, and again a suitable default if that's not set or is 0. If you set them differently, merge and diff will detect renames in different cases, though.

You can also now set the "rename threshold" in a recursive merge with -Xrename-threshold=, e.g., -Xrename-threshold=50%. The usage here is the same as for git diff's -M option. This option first appeared in git 1.7.4.
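Here is a hedged sketch of -Xrename-threshold in use. It assumes a throwaway repository where feature moves a.txt to lib/a.txt and rewrites most of it (about 30% similarity), while master makes a small edit to the original a.txt; all names and contents are invented for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "original line $i of a.txt"
done > a.txt
git add a.txt && git commit -q -m "base"

# feature: move the file and rewrite lines 4-10.
git checkout -q -b feature
mkdir lib
git mv a.txt lib/a.txt
{
    for i in 1 2 3; do echo "original line $i of a.txt"; done
    for i in 4 5 6 7 8 9 10; do echo "rewritten line $i of a.txt"; done
} > lib/a.txt
git commit -q -am "move and rewrite a.txt"

# master: small edit to line 1 of the file under its old name.
git checkout -q master
sed 's/^original line 1 /EDITED line 1 /' a.txt > a.tmp && mv a.tmp a.txt
git commit -q -am "edit line 1"

# With the default threshold the rename goes undetected and the merge stops.
if git merge -q --no-edit feature >/dev/null 2>&1; then
    default_result=clean
    git reset -q --hard HEAD^
else
    default_result=conflict
    git merge --abort
fi
echo "default threshold: $default_result"

# With a lower threshold git pairs a.txt with lib/a.txt and merges the contents.
git merge -q --no-edit -Xrename-threshold=10% feature
cat lib/a.txt
```

The 10% figure is only what this toy example needs; for real history, a threshold that low can also pair up unrelated files, so check the candidate pairs with git diff -M first.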


Let's say you are on branch master, and you do git merge 1234567 or git merge otherbranch. Here's what git does:

  1. Find the merge-base: git merge-base master 1234567 or git merge-base master otherbranch.

    This yields a commit-ID. Let's call that ID B, for "Base". Git now has three specific commit IDs: B, the merge base; the commit ID of the tip of the current branch master; and the commit ID you gave it, 1234567 or the tip of branch otherbranch. Let's just draw these in terms of the commit graph, for completeness; let's say it looks like this:

    A - B - C - D - E       <-- master
          \
            F - G - H - I   <-- otherbranch
    

    If all goes well, git will produce a merge commit that has E and I as its two parents, but we want to concentrate here on the resulting work tree rather than the commit graph.

  2. Given these three commits (B, E, and I), git computes two diffs, a la git diff:

    git diff B E
    git diff B I
    

    The first is the set of changes made on master, and the second is the set of changes made on otherbranch, in this case.

    If you run git diff manually, you can set the "similarity threshold" for rename detection with -M (see above for setting it during merge). Git's default merge sets automatic rename detection to 50%, which is what you get with no -M option and diff.renames set to true.

If the files are "sufficiently similar" (and "exactly the same" is always sufficient), git will detect renames:

    $ git diff B otherbranch  # I tagged the merge-base `B`
    diff --git a/fileB b/fileB.txt
    similarity index 71%
    rename from fileB
    rename to fileB.txt
    index cfe0655..478b6c5 100644
    --- a/fileB
    +++ b/fileB.txt
    @@ -1,3 +1,4 @@
     file B contains
     several lines of
     stuff.
    +changeandrename

(In this case I just renamed from fileB to fileB.txt but the detection works across directories too.) Let's note that this is conveniently represented by git diff --name-status output:

    $ git diff --name-status B otherbranch
    R071    fileB   fileB.txt

(I should also note here that I have diff.renames set to true and diff.renamelimit = 0 in my global git config.)

  3. Git now attempts to combine the changes from B to I (on otherbranch) into the changes from B to E (on master).

If git is able to detect that lib/a.txt is renamed from a.txt, it will connect them. (And you can preview whether it will by doing a git diff.) In this case the automatic merge result is likely to be what you want, or sufficiently close.

If not, though, it won't.
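For the case where the rename is detected, the scenario from the question can be sketched as follows. The branch names come from the question; the file contents and the throwaway repository are assumptions made for the example.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/parent-branch
git config user.name "Demo" && git config user.email "demo@example.com"

# parent-branch: a.txt, b.txt, c.txt at the top level.
for f in a b c; do
    printf 'contents of %s\nsecond line of %s\n' "$f" "$f" > "$f.txt"
done
git add . && git commit -q -m "base"

# branchB: directory restructure only.
git checkout -q -b branchB
mkdir lib test
git mv a.txt b.txt lib/
git mv c.txt test/
git commit -q -m "restructure into lib/ and test/"

# branchA: content updates only, still using the old paths.
git checkout -q -b branchA parent-branch
for f in a b c; do echo "updated in branchA" >> "$f.txt"; done
git commit -q -am "update a, b and c"

# Merge the content changes into the restructured tree.
git checkout -q branchB
git merge -q --no-edit branchA

git ls-files
tail -n 1 lib/a.txt
```

Since branchB moved the files without changing them, each rename is an exact match, and the edits from branchA land in lib/a.txt, lib/b.txt, and test/c.txt.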

When the automatic rename detection fails, you can sometimes get git to see the rename by merging step-wise, provided the commits are (or can be) broken up suitably. For instance, suppose in the sequence of F G H I commits, one step (maybe G) simply renames a.txt to lib/a.txt, and other steps (F, H, and/or I) make so many other changes to a.txt (under whatever name) that they fool git into not realizing that the file was renamed. What you can do here is increase the number of merges, so that git can "see" the rename. Let's assume for simplicity that F does not change a.txt and G renames it, so that the diff from B to G shows the rename. What we can do is first merge commit G:

git checkout master; git merge otherbranch~2

Once this merge is complete and git has renamed from a.txt to lib/a.txt in the tree for the new merge commit on branch master, we do a second merge to bring in commits H and I:

git merge otherbranch

This two-step merge causes git to "do the right thing".
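The two-step merge can be sketched on a throwaway repository. The commit labels F, G, H, I follow the graph above; the file contents, the extra file other.txt, and the edit made on master are assumptions chosen so that the B-to-I diff on its own would fall below the rename threshold.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo" && git config user.email "demo@example.com"

# B: the merge base.
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "original line $i of a.txt"
done > a.txt
echo "unrelated" > other.txt
git add . && git commit -q -m "B: base"

git checkout -q -b otherbranch
echo "more" >> other.txt
git commit -q -am "F: unrelated change"            # F does not touch a.txt
mkdir lib && git mv a.txt lib/a.txt
git commit -q -m "G: pure rename"                  # G only renames
sed -e 's/^original line \([4-7]\) /rewritten line \1 /' lib/a.txt > a.tmp
mv a.tmp lib/a.txt
git commit -q -am "H: rewrite lines 4-7"
sed -e 's/^original line \([89]\) /rewritten line \1 /' \
    -e 's/^original line 10 /rewritten line 10 /' lib/a.txt > a.tmp
mv a.tmp lib/a.txt
git commit -q -am "I: rewrite lines 8-10"

# master: a small edit to a.txt under its old name.
git checkout -q master
sed 's/^original line 1 /EDITED line 1 /' a.txt > a.tmp && mv a.tmp a.txt
git commit -q -am "edit line 1"

# Step 1: merge up to G, where the rename is an exact match.
git merge -q --no-edit otherbranch~2
# Step 2: merge the rest; the file already lives at lib/a.txt on both sides.
git merge -q --no-edit otherbranch

cat lib/a.txt
```

After the first merge the new merge base is G, so the second merge compares lib/a.txt against lib/a.txt and needs no rename detection at all.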

In the most extreme case, an incremental, commit-by-commit merge sequence (which would be extremely painful to do manually) will pick up everything that could be picked up. Fortunately someone has already written this "incremental merge" program for you: git-imerge. I have not tried this but it's the Obvious Answer for hard cases.

torek
  • Ok, this is somewhat helpful. It looks like I need to set `git config --global diff.renames true` and `git config --global diff.renamelimit 0`, then git will recognize renames as the same files instead of add and deletes. I tried this on my test repository, however it is not recognizing the renames, (I get all adds/deletes for renamed files using `git diff`). Do those settings need to be set prior to the renames? – jmerkow Mar 25 '14 at 18:54
  • No, git does rename detection "after the fact": it notices, at diff-time, that file `A.ext` is gone, file `B/C.foo` exists now, and that the diff between `A.ext` and `B/C.foo` is "small", so it decides there was a rename. If git is not picking up on this, you can try setting the `-M` rename-detection threshold (the default is `-M50%`, higher = more strict, lower = less strict). There are additional tweakable values as well—but `git merge` won't let you set any of them, including `-M`, which is a bit of a problem. – torek Mar 25 '14 at 20:55
  • so use `git diff --name-status -M10% branchA branchB`? or does it have to be done with a merge command? – jmerkow Mar 26 '14 at 18:54
  • Well, if `-M10%` finds it and `-M50%` doesn't, merge won't find it. (Also you don't want to compare branches A and B so much as the merge-base of the two, with each tip.) In this case you're going to have to help git out somehow, whether that's with a tool like imerge, or by doing all the renaming "by hand" (perhaps based on earlier `git diff --name-status -M10%` output) as a pre-merge commit so that git doesn't need to find the renames. – torek Mar 26 '14 at 20:38
  • Great work in explaining this, and thanks for the hint about imerge. Now, I see a merge.renameLimit config too; why would I use the diff.renames option? (Or wasn't this there previously?) – inger Sep 22 '14 at 21:46
  • @inger: merge defaults to the diff limit. Looks like the separate merge limit was added in git 1.5.6; I just never noticed. There's also `-Xrename-threshold=` now, which apparently was new in 1.7.4. I know I checked all this once in the source, but I don't remember when, must have been a long time ago now. :-) – torek Sep 22 '14 at 21:54
  • Holy hell, you just saved my whole quarter. A colleague was working on the same project as I, and I didn't realize they completely changed the directory structure of the project until I was ready to commit my changes. Thank you so very much for this. – Jonathan E. Landrum Oct 13 '16 at 15:48
  • Note: with Git 2.18, the merge conflict might be a "modify/delete" one, not a rename/delete one. See https://stackoverflow.com/a/3100888/6309. – VonC Jun 03 '18 at 21:36