
Changes that are obviously separate to a human (who can understand code) are regularly messed up by Git's diff algorithm. For example:

def method_that_already_existed(blah)
  a line that did not change
  a line that was deleted    ######## the changed area starts here (per Git)
  a new line
end

def a_newly_added_method_that_belongs_in_its_own_commit
  blah blah blah
  blah blah blah
  etc.                       ######## the changed area ends here (per Git)
end

It is obvious to a human that the changes to the first method and the entirely new method are entirely different changes. But Git treats them as one, and DOES NOT ALLOW ME TO SPLIT THEM UNDER ANY CIRCUMSTANCES.

Worse than that, the change (according to Git) runs from the middle of the first method to just before the end of the second method. This makes it impossible to select just specific lines and commit one of the methods. The lines that Git sees as "context" are impossible to select.
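To make the ambiguity concrete, here is a minimal Python sketch. It is not how Git computes diffs. It only replays two hand-written edit scripts against the example above. The helper names (`apply_script`, `cost`) and the script format are hypothetical, invented for this illustration. Both scripts produce the same file and change the same number of lines, so a diff tool has no textual reason to prefer the grouping a human would pick.

```python
def apply_script(old_lines, script):
    """Replay ("keep", n), ("delete", n) and ("insert", [lines]) steps on old_lines."""
    result, pos = [], 0
    for op, arg in script:
        if op == "keep":
            result.extend(old_lines[pos:pos + arg])
            pos += arg
        elif op == "delete":
            pos += arg
        elif op == "insert":
            result.extend(arg)
        else:
            raise ValueError(f"unknown op: {op}")
    return result


def cost(script):
    """Count deleted plus inserted lines in a script."""
    total = 0
    for op, arg in script:
        if op == "delete":
            total += arg
        elif op == "insert":
            total += len(arg)
    return total


OLD = [
    "def method_that_already_existed(blah)",
    "  a line that did not change",
    "  a line that was deleted",
    "end",
]
NEW_METHOD_BODY = [
    "def a_newly_added_method_that_belongs_in_its_own_commit",
    "  blah blah blah",
    "  blah blah blah",
    "  etc.",
]

# Git's reading: one contiguous change; the old `end` is matched to the LAST `end`.
git_script = [
    ("keep", 2),
    ("delete", 1),
    ("insert", ["  a new line", "end", ""] + NEW_METHOD_BODY),
    ("keep", 1),
]

# Human reading: the old `end` stays with the first method, and the new
# method (including its own `end`) is appended as a separate change.
human_script = [
    ("keep", 2),
    ("delete", 1),
    ("insert", ["  a new line"]),
    ("keep", 1),
    ("insert", [""] + NEW_METHOD_BODY + ["end"]),
]

git_result = apply_script(OLD, git_script)
human_result = apply_script(OLD, human_script)
print(git_result == human_result, cost(git_script), cost(human_script))
```

Only the second script keeps an unchanged line (the first `end`) between the two edits, which is what would let them be staged as separate hunks.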

If I use `git add -p ./path/to/file`, my version of Git no longer offers the `s` (split) option, which never worked well anyway. It does offer `e` (edit), but that will not allow adding the final `end` of the second method. So basically Git offers me absolutely no way of selecting the changes intelligently and adding them separately in separate commits.

Likewise in VS Code, I can select line-by-line from the existing lines, but I can't select lines that Git doesn't think of as part of the changed area. (And also I can't differentiate between added lines and removed lines--a change includes the deleted lines invisibly, so if they are actually a part of a different change, I'm out of luck again.)

So there's no way to control this that I can find, unless I change my code just to trick Git into doing the right thing. If I dig into the history to get the line that was deleted in the first method and add it back in, and then remove (temporarily) the line that was added, and save the file, then it will properly recognize what has changed. Of course I have to remember to undo this kludgey solution, and make sure I undo it properly, or I've broken my code. And this is a tedious and really horrible workaround.
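The workaround above amounts to staging an intermediate version of the file. One possible way to do that without editing the working copy is to build a patch from the currently staged text to the intermediate text and hand it to `git apply --cached`. The following is a minimal Python sketch under stated assumptions: `build_patch`, `stage_patch`, and the path `path/to/file` are hypothetical names for this illustration, both texts are assumed to end with a newline, and the output of `difflib.unified_diff` is assumed to be plain enough for `git apply` to accept.

```python
import difflib
import subprocess


def build_patch(path, staged_text, wanted_text):
    """Return a unified diff that turns staged_text into wanted_text."""
    diff = difflib.unified_diff(
        staged_text.splitlines(keepends=True),
        wanted_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def stage_patch(patch):
    """Apply the patch to the index only; the working tree is not modified."""
    subprocess.run(["git", "apply", "--cached"], input=patch, text=True, check=True)


# What is currently staged: the first method before any edits.
STAGED = (
    "def method_that_already_existed(blah)\n"
    "  a line that did not change\n"
    "  a line that was deleted\n"
    "end\n"
)

# The intermediate version for the first commit: only the new method is added.
WANTED = STAGED + (
    "\n"
    "def a_newly_added_method_that_belongs_in_its_own_commit\n"
    "  blah blah blah\n"
    "  blah blah blah\n"
    "  etc.\n"
    "end\n"
)

PATCH = build_patch("path/to/file", STAGED, WANTED)
print(PATCH)
```

In a real repository the staged text could be read with `git show :path/to/file`, and calling `stage_patch(PATCH)` followed by `git commit` would record the new method on its own. The remaining edit to the first method would then still be in the working tree, ready to stage normally.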

I would love it if there was a way to get Git to recognize changes "properly" the way a human would. Until we have AST-based diff algorithms, I'm not expecting this to be available any time soon. So the next best thing would be to have a way to specify what has changed and not leave it up to Git to guess. Is there any way to do that?

For example (this would be just one way to partially solve the problem), if I could tell Git to NEVER EVER EVER EVER let a diff chunk span an empty line, I would solve this particular example. If I have a chunk that I want to span an empty line, I'm happy to add both chunks separately. Git should always treat them as distinct changes. But that's just one example, and not the basic question.

The basic question is:

If Git can't properly recognize what has changed, how can I force it to accept my version of what has changed? (Without resorting to tedious & error-prone kludges like manually undoing some changes by digging into git history to undo one of the changes so it won't erroneously group two separate things together!)

iconoclast
  • Wait, since when did `s` not work well? It simply splits on *every* block of common lines, instead of every *large* block of common lines like it does by default. `e` should work very well, just delete the part that doesn't belong and leave the correct line of context ... – o11c Jan 15 '19 at 03:46
  • "but that will not allow adding the final `end` of the second method". I don't think we can add the `end` since it's not changed. – ElpieKay Jan 15 '19 at 04:07
  • @ElpieKay: (from the standpoint of any human) the second `end` is totally new—it's the first `end` that hasn't changed. that's the whole point. I want to force a sensible (human) interpretation on Git when it's not able to automatically chunk things properly – iconoclast Jan 15 '19 at 04:48
  • @o11c: but `e` won't let me add lines that Git thinks are just context. that's the problem with `e`. I don't know why `s` was not available in this case... perhaps it sometimes is... but I remember it always being frustrating in the past anyway because it would almost always not split things up as far as I wanted them – iconoclast Jan 15 '19 at 04:49
  • `git diff` uses `myers` diff algorithm by default. See http://blog.robertelder.org/diff-algorithm/ and https://stackoverflow.com/a/42741558/6330106. The algorithm is far beyond my knowledge but I believe the diff behaviour was intentionally designed due to its unique merits. – ElpieKay Jan 15 '19 at 05:39
  • What version of Git are you using? – VonC Jan 15 '19 at 05:49
  • @VonC: version 2.20.1 – iconoclast Jan 15 '19 at 18:03
  • @ElpieKay: I realize the algorithm might be ideal for Git's internal purposes, to handle the difference between commits, but I find it inappropriate for the intermediate step of presenting diffs to a human who then chooses what portion of the change to stage for the next commit. Thanks for the link. I'll follow up on it when I'm not at work. – iconoclast Jan 15 '19 at 18:06
  • Git does have a heuristic algorithm for "sliding" a diff window up to find a blank line, which often does the trick. The place it never works is at the *top* of a file, as there's no blank line above the first `def ...`. Depending on your source language, you can enforce a "comment, then blank line, then first `def`" rule to help this out, but in general there's no satisfactory answer here: Git is finding *a* minimal edit, not *the correct* edit (minimal or not). – torek Jan 15 '19 at 18:40
  • @torek: by 'enforce a "comment, then blank line, then first def" rule to help this out', do you mean just always ensure that there is a blank line and a comment before every method in my code? Or do you mean enforce some rule in Git's configuration or the way I use certain commands? I thought about the comment-between-methods solution... if it comes to that I guess I might do it... but it's not ideal... but in my case the earlier method was not at the top of the file. (Actually, they pretty much never are in Ruby.) So maybe I didn't understand what you said about Git's algorithm for sliding – iconoclast Jan 15 '19 at 19:40
  • See the question to which [this answer by VonC](https://stackoverflow.com/a/41665589/1256452) details the heuristic. – torek Jan 16 '19 at 05:31
