Can switching diff algorithm in Git cause any problems?

Question

I would like to switch to either the patience or histogram algorithm in Git, but I'm wondering if there are any side effects for a given repo not being consistent in its use of algorithm. If I switch, will that cause anything to break when I deal with commits that were added prior to the algorithm switch? Will it be a problem if other developers don't use the same algorithm?

I can't think of a specific scenario where there would be a conflict, but it seems like a pretty fundamental change, so I'd like to look before I leap nonetheless.

IIRC the diffing algorithm doesn't affect how git stores the files or even any of the automatic conflict resolution. It's basically just how git visualizes a difference to the user. — Joachim Sauer, Aug 15 '20 at 20:07
There are several things in the diff-config man docs (https://git-scm.com/docs/diff-config) that say "Note that this affects only 'git diff' Porcelain", but `diff.algorithm` does not have that note. So I'm _fairly_ confident that this changes more than just the visualization to the user. But of course I'm prepared to be corrected if I'm wrong. — iconoclast, Aug 15 '20 at 20:15
@iconoclast I'd suggest asking on the Blender mailing list whether that is an omission, and otherwise asking for it to be clarified. — Acorn, Aug 15 '20 at 20:20
Why the _Blender_ mailing list? just because there are super-smart Git users writing Blender?? Wouldn't the Git mailing list be a much better place? — iconoclast, Aug 15 '20 at 20:35
The merge strategies are responsible for invoking the internal diff code in the first place, and the only ones that actually do so (as built into Git) invoke it without letting you change the algorithm. If you write your own merge strategy, you can make it do whatever you like, but writing a merge strategy is a major undertaking. — torek, Aug 16 '20 at 03:18

score 4 · Answer 1 · answered Aug 16 '20 at 14:52

The diff algorithm takes effect from the moment you set it, so it applies to whatever diff operations you run from then on. Changing it has no inherent negative effects: every algorithm produces an equivalent diff, and the only question is how easy the result is for people to read. Patience and histogram are usually better, but not always.

The only time you might have a problem is if you're storing diffs in some system or repository (such as files generated by git format-patch), which isn't very common but is used in some Linux distribution packaging workflows. In such a case, if different people use different diff algorithms, you'll see a lot of diff noise as the patches are regenerated between users, even though the diffs are logically equivalent.

If you have such a case, it's better to just force some fixed diff algorithm with your tooling, which is what I've done in the past. That would look like having your tool run `git -c diff.algorithm=myers format-patch`.
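A minimal sketch of what that tooling could look like, assuming a POSIX shell. The function name `stable_format_patch` is hypothetical and not part of Git; it only pins the algorithm for one invocation, so a user's own `diff.algorithm` setting is not picked up.

```shell
# Hypothetical wrapper (not part of Git): pin the diff algorithm for a single
# format-patch run, overriding diff.algorithm from any config file.
stable_format_patch() {
    git -c diff.algorithm=myers format-patch "$@"
}
```

For example, `stable_format_patch -3 HEAD` would write the last three commits as patch files using the Myers algorithm, whatever the caller has configured.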

Beyond that case, there's really no harm in changing the diff algorithm if you find you like something other than the default better.

score 2 · Answer 2 · answered Aug 15 '20 at 21:37

No, it will not break anything. The diffs are always calculated after the fact. You can either change the diff algorithm permanently via config or temporarily via option flags on the command line.
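As a rough sketch of those two options, assuming a POSIX shell (the function names are hypothetical and not part of Git). Both `diff.algorithm` and `--diff-algorithm` accept `myers`, `minimal`, `patience`, and `histogram`.

```shell
# Hypothetical helpers, not part of Git.

# Permanent: record the choice in the current repository's config.
# (Add --global after "config" to apply it to every repository instead.)
set_repo_diff_algorithm() {
    git config diff.algorithm "$1"
}

# Temporary: pick the algorithm for one diff only; config is left untouched.
diff_once_with() {
    algorithm=$1
    shift
    git diff --diff-algorithm="$algorithm" "$@"
}
```

Calling `set_repo_diff_algorithm histogram` changes every later diff in that repository, while `diff_once_with patience HEAD~1` affects only that one command.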

Git does not store diffs; all history is stored as (full) snapshots of tree objects. A tree always points to full files ("blobs" in Git terminology) or subdirectories (represented by other tree objects).
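One way to look at this object-level view yourself, sketched as a small shell helper (the name `show_root_tree` is hypothetical; `git cat-file -p` is the underlying Git command). It prints the root tree of a commit, where each entry names a blob or a further tree by object ID.

```shell
# Hypothetical helper: list the root tree of a commit (defaults to HEAD).
# Each output entry refers to a blob (file) or another tree (subdirectory).
show_root_tree() {
    git cat-file -p "${1:-HEAD}^{tree}"
}
```

The output is the same whichever diff algorithm is configured, since no diff is computed when reading a tree.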

knittl


"...all history is stored as (full) snapshots..." No, it isn't. It's true that the stored representation doesn't depend on the diff algorithm used to present text patches, but it absolutely does use deltas. – Mark Adelsberger Aug 16 '20 at 02:48
@MarkAdelsberger: deltas appear only in packfiles, which exist below the object level. Philosophically this is similar to wondering if a file is compressed, when it's stored on ZFS with ZFS-level compression turned on. In one sense, it *is* compressed, because ZFS compressed each block. But when you open and read the file, you can't *tell* that it's compressed, especially if it was just moved to a different dataset in which compression is not enabled. – torek Aug 16 '20 at 03:16
@MarkAdelsberger Git's object model stores full snapshots only. Each commit references one single root tree and this root tree then references all files and all subdirectories in full. Pack files use clever compression algorithms to be more space efficient, but this delta compression neither uses (human-readable) "diff"s nor is affected by the configured diff algorithm. […] – knittl Aug 16 '20 at 07:14
[…] This happens on a different level, comparable to the different layers of the OSI model. An HTTP request or response is a single entity, but on lower levels it might be fragmented. Not something you have to think or care about when talking HTTP, because the underlying layers will handle this transparently. – knittl Aug 16 '20 at 07:15
@torek That is a nitpick. The statement in the answer was that only complete snapshots are stored, and that is not true. – Mark Adelsberger Aug 16 '20 at 15:48
@knittl You can split all the hairs you want; the statement "git does not store diffs" is incorrect. You qualified it with "human readable" this second time around, but that's not what you said originally. – Mark Adelsberger Aug 16 '20 at 15:49
@MarkAdelsberger and I beg to differ. Git uses delta compression when using packfiles. It does not store them as diffs. diffs != delta compression. But that's beside the point. The question was if changing the diff algorithm can make a repository or its commits incompatible with other repositories and the answer to that is a simple and direct "no, it cannot break compatibility" – knittl Aug 16 '20 at 15:53
@knittl diff != delta compression but delta compression is a type of diff. And neither one of them is a "(full) snapshot". You think it's beside the point because you only care if your statement is "correct enough" to explain the behavior asked about; and that is where I differ, because I care that these "correct enough if you squint at them hard enough" statements lead people to believe that git is too complicated to understand when they try to reason using them and get incorrect results. – Mark Adelsberger Aug 16 '20 at 16:01

score 1 · Answer 3 · answered Aug 15 '20 at 21:40

Looking at the evolution of both histogram and patience diffs, there is no side-effect for past commits.

There are effects only for the `git diff` command itself (or diff-based operations like `log -p`).
For instance, a `git diff --histogram` done before Git 2.1 would trigger too many memory allocations.
