13

It is well answered on this site that Bram Cohen's patience diff is the default diff in Bazaar and is available as an option to git diff, but I am finding it difficult to track down an independent, standalone program that implements this particular diff algorithm.

For example, I'd like to apply patience diff to Perforce diffs, and the canonical "frobnitz" code example makes it pretty clear how patience diff is better:

[Screenshot: two terminals showing diffs of the frobnitz example side by side]

The terminal on the right invoked git diff with the --patience flag.

I have also set up the diff-highlight Perl script, whose job is to invert the colors on matched-up lines between the first and last differing sections of those lines. The left side shows an example where this doesn't help much, but I'll let it slide because at least the semicolon is there... Anyway, making improvements to the diff-highlight script is not the subject of this question.

In addition to the question of where to find a standalone patience diff: if anybody knows how to make Perforce's p4 use an external diff program, that is also something I need to do.

Steven Lu
  • What, I use git to run it over two files? That just seems bad. It never did occur to me. I see now that this works quite well, and I'll definitely be using it. But many systems don't have git installed... – Steven Lu Apr 17 '13 at 18:04

6 Answers

9

It's perhaps not as ideal as I'd like, but the solution is perfectly good from a practical perspective (and that's a damn good perspective to have).

git diff --no-index --patience file1 file2 does the job. (thanks @StevenPenny)

The $P4DIFF environment variable defines the external diff... we just stuff git diff --patience --no-index into that.
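One way to wire that up, for illustration only: a small wrapper script that p4 can call. The script name and location are my own invention, and as far as I can tell p4 diff hands the external program the two files to compare as arguments, but check your version's docs:

#!/usr/bin/env python
# Hypothetical wrapper, saved e.g. as ~/bin/p4-patience-diff and marked executable.
# p4 diff is expected to call this with the two files to compare; we simply
# delegate to git's patience diff in no-index mode.
import subprocess
import sys

sys.exit(subprocess.call(
    ['git', 'diff', '--patience', '--no-index', sys.argv[1], sys.argv[2]]))

Then point Perforce at it, e.g. export P4DIFF=$HOME/bin/p4-patience-diff in your shell profile. Note that git diff exits non-zero when the files differ, which is the usual convention for diff programs, but it's worth confirming your p4 version is happy with that.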

Steven Lu
4

I took the liberty of porting patience to a somewhat standalone library; it's in C#. It's still early days for the library. It is mostly a line-by-line port, so it hopefully has most of the stability of the Python one.

Remember that patience only finds the longest common subsequences (in diff terms that means parts of the file that have not changed). You will need to determine the additions and removals yourself.
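For illustration (this is not code from the C# library), that post-processing step looks roughly like this in Python, assuming a difflib-style matcher that reports matching blocks as (i, j, size) triples:

import difflib

def edits_from_matches(a, b, matching_blocks):
    # The gaps between consecutive matching blocks are exactly the changed
    # regions: lines of a not covered by any match were removed, lines of b
    # not covered by any match were added.
    edits = []
    ia = ib = 0
    for i, j, size in matching_blocks:  # includes the (len(a), len(b), 0) sentinel
        if ia < i:
            edits.append(('delete', a[ia:i]))
        if ib < j:
            edits.append(('insert', b[ib:j]))
        if size:
            edits.append(('equal', a[i:i + size]))
        ia, ib = i + size, j + size
    return edits

a = ['one\n', 'two\n', 'three\n']
b = ['one\n', 'deux\n', 'three\n']
matcher = difflib.SequenceMatcher(None, a, b)  # stand-in for a patience matcher
print(edits_from_matches(a, b, matcher.get_matching_blocks()))

The same walk works with any matcher that exposes difflib's get_matching_blocks interface, including the patience implementations discussed below.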

Also remember that within the Bazaar repository there are implementations in both Python and C (again, they only solve the LCS problem):

  • C version: seems to value performance over clarity; you won't easily be able to understand the algorithm by reading it. There is also a lot of code overhead for the Python interop.
  • Python version: the reference implementation of the algorithm. Seems to mostly value clarity over performance.

If you need to write your own implementation, I would recommend porting the Python version first, then looking at the C implementation for tips on how to speed it up.

There should also be an implementation in the Git repository, but I haven't searched for it.

Jonathan Dickinson
  • I have since moved on to a character-by-character diff implementation in C++ found [here](http://code.google.com/p/google-diff-match-patch/issues/detail?id=25), along with many (utf8 and binary related) enhancements of my own. I have dubbed it `sift`. Hopefully one day it will be revealed to the world. Thanks for sharing your version though! (Mine might actually fail w.r.t. doing a true patience-diff, though it behaves well on the frobnitz example) – Steven Lu Aug 06 '13 at 03:07
  • @StevenLu pleasure, glad you came right. I would be interested to hear about what the effects are of some of the improvements you made (if you opened up issues on the GitHub project). – Jonathan Dickinson Aug 06 '13 at 09:50
  • I actually just discovered this: https://github.com/leutloff/diff-match-patch-cpp-stl What I've got now is based on the slightly older version built around `wstring`, and I was unable to properly convert it to use `string`. However, I don't consider that much of a limitation, as it fully supports utf-8 (and I am not sure how well a non-wide `string` implementation could handle utf-8 in a way that is aware of the code points). – Steven Lu Aug 06 '13 at 14:14
3

Cohen's own Python implementation needs only minor tweaks (below) to run standalone. It's in two files, copies of which I snagged by googling "difflib patience":

http://stuff.mit.edu/afs/athena/system/i386_deb50/os/usr/share/pyshared/bzrlib/patiencediff.py and http://stuff.mit.edu/afs/athena/system/i386_deb50/os/usr/share/pyshared/bzrlib/_patiencediff_py.py

The first file can be run from the command line roughly like diff. The second is the Python implementation of the inner loops. (Single file?? Exercise for reader!) In bzrlib there's also a C implementation of the inner loops.

Here (with the help of the program itself) are my patches to make them run standalone:

Sandy$ patiencediff.py --patience orig/patiencediff.py patiencediff.py
--- orig/patiencediff.py
+++ patiencediff.py
@@ -15,14 +15,20 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

+try:
+    from bzrlib.lazy_import import lazy_import
+    lazy_import(globals(), """
+    import os
+    import sys
+    import time
+    import difflib
+    """)
+except:
+    import os
+    import sys
+    import time
+    import difflib

-from bzrlib.lazy_import import lazy_import
-lazy_import(globals(), """
-import os
-import sys
-import time
-import difflib
-""")


 __all__ = ['PatienceSequenceMatcher', 'unified_diff', 'unified_diff_files']
@@ -135,11 +141,18 @@
         PatienceSequenceMatcher_c as PatienceSequenceMatcher
         )
 except ImportError:
-    from bzrlib._patiencediff_py import (
-        unique_lcs_py as unique_lcs,
-        recurse_matches_py as recurse_matches,
-        PatienceSequenceMatcher_py as PatienceSequenceMatcher
-        )
+    try:
+        from bzrlib._patiencediff_py import (
+            unique_lcs_py as unique_lcs,
+            recurse_matches_py as recurse_matches,
+            PatienceSequenceMatcher_py as PatienceSequenceMatcher
+            )
+    except ImportError:
+        from _patiencediff_py import (
+            unique_lcs_py as unique_lcs,
+            recurse_matches_py as recurse_matches,
+            PatienceSequenceMatcher_py as PatienceSequenceMatcher
+            )


 def main(args):
Sandy$ patiencediff.py --patience orig/_patiencediff_py.py _patiencediff_py.py
--- orig/_patiencediff_py.py
+++ _patiencediff_py.py
@@ -15,11 +15,16 @@
 # along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

-
+from __future__ import print_function
 from bisect import bisect
 import difflib

-from bzrlib.trace import mutter
+try:
+    from bzrlib.trace import mutter
+except:
+    import sys
+    def mutter(msg):
+        print (msg, file=sys.stderr)


 __all__ = ['PatienceSequenceMatcher', 'unified_diff', 'unified_diff_files']
Sandy$
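Once patched, the files can also be used as a module. A minimal sketch, assuming the two patched files sit somewhere importable; bzrlib's PatienceSequenceMatcher subclasses difflib.SequenceMatcher, so the usual opcode interface should be available:

# Assumes the patched patiencediff.py and _patiencediff_py.py are on the path,
# e.g. in the current directory.
from patiencediff import PatienceSequenceMatcher

a = ['void frobnitz(int x)\n', '{\n', '    printf("%d\\n", x);\n', '}\n']
b = ['void frobnitz(int x)\n', '{\n', '    puts("frobnitz");\n', '}\n']

matcher = PatienceSequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    print(tag, a[i1:i2], b[j1:j2])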
FutureNerd
  • This is perfect - thank you! I don't see the patched standalone version on tiac or GitHub - have you posted it anywhere? If not, I will when I get the chance :) . – cxw Aug 15 '16 at 12:57
  • I don't still have a copy of the patched file, but I believe you can apply the patch above to the original using the "patch" utility. – FutureNerd Aug 16 '16 at 16:59
  • And I missed part of your question. No, I never did combine them into a single file. – FutureNerd Aug 16 '16 at 17:08
  • There is now a standalone version here: https://github.com/breezy-team/patiencediff – jelmer Dec 28 '19 at 13:09
1

For a JavaScript implementation, with enhancements to PatienceDiff that determine the lines which likely moved (dubbed "PatienceDiffPlus"), see https://github.com/jonTrent/PatienceDiff.

Trentium
0

The Bazaar implementation of patiencediff is also available as a separate Python module.
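If I'm reading it right, this is the pip-installable patiencediff package, and it keeps the bzrlib-style API. A hedged usage sketch:

# pip install patiencediff   (assumption: the standalone module referred to above)
import patiencediff

old = ['one\n', 'two\n', 'three\n']
new = ['one\n', 'deux\n', 'three\n']

# unified_diff mirrors difflib.unified_diff but uses the patience matcher;
# PatienceSequenceMatcher is also exported for difflib-style use.
print(''.join(patiencediff.unified_diff(old, new)))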

jelmer
0

For a Go implementation see patience.

Just a note about my experience implementing the algorithm. Initially I found it very confusing to compare the different implementations. The original Bazaar algorithm appears to calculate the longest common subsequence (LCS) by using patience sorting to find the longest increasing subsequence of unique, common elements. However, there are other methods of calculating the LCS which don't use patience sorting at all, and most of the patience diff implementations I found use those other methods. So even though the complete algorithm is called "patience diff," many implementations do not use patience sorting.
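As a rough illustration of that distinction (my own sketch in Python, not taken from the linked Go code), the patience-sorting step only has to find the longest increasing subsequence of the positions of elements that occur exactly once in both sequences:

from bisect import bisect_left

def unique_common_lis(a, b):
    # Longest increasing subsequence (by position in b) of the elements that
    # appear exactly once in both a and b; these are the anchors patience diff
    # recurses around. Returns (index_in_a, index_in_b) pairs.
    def uniques(seq):
        positions = {}
        for i, x in enumerate(seq):
            positions.setdefault(x, []).append(i)
        return {x: pos[0] for x, pos in positions.items() if len(pos) == 1}

    ua, ub = uniques(a), uniques(b)
    # Position pairs for the shared unique elements, walked in order of a;
    # we want the LIS of their b-positions.
    pairs = [(ua[x], ub[x]) for x in sorted(set(ua) & set(ub), key=ua.get)]

    piles = []   # top b-position of each patience pile
    tops = []    # the pair currently on top of each pile
    back = {}    # pair -> pair on top of the previous pile when it was placed
    for pair in pairs:
        k = bisect_left(piles, pair[1])
        if k == len(piles):
            piles.append(pair[1])
            tops.append(pair)
        else:
            piles[k] = pair[1]
            tops[k] = pair
        if k > 0:
            back[pair] = tops[k - 1]

    # Recover the subsequence by walking back-pointers from the last pile.
    result, node = [], tops[-1] if tops else None
    while node is not None:
        result.append(node)
        node = back.get(node)
    return result[::-1]

a = 'x f r o b n i t z'.split()
b = 'f r o b n o d d l e x'.split()
print(unique_common_lis(a, b))  # -> [(1, 0), (2, 1), (4, 3), (5, 4)]

This only covers the anchor-finding step; the full algorithm recurses between the anchors and falls back to an ordinary line diff for the regions in between.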

peterevans