4

I'm looking for a command-line diff program (Windows) or a PHP diff library that can handle UTF-8 characters.

I've searched SO and read the related questions, but with no luck.

Thanks for help!

MartyIX

  • I Googled for "php diff" and "php inline diff" and found several implementations, like http://www.pmwiki.org/wiki/Cookbook/InlineDiff – N3dst4 Nov 05 '11 at 09:18
  • Dave: I tried diff and it doesn't work correctly with utf-8. Does it for you? – MartyIX Nov 05 '11 at 09:52
  • @MartyIX: The standard *diff* program works perfectly well for UTF-8. It doesn’t do normalization, of course, so if that’s what you need, run everything through an NFC or NFD filter first and compare only the normalized forms instead. That might be a better approach anyway. – tchrist Nov 05 '11 at 16:33
  • @MartyIX Define "doesn't work correctly"; I've never had an issue. See tchrist's comment. – Dave Newton Nov 05 '11 at 16:43
  • tchrist: Aha, I tried it on Windows (the ported GNU Diff) and for UTF-8 input it returned some gibberish. I have realized that maybe the Windows command line doesn't work well with UTF-8. Thanks for your comment! – MartyIX Nov 07 '11 at 18:37
  • @MartyIX: That is not a diff bug. It is a Windows bug, presuming that congenital brain-damage can be deemed a bug. Windows is itself a bug, so install something that actually works with modern text. If you cannot do that, PuTTY back to localhost and set your encoding to UTF-8. Microsoft is tantamount to unusably broken for Unicode, and they go out of their way to keep it that way. – tchrist Nov 09 '11 at 15:04
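
The normalize-then-compare approach suggested in the comments above can be sketched as follows. This is only an illustration, not something from the thread: it assumes Python and its standard `unicodedata` and `difflib` modules rather than PHP or GNU diff, and the helper names `normalized_lines` and `normalized_diff` are made up for the example.

```python
import difflib
import unicodedata


def normalized_lines(text, form="NFC"):
    """Return the lines of `text` after Unicode normalization (NFC by default)."""
    return unicodedata.normalize(form, text).splitlines()


def normalized_diff(old, new, form="NFC"):
    """Line-based unified diff of two strings, compared in normalized form."""
    return list(
        difflib.unified_diff(
            normalized_lines(old, form),
            normalized_lines(new, form),
            fromfile="old",
            tofile="new",
            lineterm="",
        )
    )


# "é" as one precomposed code point vs. "e" followed by a combining acute accent.
composed = "caf\u00e9\nmenu"
decomposed = "cafe\u0301\nmenu"

print(composed == decomposed)                 # the raw strings are not equal
print(normalized_diff(composed, decomposed))  # empty list: no differences after NFC
```

Writing the result to a UTF-8 file instead of printing it to the Windows console may sidestep the garbled output described above, since the console encoding is then not involved.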

5 Answers

2

I ended up with prettydiff.com. It is neither a PHP library nor a command-line program, but it works for what I need.

MartyIX
0

Did you try my favorite, Beyond Compare?

j0k
0

As luck would have it...

Our Smart Differencer tools handle a huge variety of input encodings. You can define the input encoding as an environment variable, so if you do a lot of compares you might want to write a little script. (We're moving towards allowing this as a command line switch).

These tools are designed to compare computer languages, and are language-specific. There's a version specifically for comparing PHP programs.

If all you want is a plain vanilla text diff, this won't be your tool. [This makes me consider the "trivial computer language" consisting of text lines, which this tool would do really well. I'll have to go build one of these (really easy with our machinery) just to see what it is like. Stay tuned to this Bat channel.]

Ira Baxter
0

WinMerge can handle UTF-8 (you need to start the Unicode version, WinMergeU.exe).

0

ECMerge's internal representation of text is UTF-8 (NB: I work for ECMerge). It comes with a command-line tool: ecmerge-cli on Windows, ecmerge --cli on Unixes. Calling it from PHP should be as simple as calling any other command-line tool. It can be scripted to output whatever you need. Of course, it can generate HTML/XML diff reports and patches off the shelf. It has been successfully used as the base for several diff services behind web servers.

armel

  • Does it compare normalized forms, or raw forms? I suspect that you need to normalize both versions first to both be either NFD or NFC, because otherwise it will report as different strings that are canonically equivalent, something you seldom if ever want in Unicode. Plus what is wrong with `diff`? It works for me. – tchrist Nov 05 '11 at 16:35
  • It compares raw forms. Input filtering can be used to diff normalized forms (using `iconv`, for example). `diff` is purely line-oriented, whereas `ecmerge` can compare more precisely (words/chars), can ignore line comments, and so on; it depends on the OP's needs. – armel Nov 05 '11 at 16:41
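
To make the line-oriented versus word-level distinction in this exchange concrete, here is a minimal sketch of a word-level comparison on normalized input. It is an assumption-laden illustration, not how ECMerge or `diff` work internally: it uses Python's standard `difflib.SequenceMatcher`, splits on whitespace only, and the function name `word_changes` is hypothetical.

```python
import difflib
import unicodedata


def word_changes(old, new, form="NFC"):
    """List (operation, old_words, new_words) for every span that differs."""
    # Normalize first so canonically equivalent words compare as equal.
    a = unicodedata.normalize(form, old).split()
    b = unicodedata.normalize(form, new).split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "Příliš žluťoučký kůň úpěl ďábelské ódy"
new = "Příliš žluťoučký kůň pěl ďábelské ódy"

# Prints the changed spans; a line-based diff would flag the whole line instead.
print(word_changes(old, new))
```

For these two sentences, only the single changed word is reported as a replacement, while the surrounding words are treated as unchanged.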