Basic string-diff for C++ test case

Question

I have a C++ function which returns a multi-line std::string. In the test-case for this, I compare each line against the known-value - something like:

std::string known = "good\netc";
std::string output = "bad\netc";

std::vector<std::string> knownvec;
pystring::splitlines(known, knownvec); // splits on \n

std::vector<std::string> outvec;
pystring::splitlines(output, outvec);

CHECK_EQUAL(outvec.size(), knownvec.size());

for(unsigned int i = 0; i < std::min(outvec.size(), knownvec.size()); ++i)
    CHECK_EQUAL(pystring::strip(outvec[i]), pystring::strip(knownvec[i]));

This works, but if a single newline is added, every subsequent CHECK_EQUAL assertion fails, which makes the output hard to read.

Is there a better way to compare the two strings, ideally in a nice, self-contained way (i.e. not linking against giantdifflib, or writing the strings to a file and calling the diff command)?

[Edit] I'm using OpenImageIO's rather simple unittest.h

The data being compared is mainly either YAML, or colour lookup tables. Here's an example test case - basically a few lines of headers, then lots of numbers:

 Version 1
 Format any
 Type ...
 LUT:
 Pre {
   0.0
   0.1
   ...
   1.0
 }
 3D {
   0.0
   0.1
   ...
   1.0
 }

If a single string match fails, do you want to "resynchronize" the string matching further down like `diff` does? — Emile Cormier, Jul 26 '11 at 15:33
He's using UnitTest++, which is awesome. As an aside, you shouldn't omit scope brackets around a macro. Use for (;;) { CHECK_EQUAL(...); } instead of for (;;) CHECK_EQUAL(...); — Tom Kerr, Jul 26 '11 at 16:12
what i think is if you try the same trick we use to calculate the longest common subsequence(LCS) between two strings you can compare two strings in the way you want. You can then call a function object at each instance of mismatch(to handle each case of a mismatch) — A. K., Jul 26 '11 at 18:23
@Emile I'm using [this unittest.h](https://github.com/OpenImageIO/oiio/blob/master/src/include/unittest.h) - nothing fancy, but it works — dbr, Jul 27 '11 at 00:28

score 1 · Accepted Answer · edited May 23 '17 at 12:11

The easiest thing to do would be to break out of your loop when strings no longer match:

for(unsigned int i = 0; i < std::min(outvec.size(), knownvec.size()); ++i)
{
    bool areEqual = pystring::strip(outvec[i]) == pystring::strip(knownvec[i]);
    CHECK_EQUAL(pystring::strip(outvec[i]), pystring::strip(knownvec[i]));
    if (!areEqual)
        break;
}

If CHECK_EQUAL returns a boolean value, then you can obviously simplify the above example a bit.
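
As a minimal sketch of the same idea, the comparison can be moved into a helper so the test makes a single assertion. The names first_mismatch, split_lines and strip below are hypothetical; the last two are plain standard-library stand-ins for pystring::splitlines and pystring::strip, and may not match pystring's behaviour in every edge case.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text into lines on '\n' (stand-in for pystring::splitlines).
std::vector<std::string> split_lines(const std::string& text)
{
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}

// Remove leading and trailing whitespace (stand-in for pystring::strip).
std::string strip(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Returns the zero-based index of the first line that differs,
// or -1 if both texts match line for line. If one text is a prefix
// of the other, the index of the first extra line is returned.
int first_mismatch(const std::string& known, const std::string& output)
{
    const std::vector<std::string> k = split_lines(known);
    const std::vector<std::string> o = split_lines(output);
    const std::size_t common = std::min(k.size(), o.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (strip(k[i]) != strip(o[i]))
            return static_cast<int>(i);
    }
    return k.size() == o.size() ? -1 : static_cast<int>(common);
}
```

In the test this would reduce to something like CHECK_EQUAL(first_mismatch(known, output), -1), which reports at most one failure and tells you which line to look at first.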

If you want your unit test framework to provide the same output as diff when comparing multi-line strings, then I'm afraid you're expecting too much of it. If you don't want to link to an external library, or execute diff from within your test program, then you'll have to program some kind of diff algorithm yourself.

Check out this other question for information on diff algorithms and libraries.

If you find that implementing a diff algorithm yourself is not worth the trouble (it probably isn't), then check out the Google Diff-Match-Patch libraries.

Tom Kerr · Answer 2 · 2011-07-26T18:10:05.457

Short:

For the purposes of unit testing, you just need to flag that they are different. Unit tests don't fix failing unit tests, programmers fix failing unit tests.

Long:

If your sequence sizes are possibly different, there isn't a simple, generic way to compare them. I think you'll need a giantdifflib to do it poorly, let alone adequately.

I think that if you can't treat a line's position (its ordinal) as its identity, then you are going to have to use search to add information.

Consider this degenerate case:

a b c d e f
d e f a b c

Which of these two alignments you choose is going to come down to scoring the results, or to some artifact of the implementation:

      a b c d e f
d e f a b c

a b c d e f
      d e f a b c

My opinion is that if you have to assign a score to a result, then it is unlikely that a unit test is applicable.

Comparing containers isn't very easy in general. If the result cannot be lexicographically sorted, I'm not sure that any computational result will be informative beyond telling you that it's different.

This is obviously a fun problem to think about, but it is probably out of scope for unit testing.

score 0 · Answer 3 · answered Jul 26 '11 at 16:02

A basic diff algorithm is rather easy to implement, if not terribly efficient. This Wikipedia article is a good starting place.
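
To give a rough idea of the size of such an implementation, here is a sketch of a line diff built on the textbook longest-common-subsequence table. It is only one possible approach and not necessarily the one the linked article describes. The name diff_lines and the "- " / "+ " prefixes are assumptions made for this illustration. It uses O(n*m) time and memory, so it suits test-sized inputs rather than large files.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Line-level diff of `a` (expected) against `b` (actual).
// Output lines are prefixed with "  " (present in both),
// "- " (only in a) or "+ " (only in b).
std::vector<std::string> diff_lines(const std::vector<std::string>& a,
                                    const std::vector<std::string>& b)
{
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    // lcs[i][j] = length of the longest common subsequence
    // of the suffixes a[i..] and b[j..].
    std::vector<std::vector<std::size_t>> lcs(
        n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = m; j-- > 0;) {
            lcs[i][j] = (a[i] == b[j])
                ? lcs[i + 1][j + 1] + 1
                : std::max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }

    // Walk the table from the top-left corner, emitting one line per step.
    std::vector<std::string> out;
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            out.push_back("  " + a[i]);
            ++i; ++j;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            out.push_back("- " + a[i]);  // dropping a[i] keeps the LCS intact
            ++i;
        } else {
            out.push_back("+ " + b[j]);  // b[j] is an extra line
            ++j;
        }
    }
    for (; i < n; ++i) out.push_back("- " + a[i]);
    for (; j < m; ++j) out.push_back("+ " + b[j]);
    return out;
}
```

A test could compare the two strings once and, only on failure, print the result of diff_lines(knownvec, outvec). An inserted line then shows up as a single "+ " entry instead of a cascade of mismatches.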

n. m. could be an AI
