I've written a test method for comparison between two instances of a class (the assumption of type compatibility is given). Proudly I checked all the public properties making sure to return a list of discrepancies.

The problem is that some of the properties are objects containing their own properties (sub-properties, if you will). Those are not being compared, as far as I can see by stepping through the process flow.

How can I design a call that goes in depth and compares all sub-properties? Extra bonus if the approach is relatively simple. :)

public static class Extensions
{
  public static IEnumerable<string> DiffersOn<Generic>(
    this Generic self, Generic another) where Generic : class
  {
    if (self == null || another == null)
    {
      // Stop here: reading properties of a null instance would throw.
      yield return null;
      yield break;
    }

    Type type = typeof(Generic);
    IEnumerable<PropertyInfo> properties = type.GetProperties(
      BindingFlags.Public | BindingFlags.Instance);

    foreach (PropertyInfo property in properties)
    {
      var selfie = type.GetProperty(property.Name).GetValue(self);
      var othie = type.GetProperty(property.Name).GetValue(another);
      if (selfie != othie && (selfie == null || !selfie.Equals(othie)))
        yield return property.Name;
    }
  }
}
Konrad Viltersten
  • Such a heavy method (comparison with reflection) usually doesn't have strict performance requirements. If that's also true in your case: I usually have a helper method for this. I serialize both objects (with BinaryFormatter), then I perform a raw byte[] comparison. It's not suitable for every case (because private fields and public properties are not the same thing) but sometimes it's useful. – Adriano Repetti Jul 31 '14 at 15:12
  • BTW, with the method you have, you just need to make your comparison recursive when you find !propertyInfo.PropertyType.IsPrimitive and the type doesn't implement the IEquatable/IComparable interfaces. – Adriano Repetti Jul 31 '14 at 15:15
  • Good idea. I'd like to have full control of the result so I'm reporting the actual properties' names as an array. However, there's a certain beauty in the blunt simplicity of your suggestion. Post it as a reply too, please. – Konrad Viltersten Jul 31 '14 at 15:19
  • There isn't a "simple" way to do this, in the general case. You'll run into problems if two objects refer to each other, either directly or indirectly. To prevent that you must have some way to mark an object as already checked. That's why the serialization approach is a good place to start; serialization code already handles the circular reference case. – Jim Mischel Jul 31 '14 at 15:42
  • @JimMischel I'm not sure I understand. What "*circular case*" are you referring to? Perhaps I was unclear in my description on the compared objects. Both are of type *A* and contain field of type *B* but they **don't** refer to anything of type *A* themselves. – Konrad Viltersten Jul 31 '14 at 19:26
  • http://comparenetobjects.codeplex.com/ – Robert Harvey Jul 31 '14 at 20:57
  • Assume you have a `Parent` class that contains a field defined as `Child _child`. And the `Child` class contains a field, `Parent _parent`. If you do a naive recursive traversal of the object graph, you'll end up in an endless loop. You'll compare the `_child` fields and find that their references are identical. Then you'll recurse into `_child` and compare `_parent`. And now you're in an infinite loop. – Jim Mischel Jul 31 '14 at 21:30
  • @JimMischel Very good point. I missed that. Luckily, after I've checked, that's not the case in this particular scenario (pfeeew... & yey!) but your point is still very much valid. – Konrad Viltersten Aug 01 '14 at 08:18

1 Answer

As I said in a comment, the easiest way is to use BinaryFormatter to serialize both objects and compare the raw byte[] streams. With that you'll compare fields (not properties), so the result may differ from a property-based comparison: two objects may be logically equal even if their private fields are different. The biggest advantage is that serialization handles a very tricky case: objects with circular references.

Roughly something like this:

static bool CheckForEquality(object a, object b)
{
    BinaryFormatter formatter = new BinaryFormatter();

    using (MemoryStream streamA = new MemoryStream())
    using (MemoryStream streamB = new MemoryStream())
    {
        formatter.Serialize(streamA, a);
        formatter.Serialize(streamB, b);

        if (streamA.Length != streamB.Length)
            return false;

        streamA.Seek(0, SeekOrigin.Begin);
        streamB.Seek(0, SeekOrigin.Begin);

        for (int value = 0; (value = streamA.ReadByte()) >= 0; )
        {
            if (value != streamB.ReadByte())
                return false;
        }

        return true;
    }
}

As pointed out by Ben Voigt in a comment, this algorithm for comparing streams is pretty slow. For a fast buffer comparison (MemoryStream keeps its data in a byte[] buffer), see the post he suggested: http://stackoverflow.com/a/1445405/103167
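
A minimal sketch of the buffer-based idea, assuming both streams were created with the default MemoryStream constructor (so GetBuffer is permitted) and written from position 0. The names StreamComparison and BuffersEqual are hypothetical, and a plain loop stands in for the native memcmp call used in the linked post:

```csharp
using System;
using System.IO;

static class StreamComparison
{
    // Hypothetical helper: compares the written portion of two
    // MemoryStreams without calling ReadByte once per byte.
    public static bool BuffersEqual(MemoryStream streamA, MemoryStream streamB)
    {
        if (streamA.Length != streamB.Length)
            return false;

        // GetBuffer returns the underlying array, which may be longer
        // than the data written, so only the first Length bytes count.
        byte[] bufferA = streamA.GetBuffer();
        byte[] bufferB = streamB.GetBuffer();
        long length = streamA.Length;

        for (long i = 0; i < length; i++)
        {
            if (bufferA[i] != bufferB[i])
                return false;
        }

        return true;
    }
}
```

In CheckForEquality above, the Seek calls and the ReadByte loop could then be replaced by a single call such as `return StreamComparison.BuffersEqual(streamA, streamB);`.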

If you need more "control" and want to handle custom comparisons, then things get more complicated. The next sample is a first raw (and untested!) version of such a comparison. It doesn't handle one very important thing: circular references.

static bool CheckForEquality(object a, object b)
{
    if (Object.ReferenceEquals(a, b))
        return true;

    // This is a little bit arbitrary: if b has a custom comparison
    // that may be equal to null then this will bypass it. However,
    // it's pretty uncommon for a non-null object to be equal
    // to null (unless a is null and b is a Nullable<T>
    // without value). Mind this...
    if (Object.ReferenceEquals(a, null))
        return false;

    // Here we handle default and custom comparison assuming
    // types are "well-formed" and with good habits. Hashcode
    // checking is a micro optimization, it may speed-up checking
    // for inequality (if hashes are different then we may safely
    // assume objects aren't equal...in "well-formed" objects).
    if (!Object.ReferenceEquals(b, null) && a.GetHashCode() != b.GetHashCode())
        return false;

    if (a.Equals(b))
        return true;

    var comparableA = a as IComparable;
    if (comparableA != null)
        return comparableA.CompareTo(b) == 0;

    // Different instances and one of them is null, they're different unless
    // it's a special case handled by "a" object (with IComparable).
    if (Object.ReferenceEquals(b, null))
        return false;

    // In case "b" has a custom comparison for objects of type "a"
    // but not vice-versa.
    if (b.Equals(a))
        return true; 

    // We assume we can compare only the same type. It's not true
    // because of custom comparison operators but it should also be
    // handled in Object.Equals().
    var type = a.GetType();
    if (type != b.GetType())
        return false;

    // Special case for lists, they won't match but we may consider
    // them equal if they have same elements and each element match
    // corresponding one in the other object.
    // This comparison is order sensitive so A,B,C != C,B,A.
    // Items must be first ordered if this isn't what you want.
    // Also note that a better implementation should check for
    // ICollection as a special case and IEnumerable should be used.
    // An even better implementation should also check for
    // IStructuralComparable and IStructuralEquatable implementations.
    var listA = a as System.Collections.ICollection;
    if (listA != null)
    {
        var listB = b as System.Collections.ICollection;

        if (listA.Count != listB.Count)
            return false;

        var aEnumerator = listA.GetEnumerator();
        var bEnumerator = listB.GetEnumerator();

        while (aEnumerator.MoveNext() && bEnumerator.MoveNext())
        {
            if (!CheckForEquality(aEnumerator.Current, bEnumerator.Current))
                return false;
        }

        // We don't return true here, a class may implement IList and also have
        // many other properties, go on with our comparison
    }

    // If we arrived here we have to perform a property by
    // property comparison recursively calling this function.
    // Note that here we check for "public interface" equality.
    var properties = type.GetProperties().Where(x => x.GetMethod != null);
    foreach (var property in properties)
    {
        if (!CheckForEquality(property.GetValue(a), property.GetValue(b)))
            return false;
    }

    // If we arrived here then objects can be considered equal
    return true;
}

If you strip out the comments you'll have pretty short code. To handle circular references you have to avoid comparing the same tuple again and again. To do that, split the function as in this example (a very, very naive implementation, I know):

static bool CheckForEquality(object a, object b)
{
    return CheckForEquality(new List<Tuple<object, object>>(), a, b);
}

With a core implementation like this (I rewrite only the important part):

static bool CheckForEquality(List<Tuple<object, object>> visitedObjects, 
                             object a, object b)
{
    // If we compared this tuple before and we're still comparing
    // then we can consider them as equal (or irrelevant).
    if (visitedObjects.Contains(Tuple.Create(a, b)))
        return true;

    visitedObjects.Add(Tuple.Create(a, b));

    // Go on and pass visitedObjects to recursive calls
}
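
One caveat with the naive version: List.Contains on tuples ends up calling Equals on the compared objects themselves. A possible variation, sketched here under simplifying assumptions, tracks visited pairs by reference instead. The names CycleSafeComparison and PairComparer are hypothetical, value types and strings are treated as leaves, collections are not handled, and only readable, non-indexed public properties are walked:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.CompilerServices;

static class CycleSafeComparison
{
    // Hypothetical comparer: two pairs match only when both items are
    // the very same instances (no custom Equals is invoked).
    sealed class PairComparer : IEqualityComparer<Tuple<object, object>>
    {
        public bool Equals(Tuple<object, object> x, Tuple<object, object> y)
        {
            return Object.ReferenceEquals(x.Item1, y.Item1)
                && Object.ReferenceEquals(x.Item2, y.Item2);
        }

        public int GetHashCode(Tuple<object, object> pair)
        {
            return (RuntimeHelpers.GetHashCode(pair.Item1) * 397)
                 ^ RuntimeHelpers.GetHashCode(pair.Item2);
        }
    }

    public static bool CheckForEquality(object a, object b)
    {
        var visited = new HashSet<Tuple<object, object>>(new PairComparer());
        return CheckForEquality(visited, a, b);
    }

    static bool CheckForEquality(
        HashSet<Tuple<object, object>> visited, object a, object b)
    {
        if (Object.ReferenceEquals(a, b))
            return true;
        if (a == null || b == null || a.GetType() != b.GetType())
            return false;

        // Assumption: value types and strings are leaves.
        var type = a.GetType();
        if (type.IsValueType || a is string)
            return a.Equals(b);

        // Add returns false when this pair is already being compared,
        // which means we reached it again through a cycle.
        if (!visited.Add(Tuple.Create(a, b)))
            return true;

        var properties = type.GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0);

        foreach (var property in properties)
        {
            if (!CheckForEquality(visited, property.GetValue(a), property.GetValue(b)))
                return false;
        }

        return true;
    }
}
```

With a `Parent`/`Child` pair that point at each other, the second visit to the same (a, b) pair returns immediately instead of recursing forever.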

The next step (getting the list of differing properties) is a little more complicated because it may not be that simple (for example, if two properties are lists with a different number of items). I'll just sketch a possible solution (removing the code for circular references for clarity). Note that once equality breaks, subsequent checks may also produce unexpected exceptions, so it should be implemented much better than this.

The new prototype will be:

static void CheckForEquality(object a, object b, List<string> differences)
{
     CheckForEquality("", a, b, differences);
}

The implementation method will also need to keep track of the "current path":

static void CheckForEquality(string path,
                             object a, object b, 
                             List<string> differences)
{
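    if (Object.ReferenceEquals(a, b))
        return;

    // One side is null and the other isn't
    if (Object.ReferenceEquals(a, null) || Object.ReferenceEquals(b, null))
    {
        differences.Add(path);
        return; // This is mandatory: nothing else to compare
    }

    if (a.Equals(b))
        return;

    var comparableA = a as IComparable;
    if (comparableA != null)
    {
        if (comparableA.CompareTo(b) != 0)
            differences.Add(path);
        return; // IComparable already gave us the answer
    }

    if (b.Equals(a))
        return;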

    var type = a.GetType();
    if (type != b.GetType())
    {
        differences.Add(path);
        return; // This is mandatory: we can't go on comparing different types
    }

    var listA = a as System.Collections.ICollection;
    if (listA != null)
    {
        var listB = b as System.Collections.ICollection;

        if (listA.Count == listB.Count)
        {
            var aEnumerator = listA.GetEnumerator();
            var bEnumerator = listB.GetEnumerator();

            int i = 0;
            while (aEnumerator.MoveNext() && bEnumerator.MoveNext())
            {
                CheckForEquality(
                    String.Format("{0}[{1}]", path, i++),
                    aEnumerator.Current, bEnumerator.Current, differences);
            }
        }
        else
        {
            differences.Add(path);
        }
    }

    var properties = type.GetProperties().Where(x => x.GetMethod != null);
    foreach (var property in properties)
    {
        CheckForEquality(
            String.Format("{0}.{1}", path, property.Name),
            property.GetValue(a), property.GetValue(b), differences);
    }
}
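
To make the "current path" idea concrete, here is a compact, self-contained sketch that returns dotted paths for a nested object. It is a simplification under stated assumptions, not a replacement for the code above: the names PathDiff, DiffersOn, and Collect are hypothetical, value types and strings are treated as leaves, and collections and circular references are not handled:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PathDiff
{
    public static List<string> DiffersOn(object a, object b)
    {
        var differences = new List<string>();
        Collect("", a, b, differences);
        return differences;
    }

    static void Collect(string path, object a, object b, List<string> differences)
    {
        if (Object.ReferenceEquals(a, b))
            return;

        if (a == null || b == null || a.GetType() != b.GetType())
        {
            differences.Add(path);
            return;
        }

        // Assumption: value types and strings are leaves.
        var type = a.GetType();
        if (type.IsValueType || a is string)
        {
            if (!a.Equals(b))
                differences.Add(path);
            return;
        }

        var properties = type.GetProperties()
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0);

        foreach (var property in properties)
        {
            // Avoid a leading dot on top-level properties.
            var childPath = path.Length == 0
                ? property.Name
                : path + "." + property.Name;
            Collect(childPath, property.GetValue(a), property.GetValue(b), differences);
        }
    }
}
```

For two hypothetical `Person` objects with the same `Name` but an `Address` whose `City` differs, `PathDiff.DiffersOn(first, second)` would contain the single entry `Address.City`.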
Adriano Repetti
  • I'd definitely use `GetBuffer` rather than so many calls to `ReadByte`. For comparing the buffers quickly, see http://stackoverflow.com/a/1445405/103167 – Ben Voigt Jul 31 '14 at 19:40
  • @BenVoigt you're absolutely right. Thanks also for the link...actually...I have to admit I never thought to P/Invoke CRT!!! – Adriano Repetti Jul 31 '14 at 21:18
  • `Buffer` class needs a `BlockCompare`, in the same way that `Buffer.BlockCopy` provides `memmove` behavior. – Ben Voigt Jul 31 '14 at 21:32
  • Using serialization for this purpose is a stroke of genius! +1 very impressive! :) – jimjim Jul 01 '15 at 06:34
  • Not really. It's so d* slow (especially if you don't implement ISerializable) that you can't use it in many scenarios. Moreover, binary equivalence and logical equivalence are different. In short: you can use it just sometimes – Adriano Repetti Jul 01 '15 at 06:37