I have the following class in C# and I'm trying to get a distinct list of items. The list has 24 elements.

public enum DbObjectType
{
    Unknown,
    Procedure,
    Function,
    View
}

public class DbObject
{
    public string DatabaseName { get; set; }
    public string SchemaName { get; set; }
    public string ObjectName { get; set; }
    public DbObjectType ObjectType { get; set; }
}

I have two approaches and expected them to give the same result, but they don't.

The first expression returns the same list, duplicates included:

var lst1 = from c in DependantObject
          group c by new DbObject
          {
              DatabaseName = c.DatabaseName,
              SchemaName = c.SchemaName,
              ObjectName = c.ObjectName,
              ObjectType = c.ObjectType
          } into grp
          select grp.First();

lst1 will have 24 items.

But this one returns the desired result:

var lst2 = from c in DependantObject
          group c by new 
          {
              DatabaseName = c.DatabaseName,
              SchemaName = c.SchemaName,
              ObjectName = c.ObjectName,
              ObjectType = c.ObjectType
          } into grp
          select grp.First();

lst2 will have 10 items.

The only difference is that the second expression groups by an anonymous type, while the first groups by a typed object.

I'm interested in understanding this behavior.

Thank you!

I believe my question is not a duplicate of the mentioned one: I'm not asking how to get a distinct list. I'm asking why typed and anonymous keys return different results.

FLICKER

1 Answer

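LINQ's GroupBy() and Distinct() methods compare keys with the default equality comparer, which calls GetHashCode and Equals. A type has to override both to be compared by value.

C#'s anonymous types (the new { Name = value } syntax) compile to classes that do override those methods and compare property by property, but your own DbObject type does not. It falls back to reference equality, so every new DbObject key ends up in its own group.

Instead of overriding them on the class, you can also create a custom IEqualityComparer<DbObject> type. StructuralComparisons.StructuralEqualityComparer may be worth a look too, although it targets types such as arrays and tuples.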
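The difference can be seen in isolation with a minimal sketch. PlainKey and EqualityDemo are hypothetical names made up for this illustration; PlainKey stands in for a class like DbObject that does not override Equals or GetHashCode.

```csharp
using System;

// Hypothetical stand-in for DbObject: no Equals/GetHashCode override.
public class PlainKey
{
    public string Name { get; set; }
}

public static class EqualityDemo
{
    // Two anonymous objects with the same property values compare equal
    // and produce the same hash code.
    public static bool AnonymousEqual()
    {
        var a = new { Name = "dbo.Orders" };
        var b = new { Name = "dbo.Orders" };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }

    // Two instances of a plain class with the same property values are
    // still different references, so Object.Equals reports false.
    public static bool PlainEqual()
    {
        var a = new PlainKey { Name = "dbo.Orders" };
        var b = new PlainKey { Name = "dbo.Orders" };
        return a.Equals(b);
    }
}
```

AnonymousEqual() returns true and PlainEqual() returns false, which is why grouping by an anonymous key merges duplicates while grouping by a plain class key does not.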

Option 1:

public class DbObject : IEquatable<DbObject> {

    public override Int32 GetHashCode() {

        // See https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode

        unchecked
        {
            int hash = 17;
            hash = hash * 23 + this.DatabaseName.GetHashCode();
            hash = hash * 23 + this.SchemaName.GetHashCode();
            hash = hash * 23 + this.ObjectName.GetHashCode();
            hash = hash * 23 + this.ObjectType.GetHashCode();
            return hash;
        }
    }

    public override Boolean Equals(Object other) {

        return this.Equals( other as DbObject );    
    }

    public Boolean Equals(DbObject other) {

        if( other == null ) return false;
        return
            this.DatabaseName.Equals( other.DatabaseName ) &&
            this.SchemaName.Equals( other.SchemaName) &&
            this.ObjectName.Equals( other.ObjectName ) &&
            this.ObjectType.Equals( other.ObjectType);
    }
}

Option 2:

class DbObjectComparer : IEqualityComparer<DbObject> {

    public Boolean Equals(DbObject x, DbObject y) {

        // Same instance (or both null) is equal; exactly one null is not.
        if( Object.ReferenceEquals( x, y ) ) return true;
        if( x == null || y == null ) return false;

         return
            x.DatabaseName.Equals( y.DatabaseName ) &&
            x.SchemaName.Equals( y.SchemaName) &&
            x.ObjectName.Equals( y.ObjectName ) &&
            x.ObjectType.Equals( y.ObjectType);
    }

    // Implements the interface method, so no "override" modifier here.
    public Int32 GetHashCode(DbObject obj) {

        unchecked
        {
            int hash = 17;
            // Suitable nullity checks etc, of course :)
            hash = hash * 23 + obj.DatabaseName.GetHashCode();
            hash = hash * 23 + obj.SchemaName.GetHashCode();
            hash = hash * 23 + obj.ObjectName.GetHashCode();
            hash = hash * 23 + obj.ObjectType.GetHashCode();
            return hash;
        }
    }
}

Option 2 usage:

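// Pass the comparer to GroupBy, then take the first item of each group.
var query = this.DependantObject
    .GroupBy( c => new DbObject() {
        DatabaseName = c.DatabaseName,
        SchemaName   = c.SchemaName,
        ObjectName   = c.ObjectName,
        ObjectType   = c.ObjectType
    }, new DbObjectComparer() )
    .Select( grp => grp.First() );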

Using GroupBy might be suboptimal; you could use LINQ's Distinct directly. With no argument it relies on Option 1; with Option 2, pass the comparer instead:

var query = this.DependantObject
    .Select( c => new DbObject() {
        DatabaseName = c.DatabaseName,
        SchemaName   = c.SchemaName,
        ObjectName   = c.ObjectName,
        ObjectType   = c.ObjectType
    } )
    .Distinct(); // or .Distinct( new DbObjectComparer() ) for Option 2
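To check the behaviour end to end, here is a small self-contained sketch. The sample data, the GroupingDemo class and its Sample and Copy helpers are hypothetical, made up for illustration; DbObject and DbObjectComparer follow the shapes above, with null-tolerant hashing added as an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum DbObjectType { Unknown, Procedure, Function, View }

public class DbObject
{
    public string DatabaseName { get; set; }
    public string SchemaName { get; set; }
    public string ObjectName { get; set; }
    public DbObjectType ObjectType { get; set; }
}

// Compares DbObject instances by their four properties.
public class DbObjectComparer : IEqualityComparer<DbObject>
{
    public bool Equals(DbObject x, DbObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.DatabaseName == y.DatabaseName
            && x.SchemaName == y.SchemaName
            && x.ObjectName == y.ObjectName
            && x.ObjectType == y.ObjectType;
    }

    public int GetHashCode(DbObject obj)
    {
        unchecked
        {
            int hash = 17;
            // Null strings contribute 0 instead of throwing.
            hash = hash * 23 + (obj.DatabaseName?.GetHashCode() ?? 0);
            hash = hash * 23 + (obj.SchemaName?.GetHashCode() ?? 0);
            hash = hash * 23 + (obj.ObjectName?.GetHashCode() ?? 0);
            hash = hash * 23 + obj.ObjectType.GetHashCode();
            return hash;
        }
    }
}

public static class GroupingDemo
{
    // Hypothetical data: three items, the first two equal by value.
    public static List<DbObject> Sample() => new List<DbObject>
    {
        new DbObject { DatabaseName = "Sales", SchemaName = "dbo", ObjectName = "GetOrders", ObjectType = DbObjectType.Procedure },
        new DbObject { DatabaseName = "Sales", SchemaName = "dbo", ObjectName = "GetOrders", ObjectType = DbObjectType.Procedure },
        new DbObject { DatabaseName = "Sales", SchemaName = "dbo", ObjectName = "OrderView", ObjectType = DbObjectType.View },
    };

    private static DbObject Copy(DbObject c) => new DbObject
    {
        DatabaseName = c.DatabaseName,
        SchemaName = c.SchemaName,
        ObjectName = c.ObjectName,
        ObjectType = c.ObjectType
    };

    // Typed key, default comparer: reference equality, one group per item.
    public static int TypedKeyCount() =>
        Sample().GroupBy(c => Copy(c)).Count();

    // Anonymous key: value equality, duplicates merge.
    public static int AnonymousKeyCount() =>
        Sample().GroupBy(c => new { c.DatabaseName, c.SchemaName, c.ObjectName, c.ObjectType }).Count();

    // Typed key with the custom comparer: duplicates merge.
    public static int ComparerKeyCount() =>
        Sample().GroupBy(c => Copy(c), new DbObjectComparer()).Count();

    // Distinct with the custom comparer, no grouping needed.
    public static int DistinctCount() =>
        Sample().Distinct(new DbObjectComparer()).Count();
}
```

With this data, TypedKeyCount() gives 3 while the other three methods give 2, mirroring the 24 versus 10 difference in the question.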
Dai
  • so, you say when using anonymous, compiler compares the properties but when using typed data, it will look into the GetHashCode and Equals methods? – FLICKER Jun 22 '17 at 17:33
  • @FLICKER - the compiler won't know what exactly constitutes equality in your class. One way to do it, the way I've done it before, is implement `IEqualityComparer` and use an overload of Distinct(). I recently answered a similar problem here - https://stackoverflow.com/questions/44663909/vb-net-linq-filter-table-rows/44664169#44664169 The code is in VB.NET but the concept should be the same. – Fabulous Jun 22 '17 at 17:39
  • @Dai, this is new, I had never tried to use distinct on anonymous types so didn't know this is how they behaved. – Fabulous Jun 22 '17 at 17:41