
I noticed that EF's DbSet.Add() is quite slow. A little googling turned up a SO answer that promises up to 180x performance gains:

https://stackoverflow.com/a/7052504/141172

However, I do not understand exactly how to implement IEquatable<T> as suggested in the answer.

According to MSDN, if I implement IEquatable<T>, I should also override Equals() and GetHashCode().

As with many POCO's, my objects are mutable. Before being committed to the database (SaveChanges()), new objects have an Id of 0. After the objects have been saved, the Id serves as an ideal basis for implementing IEquatable, Equals() and GetHashCode().

It is unwise to include any mutable property in a hash code, and according to MSDN:

If two objects compare as equal, the GetHashCode method for each object must return the same value

Given that constraint, should I implement IEquatable<T> as a property-by-property comparison (e.g. this.FirstName == other.FirstName) and not override Equals() and GetHashCode()?

Given that my POCO's are used in an EntityFramework context, should any special attention be paid to the Id field?

Eric J.
  • Why do your objects have an Id of 0? Why not, as Jon Comtois shows in his code example, directly assign a Guid to an ID field or property? That is an ideal basis for the IEquatable you mention yourself. TomTom also points out methods such as assigning an ID range to every client or using -1, -2 and -3 as temporary IDs, but these "solutions" seem to overcomplicate things. Upon construction you generate a new Guid and you are in business. Or am I missing something? – Youp Bernoulli Nov 17 '12 at 19:35
  • @YoupTube: Integers cannot be valueless. Also, integers are 32-bit while GUIDs are 128 bit. That means SQL Server can fit 4x as many IDs in memory using an Integer (key to performance when conducting joins). – Eric J. Nov 19 '12 at 18:07
  • Ah, I understand. But do you have loads of data, traffic and joins, and are you in need of super high performance? It's always a trade-off... I know. But using GUIDs shouldn't be much of a problem. And then you can forget about all the complex issues you have due to using integers. – Youp Bernoulli Nov 20 '12 at 11:26
  • @YoupTube: Sure, there are situations where a GUID is a reasonable answer. However, solutions often tend to grow much larger than originally thought so I prefer to err on the side of efficient architecture. In my case the traffic is only a few dozen visitors a day. The *data* that the traffic acts on, though, is huge compared to the size of available RAM (it's a business portal that provides access to big data). – Eric J. Nov 20 '12 at 16:45

3 Answers


I came across your question in search for a solution to the same question. Here is a solution that I am trying out, see if it meets your needs:

First, all my POCOs derive from this abstract class:

public abstract class BasePOCO <T> : IEquatable<T> where T : class
{
    private readonly Guid _guid = Guid.NewGuid();

    #region IEquatable<T> Members

    public abstract bool Equals(T other);

    #endregion

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj))
        {
            return false;
        }
        if (ReferenceEquals(this, obj))
        {
            return true;
        }
        if (obj.GetType() != typeof (T))
        {
            return false;
        }
        return Equals((T)obj);
    }

    public override int GetHashCode()
    {
        return _guid.GetHashCode();
    }
}

I created a readonly Guid field that I am using in the GetHashCode() override. This ensures that, were I to put the derived POCO into a Dictionary or something else that uses the hash, I would not orphan it if I called .SaveChanges() in the interim and the ID field was updated. This is the one part I'm not sure is completely correct, or whether it is any better than just base.GetHashCode(). I made the Equals(T other) method abstract to ensure the implementing classes implement it in some meaningful way, most likely with the ID field. I put the Equals(object obj) override in this base class because it would probably be the same for all the derived classes too.
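To make the orphaning concern concrete, here is a minimal sketch. TrackedEntity and OrphanDemo are hypothetical names, not EF types, and setting Id by hand only simulates what .SaveChanges() does to a new entity. The hash code is deliberately derived from the mutable Id to show what goes wrong.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity whose hash code depends on a mutable Id.
public class TrackedEntity : IEquatable<TrackedEntity>
{
    public int Id { get; set; }

    public bool Equals(TrackedEntity other)
    {
        return other != null && Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as TrackedEntity);
    }

    // Changes whenever Id changes, which is the root of the problem.
    public override int GetHashCode()
    {
        return Id;
    }
}

public static class OrphanDemo
{
    // Returns true if the entity can no longer be found after its Id changes.
    public static bool IsOrphanedAfterIdChange()
    {
        var entity = new TrackedEntity();               // Id is 0, like an unsaved POCO
        var set = new HashSet<TrackedEntity> { entity };

        entity.Id = 42;                                 // simulates a key being assigned on save

        // The set stored the entity under hash 0 but now looks it up under hash 42.
        return !set.Contains(entity);
    }
}
```

The same object is still in the set, but a lookup no longer finds it, which is the situation a hash code based on an immutable field avoids.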

This would be an implementation of the abstract class:

public class Species : BasePOCO<Species>
{
    public int ID { get; set; }
    public string LegacyCode { get; set; }
    public string Name { get; set; }

    public override bool Equals(Species other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        return ID != 0 && 
               ID == other.ID && 
               LegacyCode == other.LegacyCode &&
               Name == other.Name;
    }
}

The ID property is set as the primary key in the database, and EF knows that. ID is 0 on a newly created object, then gets set to a unique positive integer on .SaveChanges(). So in the overridden Equals(Species other) method, null objects are obviously not equal and identical references obviously are; beyond that, we only need to check whether ID == 0. If it is, we say the objects are not equal, so two distinct objects of the same type that both have an ID of 0 never compare as equal. Otherwise, we say they are equal if their properties are all the same.

I think this covers all the relevant situations, but please chime in if I am incorrect. Hope this helps.

=== Edit 1

I was thinking my GetHashCode() wasn't right, and I looked at this https://stackoverflow.com/a/371348/213169 answer regarding the subject. The implementation above would violate the constraint that objects returning Equals() == true must have the same hashcode.
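A small sketch of that violation, under stated assumptions: TokenHashedEntity and ContractDemo are hypothetical names, and a per-instance integer token stands in for the readonly Guid so the outcome is deterministic. Two instances with the same non-zero Id compare as equal, yet a HashSet cannot find one via the other because their hash codes differ.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical entity: equality by Id, hash code by a per-instance token.
public class TokenHashedEntity : IEquatable<TokenHashedEntity>
{
    private static int _nextToken;

    // Stand-in for the readonly Guid field: unique per instance, never changes.
    private readonly int _token;

    public TokenHashedEntity()
    {
        _token = Interlocked.Increment(ref _nextToken);
    }

    public int Id { get; set; }

    public bool Equals(TokenHashedEntity other)
    {
        return other != null && Id != 0 && Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as TokenHashedEntity);
    }

    public override int GetHashCode()
    {
        return _token;
    }
}

public static class ContractDemo
{
    // Returns true when two "equal" entities are still treated as different by a HashSet.
    public static bool EqualButNotFound()
    {
        var a = new TokenHashedEntity { Id = 7 };
        var b = new TokenHashedEntity { Id = 7 };
        var set = new HashSet<TokenHashedEntity> { a };

        return a.Equals(b) && !set.Contains(b);
    }
}
```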

Here is my second stab at it:

public abstract class BasePOCO <T> : IEquatable<T> where T : class
{
    #region IEquatable<T> Members

    public abstract bool Equals(T other);

    #endregion

    public abstract override bool Equals(object obj);
    public abstract override int GetHashCode();
}

And the implementation:

public class Species : BasePOCO<Species>
{
    public int ID { get; set; }
    public string LegacyCode { get; set; }
    public string Name { get; set; }

    public override bool Equals(Species other)
    {
        if (ReferenceEquals(null, other))
        {
            return false;
        }
        if (ReferenceEquals(this, other))
        {
            return true;
        }
        return ID != 0 &&
               ID == other.ID &&
               LegacyCode == other.LegacyCode &&
               Name == other.Name;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj))
        {
            return false;
        }
        if (ReferenceEquals(this, obj))
        {
            return true;
        }
        return Equals(obj as Species);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            return ((LegacyCode != null ? LegacyCode.GetHashCode() : 0) * 397) ^ 
                   (Name != null ? Name.GetHashCode() : 0);
        }
    }

    public static bool operator ==(Species left, Species right)
    {
        return Equals(left, right);
    }

    public static bool operator !=(Species left, Species right)
    {
        return !Equals(left, right);
    }
}

So I got rid of the Guid in the base class and moved GetHashCode to the implementation. I used ReSharper's implementation of GetHashCode with all the properties except ID, since ID could change (don't want orphans). This will meet the constraint on equality in the linked answer above.

Jon Comtois
  • This is an interesting approach. I'll give it a try when I can and share my feedback. I would appreciate any additional insight you have as you work through this problem. – Eric J. Nov 19 '12 at 18:12

As with many POCO's, my objects are mutable

But they should NOT be mutable on the fields that are the primary key. By definition; otherwise you are in a world of pain database-wise later anyway.

Generate the hashcode ONLY on the fields of the primary key.

Equals() must return true IFF the participating objects have the same hash code

BZZZ - Error.

Hashcodes can be duplicates. It is possible for 2 objects to have different values and the same hashcode. A hashcode is an int (32 bits). A string can be 2 GB long. You cannot map every possible string to a separate hashcode.

IF two objects have the same hashcode, they may be different. If two objects are the same, they can NOT have different hashcodes.
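As a minimal illustration of that one-way rule, consider the sketch below. GridPoint is a hypothetical type invented for this example; its hash function is legal but deliberately coarse, so two unequal points share a hash code while equal points always do.

```csharp
using System;

// Hypothetical type used only to illustrate a hash collision.
public sealed class GridPoint : IEquatable<GridPoint>
{
    public GridPoint(int x, int y)
    {
        X = x;
        Y = y;
    }

    public int X { get; private set; }
    public int Y { get; private set; }

    public bool Equals(GridPoint other)
    {
        return other != null && X == other.X && Y == other.Y;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as GridPoint);
    }

    // Legal but coarse: (1, 2) and (2, 1) both hash to 3.
    public override int GetHashCode()
    {
        return X ^ Y;
    }
}
```

Here new GridPoint(1, 2) and new GridPoint(2, 1) have the same hash code but are not equal, which is allowed. What is not allowed is the reverse: two points that are equal returning different hash codes.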

Where do you get the idea that Equals must return true for objects with the same hashcode?

Also, POCO or not, an object mapped to a database and used in a relation MUST have a stable primary key (which can be used to run the hashcode calculation). An object not having this STILL should have a primary key (per SQL Server requirements); using a sequence / artificial primary key works here. Again, use that to run the hashcode calculation.

TomTom
  • You're right, the relationship between Equals and GetHashCode is the other way around "If two objects compare as equal, the GetHashCode method for each object must return the same value" http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx. Regarding the key: Once an object is inserted into the DB, identity and equality are straightforward. In my case, I am instantiating new objects and have not yet called `SaveChanges()`, so all Id's are 0. I'll modify my question based on your comments, but I don't see a solution yet that supports new objects with no PK assigned. – Eric J. Mar 20 '12 at 07:07
  • Use client-generated primary keys: GUIDs, or a sequence you generate on the client side. Otherwise the usual reference mechanisms are painful. OR - go with your own IDs. My own ORM years ago used negative numbers (-1, -2, -3) as temporary keys, which got replaced on the insert. The hashcodes are anyway not legally reusable after commit (objects need to get refreshed) ;) Problem solved. – TomTom Mar 20 '12 at 07:10
  • +1 for mentioning that equal hashcodes don't automatically mean that your objects are the same. – VVS Mar 20 '12 at 07:10
  • Eric missed the last 'F' in IFF ... but he is right ... there will be lots of samples where GetHashCode is the same but Equals won't – Random Dev Mar 20 '12 at 07:11
  • How do you client-generate IDs with a large number of clients running? Key collisions are almost guaranteed. GUIDs are 128 bit and much less efficient than a 32-bit int as a primary key. – Eric J. Mar 20 '12 at 07:12
  • Come on, this has been a solved standard problem for 15 years - as long as ORMs have existed. Use the high/low approach: the client gets a pool of numbers on start which are guaranteed by the db to be unique. Or use a sequence (SQL 2012) or simulate one with your own tables. – TomTom Mar 20 '12 at 07:20
  • So... have clients manage a pool of keys simply to be able to properly implement IEquatable? Or, don't implement IEquatable and take a performance hit? Surely there must be a better way. – Eric J. Mar 20 '12 at 07:30
  • Yes. Do not use POCO. Write your own ORM, then you can use temporary keys like I did. Learn the basics - this is a very complex item to handle. – TomTom Mar 20 '12 at 08:19
  • So nobody should use EF unless they have written their own ORM first? Hmmm. – Eric J. Mar 20 '12 at 17:57
  • No, but maybe reading up on the topic is not that bad, you know. No ORM will totally hide implementation details. People have written a lot of articles about that over the years. – TomTom Mar 20 '12 at 22:12
  • @EricJ. `GUIDs are 128 bit and much less efficient than a 32-bit int as a primary key`- could you explain what you mean by this? – nicodemus13 May 21 '14 at 08:33
  • @nicodemus13: People say "storage is cheap". However, memory is not cheap, and data is getting "big" faster than memory can catch up for many companies. Any DB solution will be relatively fast if most keys are in memory, and relatively slow if most keys are on disk (because they don't fit in memory). A GUID takes 4x more space than an Int32, so you can only fit 1/4 as many GUID keys in a given amount of RAM. More keys on disk = slower database. – Eric J. May 21 '14 at 16:46

First things first: sorry for my lame English :)

As TomTom says, they shouldn't be mutable just because they have not yet received a PK/Id...

In our EF:CF system, we use a generated negative Id (assigned in the base class ctor or, if you use ProxyTracking, in the ObjectMaterialized event) for every new POCO. It's a pretty simple idea:

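public static class IdKeeper
{
  private static int m_Current = int.MinValue;

  // Returns the next temporary (negative) Id.
  // Public with an int return type so that base classes can call it.
  public static int Next()
  {
    return ++m_Current;
  }
}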

Starting at MinValue and incrementing should be important, because EF will sort POCOs by their PK before committing changes to the db; when you use "-1, -2, -3", POCOs are saved in reverse order, which in some cases (depending on the ordering you rely on) may not be ideal.

public abstract class IdBase
{
  public virtual int Id { get; set; }
  protected IdBase()
  {
    Id = IdKeeper.Next();
  }
}

If a POCO is materialized from the DB, its Id will be overwritten with the actual PK, and the same happens when you call SaveChanges(). As a bonus, the Id of every "not yet saved" POCO will be unique (that should come in handy one day ;) )
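If new POCOs can be constructed on several threads at once, a plain ++ on a shared static field is not safe. A hedged variant of the same idea, using Interlocked.Increment, could look like the sketch below. TempIdGenerator is a hypothetical name and not part of the system described here.

```csharp
using System.Threading;

// Hypothetical thread-safe variant of the temporary-Id idea.
public static class TempIdGenerator
{
    private static int _current = int.MinValue;

    // Returns int.MinValue + 1, then int.MinValue + 2, and so on.
    // Values stay negative and ascend in creation order for roughly
    // the first two billion calls in a process.
    public static int Next()
    {
        return Interlocked.Increment(ref _current);
    }
}
```

Each call returns a value one greater than the previous call, so temporary Ids sort in the order the objects were created.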

Comparing two POCOs with IEquatable (see "why does DbSet work so slow") is then easy:

public class Person
  : IdBase, IEquatable<Person>
{
  public virtual string FirstName { get; set; }

  public bool Equals(Person other)
  {
    // Guard against null so Equals(null) returns false instead of throwing.
    return other != null && Id == other.Id;
  }
}
Jan 'splite' K.