3

I have the given listitem class:

class Vector
{
    public int Column { get; set; }
    public int Row { get; set; }
    public int TableID { get; set; }

    public Vector(int column, int row, int tableID)
    {
        TableID = tableID;
        Row = row;
        Column = column;
    }
}

Later I have a typed list of this items, and I want to find out if a given vector (column,row,table) is already added to this list. The trivial solution of course:

    var items = new List<Vector>();
    items.Add(new Vector(1, 2, 3));
    items.Add(new Vector(5, 6, 7));

    for (int i = 0; i < 1000; i++)
    {
        if (items.Any(e => e.Column == 1 && e.Row == 2 && e.TableID == 3))
        {
            // do something
        }
    }

Yes it is working, but... I'm afraid as more and more items in the list it will exponential slower, as you have to enumerate all the items to find a matching one.

Finally my question is:

Can you recommend other data structure to allow "fast contains"? I mean at least linear algorithm. Any will do, I need only store 3 related int and check the containment later.

Laszlo Boke
  • 1,319
  • 1
  • 10
  • 22

5 Answers5

5

You can implement IEquatable<T> interface for your class(methods public bool Equals(T other) and public override int GetHashCode()) And use a HashSet to store unique items:

class Vector :  IEquatable<Vector>
{
    /*Some fields and methods*/

    public bool Equals(Vector other)
    {
        if (ReferenceEquals(other, null)) return false;

        if (ReferenceEquals(this, other)) return true;

        return Column.Equals(other.Column) && Row.Equals(other.Row) && TableID.Equals(other.TableID);
    }

    public override int GetHashCode()
    {
        return Column.GetHashCode() ^ Row.GetHashCode() ^ TableID.GetHashCode();
    }
}

and using hashset:

var set = new HashSet<Vector>();
    var vect = new Vector { ... };
set.Add(vect);
Sergio
  • 6,900
  • 5
  • 31
  • 55
  • 3
    That GetHashCode() implementation will result in many collisions. Table 1, Column 1, Row 2; Table 1, Column 2, Row 1; and Table 2, Column 1, Row 1 would all have the same hash code. That will greatly reduce the efficiency of the HashSet. – MattW Mar 13 '13 at 13:33
  • That's just an example of usage. basically you can implement any logic in GetHashCode, depending on data you're going to store. Anyway after GetHashCode returned a collision Equals is being called – Sergio Mar 13 '13 at 13:44
  • 1
    @Sergio Fair enough, and that example will work. That's almost the problem though - in this case the average lookup could take as much as three times as long as it should, and even a good developer might never think to look into why. – MattW Mar 13 '13 at 14:35
2

Can you recommend other data structure to allow "fast contains"?

Since all vectors must be unique, you could use a HashSet<Vector> and implement the appropriate methods GetHashCode and Equals:

class Vector 
{
    public int Column { get; set; }
    public int Row { get; set; }
    public int TableID { get; set; }

    public Vector(int column, int row, int tableID)
    {
        TableID = tableID;
        Row = row;
        Column = column;
    }

    public override int GetHashCode()
    {
        unchecked 
        {
            int hash = 17;
            hash = hash * 23 + Column.GetHashCode();
            hash = hash * 23 + Row.GetHashCode();
            hash = hash * 23 + TableID.GetHashCode();
            return hash;
        }
    }

    public override bool Equals(object obj)
    {
        if (obj == null || !(obj is Vector)) return false;
        Vector v2 = (Vector)obj;
        return Column == v2.Column && Row == v2.Row && TableID == v2.TableID;
    }
}

This should be fast enough in my opinion.

HashSet<Vector> items = new HashSet<Vector>();
bool isNew = items.Add(new Vector(1, 2, 3));
isNew = items.Add(new Vector(5, 6, 7));
isNew = items.Add(new Vector(5, 6, 7)); // false
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • 1
    That GetHashCode() implementation suffers from the same collision problems I mentioned in Sergio's answer. – MattW Mar 13 '13 at 14:42
  • @TimSchmelter No one mentioned, but "Vector" type would be a struct, wouldn't? Would be a performance bonus? – Laszlo Boke Mar 14 '13 at 07:16
  • @xeondev: That depends on whether or not you want to change it's values. Structs are immutable, when you want to change their values you have to create a new one which is expensive in terms of memory usage. I would use a class in most cases. Have a look [here](http://stackoverflow.com/a/441323/284240) and [here](http://stackoverflow.com/questions/521298/when-to-use-struct-in-c). – Tim Schmelter Mar 14 '13 at 08:32
1

This sounds close to the perfect use-case for System.Collections.Generic.HashSet (if you're using .Net 4.0 or later).

You'd need to implement IEquatable on your class, and be a little careful about your GetHashCode implementation because a simplistic xor of the three components will likely result in a lot of hash collisions, for example row 1 column 2 and row 2 column 1 in the same table would always collide; look at the CRC32 algorithm for hints on how to do it better.

Alternatively, a quick-and-dirty way to achieve the same result would be to make your Vector inherit from Tuple<int, int, int> and just have the friendly named properties be proxies for Item1, Item2 and Item3 - Microsoft have already worried about implementing a good hash.

MattW
  • 4,480
  • 1
  • 12
  • 11
  • +1 "_Make your `Vector` inherit from `Tuple` ... Microsoft have already worried about implementing a good hash_". Nothing "dirty" about that, IMHO. – Sepster Mar 13 '13 at 13:55
0

One way would be to construct a key or hash from the values and use that to store the vector in a hashtable.

Another way is to sort the array and then use a binary method as contains, that gives you log(n) instead of linear n for the contains method.

dutt
  • 7,909
  • 11
  • 52
  • 85
0

You can try using hash tables, if implemented correctly the access time is constant (in perfect world) or use an ordered binary tree, the max number of steps to find a value is log base 2 n where n is the number of elements, and the result of the logarithm is rounded up, in real life its most of the time less steps than the log result, this is true provided that this is implemented correctly and you have a balanced binary tree

Hash tables are faster but harder to implement than binary trees, so its up to you

Daniel Wardin
  • 1,840
  • 26
  • 48