-4

I need to find rowы in a DataTable by keys very fast. For that i convert data table to dictionary of type <Tuple<key1, key2, ..., keyN>, Datarow> and can seek the record very fast like

var result = myDict[Tuple("aaa", 123, ...)];

Is the .Find() method faster? How does it work? I know i could just make a few samples and try them but if someone know how does this method works internally then it could help me in future. Thank you!

LINQ2Vodka
  • 2,996
  • 2
  • 27
  • 47
  • 4
    You can find a lot of info an benchmarks here: https://msdn.microsoft.com/en-us/library/dd364983.aspx – deramko Aug 14 '15 at 15:53
  • do a google search on the following `C# MSDN DataTable.Select()` Method there is no need to convert this into a Dictionary – MethodMan Aug 14 '15 at 15:54
  • @MethodMan, i do almost same way, the question is in speed of search – LINQ2Vodka Aug 14 '15 at 15:58
  • Maybe look at this: http://stackoverflow.com/questions/10382343/linq-to-datatable Are you querying the "converted datatable" once? Or many, many times? – granadaCoder Aug 14 '15 at 16:02
  • @granadaCoder I work with a hundreds data tables and often use some of them as a look-up source for others. I have to deal with DataTables but looking for fastest way to look-up in them. Currently i convert them to dictionaries and can see it works pretty fast. – LINQ2Vodka Aug 14 '15 at 16:06
  • Consider converting the rows in a DataTable into a Poco object.....and then looking at these discussions: http://stackoverflow.com/questions/1009107/what-net-collection-provides-the-fastest-search – granadaCoder Aug 14 '15 at 17:45

2 Answers2

1

A Dictionary uses a hash table lookup so for single key it is going to be about as fast as you can get.

As you know Dictionary has a single key so you use a Tuple to pack multiple values into that key. The Tuple must be unique to be a key - since you are doing this today that must me the case.

Be aware that Tuple can can suffer from a lot of hash collisions. You can test by just running a GetHashCode on all the keys and see how many collision. Put the most unique values early in the tuple.

If you have a good hash then Dictionary is O(1)

DataRowCollection.Find is documented. Lookup (Find) is O(log n). Why do you need to know how it works internally? Test it out. Consider the time it takes to build the Dictionary.

If your are using relations between Data Table you should be using DataRelation.

paparazzo
  • 44,497
  • 23
  • 105
  • 176
-1

Using the select method is slow if you are going to do lots of searching. A dictionary uses a hash table to perform lookups and will be the fastest method. Try code like below

            Dictionary<Tuple<string,string,string>,List<DataRow>>  myDict1 = dt.AsEnumerable()
                .GroupBy(x => new Tuple<string, string, string>(x.Field<string>("Col A"), x.Field<string>("Col A"), x.Field<string>("Col A")), y => y)
                .ToDictionary(x => x.Key, y => y.ToList());

           //if only one value per key then use this
            Dictionary<Tuple<string, string, string>, DataRow> myDict2 = dt.AsEnumerable()
                .GroupBy(x => new Tuple<string, string, string>(x.Field<string>("Col A"), x.Field<string>("Col A"), x.Field<string>("Col A")), y => y)
                .ToDictionary(x => x.Key, y => y.FirstOrDefault());​
jdweng
  • 33,250
  • 2
  • 15
  • 20
  • 1
    The question was about Find vs. Dictionary. – LINQ2Vodka Aug 14 '15 at 16:17
  • I don't think find uses a hash so dictionary will be faster for large searches. Find average time will be N/2 while dictionary will be log[N]. – jdweng Aug 14 '15 at 16:33
  • "Think" don't post an answer based on what you think. Data Table Find is 0(log n) and a Dictionary with a good hash is O(1). – paparazzo Aug 14 '15 at 16:41
  • "DataTable.Rows.Find, on the other hand, returns only a single row. Essentially, when you specify the primary key, a binary tree is created" - from here https://msdn.microsoft.com/en-us/library/dd364983.aspx – LINQ2Vodka Aug 14 '15 at 16:45
  • Article referrers to DataTables using ADO.Net that are indexed. May not apply in all cases. Find is fast only if indexed table is retained after each search. – jdweng Aug 14 '15 at 18:28