4

I have read several different sources over the years that indicate that when storing a collection of data, a List<T> is efficient when you want to insert objects, and an IEnumerable<T> is best for enumerating over a collection.

In LINQ to Entities, there is the AsEnumerable() method, which returns an IEnumerable<T>, but the SQL generated by the LINQ statement isn't executed until you start enumerating.

What if I want to store objects from LINQ to Entities in a collection and then query on that collection later?

The following strategy causes the SQL to be resolved by adding a WHERE clause and querying each record separately, which I specifically want to avoid because I'm trying to limit network chatter:

var myDataToLookup = context.MyData.AsEnumerable();

foreach (var myOtherDatum in myOtherDataList)
{
    // Gets a single record from the database on each iteration.
    var myDatum = myDataToLookup.SingleOrDefault(w => w.key == myOtherDatum.key);
}

How do I resolve the SQL upfront so myDataToLookup actually contains the data in memory? I've tried ToArray:

var myDataToLookup = context.MyData.ToArray();

But I recently learned that it actually uses more memory than ToList does: Is it better to call ToList() or ToArray() in LINQ queries?

Should I use a join instead?

var myCombinedData = from o in myOtherDataList
                     join d in myDataToLookup on
                         o.key equals d.key
                     select new { myOtherData = o, myData = d };

Should I use ToDictionary and store my key as the key to the dictionary? Or am I worrying too much about this?
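For reference, the dictionary version I have in mind would look roughly like this (a sketch; it assumes key is unique in MyData):

```csharp
// Materialize the data once: a single round trip to the database.
var myDataToLookup = context.MyData.ToDictionary(d => d.key);

foreach (var myOtherDatum in myOtherDataList)
{
    // O(1) in-memory lookup; no further SQL is issued.
    if (myDataToLookup.TryGetValue(myOtherDatum.key, out var myDatum))
    {
        // process myDatum here
    }
}
```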

scott.korin
    What do you want to do with your collection? If you're not adding to it then `.ToList()` is probably OK. A dictionary is useful if you need to have access to any element, etc. – ChrisF Feb 13 '17 at 21:35
  • In this case, I'm just using it for looking up data necessary for processing in a for loop of other data. Note I don't have a navigation property available in this instance. So I'm looping on 1 collection, and looking for data in a second collection based on a key. That's it. – scott.korin Feb 13 '17 at 21:37
  • In that case a dictionary keyed off some aspect of the data in the other loop - if that's possible - will probably be your best bet. – ChrisF Feb 13 '17 at 21:39
  • Do you mean that after having retrieved `myDataToLookup`, you want to query over it multiple times and in multiple shapes? If you need it only for one query the question is how to query the data from the database in the most efficient way. – Gert Arnold Feb 13 '17 at 21:49
  • I have pursued this as far as I ever care to, more than once, and it's the right question to ask, but only once you specify one more thing: what portion of the collection will you actually access during runtime? If you only access a "small" percentage, your C# method of retrieving each record by key with an explicit query PER row MAY STILL BE THE BEST out of the options you're considering. However, if you access EVERY element of the collection 100x, this most definitely is NOT the fastest way. Specifying your usage patterns AND actually measuring is the only way to know for sure anyway. – SlimsGhost Feb 13 '17 at 22:53
  • @SlimsGhost Every object in the collection gets accessed once. This is actually a generic question related to two or three different loops in the code I'm currently looking at. In general, this is the type of thing the software I am working on does a lot, in many different functions (not the same data, but the same basic premise). – scott.korin Feb 14 '17 at 01:06
  • @scott.korin, ok, roger that. Then I agree that row-by-row is probably less than optimal, maybe a lot less. If you think you are eventually going to "materialize" all the data, couldn't you just "go big" and implement this step as a new "standard" layer of abstraction for your project(s), using something like DTO's and AutoMapper to get to a List basis (or Array or whatever), whose object model will conform to the desired logical business model of your app layer? I've had positive first-hand experience doing things this way before, for what it's worth. Good luck! – SlimsGhost Feb 14 '17 at 16:30

3 Answers

1

If you're using LINQ to Entities then you shouldn't worry about whether ToArray is slower than ToList. There is almost no difference between them in terms of performance, and LINQ to Entities itself will be the bottleneck anyway.

Regarding a dictionary: it is a structure optimized for reads by key, with an additional cost on adding new items. So if you will read by key a lot and add new items only occasionally, that's the way to go. But to be honest, you probably shouldn't bother at all: if the data set isn't large, you won't see a difference.
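A rough sketch of the trade-off (collection and key names are taken from the question; someKey is a placeholder):

```csharp
// List: each lookup scans the collection, O(n) per lookup.
var list = context.MyData.ToList();
var fromList = list.SingleOrDefault(d => d.key == someKey);

// Dictionary: one-time build cost, then O(1) per lookup.
var dict = context.MyData.ToDictionary(d => d.key);
dict.TryGetValue(someKey, out var fromDict);
```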

1

Think of IEnumerable, ICollection and IList/IDictionary as a hierarchy, each one inheriting from the previous. Simply put, IEnumerable gives you iteration only; ICollection adds counting; and IList gives richer functionality, including finding, adding and removing elements by index or via lambda expressions. Dictionaries provide efficient access via a key. Arrays add a level of restriction and complexity on top of lists and are much more static.

So the answer depends on your requirements. If it is appropriate to hold the data in memory and you need to re-query it frequently, then I usually convert the Entity result to a List. This also loads the data.
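As a sketch (entity set name taken from the question; a numeric key is assumed purely for illustration):

```csharp
// ToList executes the SQL once and holds the results in memory.
var myData = context.MyData.ToList();

// Later queries run against the in-memory list; no further SQL is issued.
var subset = myData.Where(d => d.key > 100).ToList();
var single = myData.SingleOrDefault(d => d.key == 42);
```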

If access via a set of keys is paramount then I use a Dictionary.

I cannot remember the last time I used an array, except for infrequent and very specific purposes.

So, not a direct answer, but as your question and the other replies indicate, there isn't a single answer and the solution will be a compromise.

Peter Smith
0

When I code and measure performance and the data carried over the network, here is how I look at things, based on your example above.

Let's say your result returns 100 records. Your code has now run a query on the server and performed 1 second of processing (I made the number up for the sake of argument).

Then you need to convert it to a list, which is another second of processing. Then you want to find all records that have a value of 1. The code will now loop through the entire list to find the values of 1 and return the result. Say that's another second of processing, and it finds 10 records.

Your network is going to carry over 10 records that took 3 seconds to process.

If you move your logic to your data layer and make your query search right away for the records that you want, you can save 2 seconds of processing and still carry only 10 records across the network. The bonus is that you can keep the result as an IEnumerable<T> and not have to convert it to a list, eliminating the 1 second of converting to a list and the 1 second of iterating through it.
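To illustrate the difference (entity and property names here are made up for the example):

```csharp
// In-memory filtering: all 100 records cross the network first.
var everything = context.MyData.ToList();
var filteredLocally = everything.Where(d => d.Value == 1).ToList();

// Data-layer filtering: the WHERE clause runs on the server,
// so only the 10 matching records cross the network.
IEnumerable<MyData> filteredOnServer =
    context.MyData.Where(d => d.Value == 1).AsEnumerable();
```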

I hope this helps answer your question.

marc_s
Bagzli