Best-practice for data-object properties: IEnumerable vs Array

Question

Short question: is it OK to declare data-object properties as IEnumerable, or it should be Array instead?

Background:

I've just found a bug in our project which caused performance issue. The reason was that IEnumerable has been iterated multiple times. But it is simple only at first sight. I think there is a design flaw there which allowed that to happen.

The deeper investigation has shown that one method GetAllUsers returned a UsersResponse object, one of the properties of which was IEnumerable<T> UsersList. When caching was implemented, obviously the entire UsersResponse object was being cached, and it worked fine at that time because GetAllUsers assigned an array to IEnumerable<T> UsersList. Later implementation of GetAllUsers has changed and for some reason developer decided that ToArray() call is redundant. So I think the problem was that UsersResponse object was not well-designed and allowed too much freedom for it's factory-method. On the other hand, caching an object which contains IEnumerable properties is also useless in principle.

So we return to my question of designing data-objects in general: when you declare it, not knowing if it will be cached some time in the future or how it will be used in other ways except your current need, is it OK to declare it's properties as IEnumerable, placing the responsibility of careful usage on other developers, or it must be Array from the start?

What I've searched:

The only suggestion I've found is Jon Wagner's blog post where he recommends to “Seal” LINQ chains as soon as they have been built. But this relates more to building the IEnumerable than storing it in an entity property. Although in conjunction with principle to return as specific type as possible, it can imply declaring property as Array.

The answer is: it depends. In one situation you would want to use forward-only read-only collection, in the other - arrays. There is no best approach that fits all. — oleksii, Apr 09 '15 at 10:12
@oleksii, could you please expand a bit: in which scenario would you use forward-only collection in data-entity? I also thought the answer should be "it depends", but couldn't find any case where it is reasonable. — Sasha, Apr 09 '15 at 10:20
I remember there were a few extensive debates on this topic. Pls see [What's the difference between IEnumerable and Array, IList and List?](http://stackoverflow.com/q/764748/706456t) and [Should I favour IEnumerable or Arrays?](http://stackoverflow.com/q/5156561/706456) and [IEnumerable vs T](http://stackoverflow.com/q/3433284/706456) — oleksii, Apr 09 '15 at 10:33
@oleksii, sorry, but neither of those answers to my question. I already saw some of those threads before starting a new one. They all are too generalized and the answer for them is obviously "it depends". My question targets specifically data-objects. Like the ones returned from DB repositories, or web services, or just internal buiseness entities to pass data from one class to another. The core here is that data-object in principle should represent a data and not the actions how to retrieve it. The same question would be true for Lazy, and for delegates in data entities. — Sasha, Apr 09 '15 at 10:46
I cannot see how caching is relevant to this question. Is there any change you meant to `IQueryable` instead of `IEnumerable`? Both `IEnumerable` and `array` are going to be delivered to the client's memory if you use some sort of remote API. It just depends what you want to do with it. For example, if you want to use LINQ with the collection - IEnumerable might be easier to query. If you want to have `O(1)` access to any element in the collection - an array seems to be a better choice. To me the difference is only in how you intend to use it. — oleksii, Apr 09 '15 at 11:08
@oleksii, caching is only an example of problems caused by using IEnumerable. When you design a data-entity you usually don't know how it will be used in every scenario in the furure and that is the issue - you have to decide IEnumerable or Array based on assumptions about usage. One more point in favour of Array is that it can be changed to IEnumerable in the future without breaking backward compatibility (just changing the behavior). Under Array I mean T[], so LINQ works just fine on it. — Sasha, Apr 09 '15 at 11:32

score 3 · Answer 1 · answered Apr 09 '15 at 10:25

3

When I'm thinking about API design I am always trying to be "nice" to the consumer. This means accepting as much as possible for parameters (when possible), and providing as much as possible for return values. If you too buy into this, it means that you should strive for IEnumerable parameters while providing Array (or similar) return values. The result is maximum value for the consumer of your API (even if it is yourself in the end).

answered Apr 09 '15 at 10:25

Daniel Lidström

9,930
1
27
35

The data entity in my question can be used both as return parameter from one set of methods and as an argument of others. What about this case? – Sasha Apr 09 '15 at 10:32
I agree, take IEnumerable in the arguments, but always return a List or T[] as the return type. The only annoying thing about this is that you can't use `yield return;` :( – demoncodemonkey Apr 09 '15 at 10:39
Returning `T[]` or `List` exposes your implementation and is a big nono. – bstenzel Apr 09 '15 at 11:02
@demoncodemonkey Sometimes I too want to use `yield return`. Then I simply make an exception :-) – Daniel Lidström Apr 09 '15 at 11:27
2

@OleksandrPshenychnyy It appears to me that `Array` is more appropriate in your case, since you need to cache things. Also, talk through class design with your team. Imposing `IEnumerable` on all consumers is a severe limitation and totally unnecessary, unless the data actually _needs_ to be streamed. – Daniel Lidström Apr 09 '15 at 11:33
@bstenzel Returning `IEnumerable` forces your consumers to stream the data (_just once_). That's the only guarantee you get from `IEnumerable`. That is quite a limitation, no? – Daniel Lidström Apr 09 '15 at 11:37
I somewhat agree with @bstenzel, but if not IEnumerable is used. IList for example will allow you vary implementation and defines clear usage interface for the caller. That is if you need to return mutable collection. In case of immutable (array), I feel a lack of interface for that (ReadOnlyCollection) for some reason violates Liskov Substitution Principle and array doesn't implement it, although it is logically a readonly collection. – Sasha Apr 09 '15 at 11:44
Using `IEnumerable` doesn't imply streaming. You just don't know whether or not the data is streamed. That wasn't my point though. Using `IEnumerable` or concrete arrays are both extremes. If one extreme doesn't fit your needs, you don't need to swing to the other one. – bstenzel Apr 09 '15 at 12:00

score 0 · Answer 2 · answered Apr 09 '15 at 10:20

I generally favour the Array approach unless it is forced into collections. Especially when there's a lot numerical computations.

Advantages:

Compact storage. (probability for hacky techniques)
[ ] accessor. ( not applicable for C# )
readable code for multi dimensional arrays

last but not least, stick to one and never use both. it is terrible to convert a collection from Array to Collection and vice versa, this does no good either.

score 0 · Answer 3 · answered Apr 09 '15 at 11:01

The most abstract type that fulfills the requirements should be used. If you need to make it impossible to store an "open" query in a DTO, then use ICollection<T> or (if you need indexed access) IList<T> for the property.

Using an array will nail you down to a concrete implementation. This may be a non-issue or it may turn out to be a pain at some later point. The latter rules out array IMHO.

BTW: It is the caller's responsibility to only iterate once over an IEnumerable<T>.

Best-practice for data-object properties: IEnumerable vs Array

3 Answers3