13

I am only interested to know whether a HashSet hs is empty or not. I am NOT interested to know exactly how many elements it contains.

So I could use this:

bool isEmpty = (hs.Count == 0);

...or this:

bool isEmpty = hs.Any(x=>true);

Which one provides better results, performance-wise(specially when the HashSet contains a large number of elements) ?

Andy
  • 3,631
  • 2
  • 23
  • 32
  • 1
    Take a look at this http://stackoverflow.com/questions/305092/which-method-performs-better-any-vs-count-0 – Alessandro D'Andria Aug 14 '13 at 15:17
  • The C# hashset maintains a count so that's more efficient than finding out if there is a single element. However if you want it to be conditional Any will always be better, because it will stop at the first match. – MrFox Aug 14 '13 at 15:25
  • 2
    You can just use `Any()` rather than supplying a lambda. – Servy Aug 14 '13 at 15:26
  • 1
    This is micro-optimization questions distract you from your real goal: writing better code :-) Use whatever you think is cleaner. You can also download a .NET decompiler (such as [ILSpy](http://ilspy.net/)) and check both implementations. `Enumerable.Any` will probably be a tiny bit slower, since it needs to instantiate an enumerator. BTW you probably want something like `bool isEmpty = !hs.Any();` – Paolo Moretti Aug 16 '13 at 18:04

3 Answers3

19

On a HashSet you can use both, since HashSet internally manages the count.

However, if your data is in an IEnumerable<T> or IQueryable<T> object, using result.Any() is preferable over result.Count() (Both Linq Methods).

Linq's .Count() will iterate through the whole Enumerable, .Any() will only peek if any objects exists within the Enumerable or not.

Update: Just small addition: In your case with the HashSet .Count may be preferable as .Any() would require an IEmumerator to be created and returned which is a small overhead if you are not going to use the Enumerator anywhere in your code (foreach, Linq, etc.). But I think that would be considered "Micro optimization".

Tseng
  • 61,549
  • 15
  • 193
  • 205
3

HastSet<T> implements ICollection<T>, which has a Count property, so a call to Count() will just call HastSet<T>.Count, which I'm assuming is an O(1) operation (meaning it doesn't actually have to count - it just returns the current size of the HashSet).

Any will iterate until it finds an item that matches the condition, then stop.

So in your case, it will just iterate one item, then stop, so the difference will probably be negligible.

If you had a filter that you wanted to apply (e.g. x => x.IsValid) then Any would definitely be faster since Count(x => x.IsValid) would iterate over the entire collection, while Any would stop as soon as if finds a match.

For those reasons I generally prefer to use Any() rather than Count()==0 since it's more direct and avoids any potential performance problems. I would only switch to Count()==0 if it provided a significant performance boost over Any().

Note that Any(x=>true) is logically the same as calling Any(). That doesn't change your question, but it looks cleaner without the lambda.

D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • 2
    No. `.Any()` parameterless won't iterate until it stops. `.Any(x => x)` will because it has to evaluate the contents. All the parameterless has to do is reset the enumerator and call `enumerator.GetNext()` and return it's result to find out if there are any results. `.Any(x => x)` will have to check the actually contents – Tseng Aug 14 '13 at 15:21
  • 1
    @Tseng `.Any(x=>true)` will evaluate the first item, find that it resolves to true, and then evaluate no more. It's a negligible amount more work than just `.Any()`. The primary advantage of using the parameterless overload is just that it's faster to type and (arguably) a bit clearer to the reader. – Servy Aug 14 '13 at 15:28
  • You're right. I've overseen the fact that the it returns a constant. – Tseng Aug 14 '13 at 15:37
0

Depending on the type of collection, it may or may not matter performance-wise. So why not just use hs.Any() since that is designed for exactly what you need to know?

And the lambda expression x => true has no meaning here. You can leave that out.

Jonathan Wood
  • 65,341
  • 71
  • 269
  • 466