23

I need to check if a sequence has any items satisfying some condition but at the same time NOT all items satisfying the same condition.

For example, for a sequence of 10 items I want to have TRUE if the sequence has at least one that satisfy the condition but not all:

  • 10 items satisfying, 0 items not, result is FALSE
  • 0 items satisfying, 10 items not, result is FALSE
  • 1 item satisfying, 9 items not, result is TRUE
  • 9 items satisfying, 1 item not, result is TRUE

I know I could to this:

mySequence.Any (item => item.SomeStatus == SomeConst) && !mySequence.All (item => item.SomeStatus == SomeConst)

But this is not optimal.

Is there a better way?

Stécy
  • 11,951
  • 16
  • 64
  • 89
  • 1
    You could probably use `.Aggregate` with a custom type that holds properties for the needed information - just an input ... –  May 01 '15 at 19:41
  • 1
    @AndreasNiedermair That'd work, but I don't think you can abort an aggregate as soon as you've found both options so you'd complete the iteration even if you didn't need to. – Rup May 01 '15 at 19:42
  • @Rup absolutely true. I wasn't aware that the OP is after a short-curcuit variant ... –  May 01 '15 at 19:43
  • If your sequence is a collection that has a `Count` or `Length` property, it might be cheaper to get `.Count(item => item.SomeStatus == SomeConst)`. If the value is greater than 0 and less than the collection size, return true. If the sequence is an `IEnumerable`, your implementation is probably more optimal due to short circuiting. – Dmitry S. May 01 '15 at 19:45

8 Answers8

30

You'll like this.

var anyButNotAll = mySequence
    .Select(item => item.SomeStatus == SomeConst)
    .Distinct()
    .Take(2)
    .Count() == 2;

The Take(2) stops it iterating over any more elements than it has to.

Rawling
  • 49,248
  • 7
  • 89
  • 127
  • 6
    Note that you'd almost certainly want to throw this in a method, such as `SomeButNotAll`, just for the sake of clarity, because this code, as it is, doesn't do a great job of self-documenting. – Servy May 01 '15 at 19:44
  • this will iterate over the whole collection at least once to do the `.Select` and then the `.Distinct`... –  May 01 '15 at 19:44
  • 4
    @AndreasNiedermair No, it won't. – Rawling May 01 '15 at 19:44
  • That is VERY nice. I did the same thing but without the `Take(2)` and deleted it when I realised my blunder. +1 – Wai Ha Lee May 01 '15 at 19:45
  • 1
    @Rawling ah thanks, thought that `.Distinct` needs some overhead. my initial comment should have been more like a question though ;) –  May 01 '15 at 19:45
  • @AndreasNiedermair It uses a little overhead in building up a `HashSet`, but since it builds it up as it goes it can still exhibit delayed execution :) – Rawling May 01 '15 at 19:46
  • @AndreasNiedermair Both of the methods you listed only evaluate as many items from the source sequence as they need to in order to compute the next item in the sequence; they don't need to evaluate the whole sequence to compute the next item. – Servy May 01 '15 at 19:46
  • @AndreasNiedermair It needs to remember all items previously yielded, meaning it's memory footprint is O(m) (where m is the number of items yielded so far) unlike methods such as `Select` or `Where` that have a memory footprint of O(1). – Servy May 01 '15 at 19:47
  • 1
    @Servy for `.Select` this was absolutely clear to me, but I haven't investigated the source for `.Distinct` yet ... so, for clarification: http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,4ab583c7d8e84d6d –  May 01 '15 at 19:48
  • would doing a `GroupBy(x => x.SomeStatus == SomeConst).Take(2).Count() == 2` would be more overhead than `Select` and `Distinct`? – DLeh May 01 '15 at 19:48
  • @DLeh It probably wouldn't shortcut the way this will, as it'd need to complete the iteration to complete the groups. – Rup May 01 '15 at 19:49
  • @Rup that's what I was thinking – DLeh May 01 '15 at 19:49
  • 1
    [Here](http://ideone.com/4j3hcq) is a link to an ideone page where you can see for yourself that this only evaluates as many as it needs. – Wai Ha Lee May 01 '15 at 19:51
  • @WaiHaLee Heh, I'd just done exactly that: http://ideone.com/LifIR2 because I didn't know Distinct would behave like that. But it's also obvious from the source Andreas linked to. – Rup May 01 '15 at 19:51
  • 1
    @Rup Why would `Distinct` need to iterate the entire collection to yield, say, the first item? Even ignoring the source code, just thinking logically about it is generally sufficient. – Servy May 01 '15 at 19:54
  • 2
    @Servy thinking logically could imply some naive thinking too (which I did ...) ... so, stick to the source - that's the only truth :) –  May 01 '15 at 19:55
  • The .Take(2) seems unnecessary to me since the .Distinct() can only return two values for a boolean type. – Maarten May 01 '15 at 20:00
  • 3
    @Maarten `Enumerable.Count()` requires a full evaluation, whereas `Enumerable.Take()` shortcuts the execution. –  May 01 '15 at 20:01
  • 2
    @Maarten The `Take` doesn't change the result, but it does prevent the query from asking `Distinct` for a third item, which would require it to completely evaluate the rest of the sequence to determine that there isn't one. – Servy May 01 '15 at 20:02
  • Genius answer! I love it. :-) – Kim Homann Jan 28 '21 at 10:59
6

If your concern is iterating through all of the elements in a large collection, you're okay - Any and All will short-circuit as soon as possible.

The statement

mySequence.Any (item => item.SomeStatus == SomeConst)

will return true as soon as one element that meets the condition is found, and

!mySequence.All (item => item.SomeStatus == SomeConst)

will return true as soon as one element does not.

Since both conditions are mutually exclusive, one of the statements is guaranteed to return after the first element, and the other is guaranteed to return as soon as the first element is found.


As pointed out by others, this solution requires starting to iterate through the collection twice. If obtaining the collection is expensive (such as in database access) or iterating over the collection does not produce the same result every time, this is not a suitable solution.

BJ Myers
  • 6,617
  • 6
  • 34
  • 50
  • But what if the source sequence doesn't return the same items if iterated multiple times, or does expensive computation to compute it's items? – Servy May 01 '15 at 19:40
  • So it's as good as it can possibly be, except for all of the extremely common cases where it isn't... Acknowledging that your answer is wrong in your answer doesn't make it not wrong. – Servy May 01 '15 at 19:52
  • @Servy No need to be snarky. LINQ is frequently used for static, in-memory collections where the above problems do not apply. Limited applicability of a solution does not make it wrong. – BJ Myers May 01 '15 at 19:54
  • And it's frequently used for other types of sequences where it *doesn't* apply. So saying that a particular solution is "as good as it can get" when there are frequent situations in which is performs *dramatically* worse than alternatives, it is not in fact "as good as it can get". If you want a solution that's "as good as you can get, at least in certain circumstances" then just have the implementation "return true;" it's by *far* the best implementation in all of the circumstances where there are items that do and don't meet the condition. – Servy May 01 '15 at 19:56
  • 1
    `Limited applicability of a solution does not make it wrong.` That would be true if the OP were just looking for any solution to the problem, and wasn't concerned about performance. But when you say, "it's as good as it can get" when it's not as good as it can get, then *it is in fact wrong*. It would be correct to say that it is a solution, it's *incorrect* when you claimed that it was an optimal solution. – Servy May 01 '15 at 19:58
  • 1
    The question is asking, "is there a better way". Your answer is "no, what you have is fine". That answer is incorrect, there are better ways, ways that are more efficient and don't have the rather noticeable drawbacks of this answer in both performance and correctness. – Servy May 01 '15 at 20:29
3

That is likely a fairly optimal solution if the source is a database. This extension method might be better depending on your source (I think, I just threw it together -- likely many many errors, consider it more pseudo code). The benefit here is it only enumerates once and does a short-circuit as soon as it has read enough to determine the outcome:

static bool SomeButNotAll<TSource>(this IEnumerable<TSource> source,
                                   Func<TSource, bool> predicate)
{
   using(var iter=source.GetEnumerator())
   {
     if (iter.MoveNext())
     {
       bool initialValue=predicate(iter.Current);
       while (iter.MoveNext())
         if (predicate(iter.Current)!=initialValue)
           return true;
     }
   }     
   return false; /* All */
}
Robert McKee
  • 21,305
  • 1
  • 43
  • 57
  • You'll need to accept a `Func` not an `Expression`, since you plan on invoking it, and since you only plan to iterate `source` and do nothing else, you should accept an `IEnumerable` and not an `IQueryable`. – Servy May 01 '15 at 20:45
  • 1
    Dispose of your enumerators. – Jon Hanna May 02 '15 at 11:41
3

You could define your own extension method. This version is more verbose, but still readable, and it only enumerates the IEnumerable once:

bool AnyButNotAll<ItemT>(this IEnumerable<ItemT> sequence, Func<ItemT, bool> predicate)
{
    bool seenTrue = false;
    bool seenFalse = false;

    foreach (ItemT item in sequence)
    {
        bool predResult = predicate(item);
        if (predResult)
            seenTrue = true;
        if (!predResult)
            seenFalse = true;

        if (seenTrue && seenFalse)
            return true;
    }

    return false;
}

Much shorter, but enumerates the IEnumerable twice:

bool AnyButNotAll<ItemT>(this IEnumerable<ItemT> sequence, Func<ItemT, bool> predicate)
{
    return sequence.Any(predicate) && !sequence.All(predicate);
}
Tanner Swett
  • 3,241
  • 1
  • 26
  • 32
2

You could try this:

var result = mySequence.Select(item => item.SomeStatus == SomeConst)
                      .Distinct().Count() > 1 ? false : true;

Basically I select true or false for each value, get distinct to only get one of each, then count those.

Jacob Lambert
  • 7,449
  • 8
  • 27
  • 47
  • 1
    Note this prevents short circuiting in the event that the first item is true and the second false. – Servy May 01 '15 at 19:44
1

You could put your predicate in a variable so that you don't have to repeat the predicate twice:

Func<MyItemType, bool> myPredicate = item => item.SomeStatus == SomeConst;
if (mySequence.Any(myPredicate) && !mySequence.All(myPredicate))
    ...
Tanner Swett
  • 3,241
  • 1
  • 26
  • 32
  • This change is purely stylistic, and has no effect on how it performs at all, which is what the question is asking about. – Servy May 01 '15 at 20:43
  • @Servy I wasn't aware that the asker wasn't interested in purely stylistic suggestions. – Tanner Swett May 01 '15 at 20:50
1

If you wanted to define this as a method, you could take then approach Linq takes in defining both IEnumerable<T> and IQueryable<T> extension methods. This allows for the optimal approach to be taken automatically:

public static bool SomeButNotAll<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
{
  if(source == null)
    throw new ArgumentNullException("source");
  if(predicate == null)
    throw new ArgumentNullException("predicate");
  return source.
    Select(predicate)
    .Distinct()
    .Take(2)
    .Count() == 2;
}
public static bool SomeButNotAll<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
  if(source == null)
    throw new ArgumentNullException("source");
  if(predicate == null)
    throw new ArgumentNullException("predicate");
  using(var en = source.GetEnumerator())
    if(en.MoveNext())
    {
      bool first = predicate(en.Current);
      while(en.MoveNext())
        if(predicate(en.Current) != first)
          return true;
    }
  return false;
}

If you're using EntityFramework (or another provider that provides a CountAsync you can also provide an asynchronous version easily:

public static async Task<bool> SomeButNotAllAsync<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate, CancellationToken cancel)
{
  if(source == null)
    throw new ArgumentNullException("source");
  if(predicate == null)
    throw new ArgumentNullException("predicate");
  cancel.ThrowIfCancellationRequested();
  return await source.
    Select(predicate)
    .Distinct()
    .Take(2)
    .CountAsync(cancel)
    .ConfigureAwait(false) == 2;
}
public static Task<bool> SomeButNotAllAsync<T>(this IQueryable<T> source, Expression<Func<T, bool>> predicate)
{
  return source.SomeButNotAllAsync(predicate, CancellationToken.None);
}
Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
0

You can use the Aggregate method to do both things at once. I would suggest to use an anonymous type for the TAccumulate, containing both counters. After the aggregation you can read both values from the resulting anonymous type.

(I can't type an example, I'm on my phone)

See the documentation here: https://msdn.microsoft.com/en-us/library/vstudio/bb549218(v=vs.100).aspx

Maarten
  • 22,527
  • 3
  • 47
  • 68