2

I would like something that effectively performs the same as TakeWhile but returns two sequences:

  1. The results of TakeWhile
  2. The rest of the input sequence with 1. removed

I know I could do something like:

var a = input.TakeWhile(...);
var b = input.Skip(a.Count);

But this seems potentially non-optimal depending on the container type. Have I missed some neat way to do this in a single operation?

My end goal is to iterate over a large collection rather than pre-bucket-ing it:

while(data.Count() > 0)
{
    var y = data.First().Year;
    var year = data.TakeWhile(c => c.Year == y);
    data = data.Skip(year.Count());

    Console.WriteLine($"{year.Count()} items in {y}");
}
Gilad Green
  • 36,708
  • 7
  • 61
  • 95
Mr. Boy
  • 60,845
  • 93
  • 320
  • 589
  • There are plenty of questions how to bucket/split sequence like https://stackoverflow.com/questions/58146898/split-an-ienumerable-into-multiple-ienumerables. You need to [edit] the question to clarify what exactly you want as result since it is impossible to do "single iteration of sequence" and have two (or more) separate pointers into middle of the sequence to iterate from freely available. – Alexei Levenkov Sep 17 '21 at 19:13
  • How do you envision the two separate returns occurring? – NetMage Sep 17 '21 at 19:13
  • @AlexeiLevenkov I want to do fairly literally what I've said. But I've added a use case. Thanks for the edit link, I'd never have known how to otherwise ;) – Mr. Boy Sep 17 '21 at 19:20
  • I still don't get what exactly you are trying to avoid/optimize (as sample shown in the post does iterate sequence some crazy number of times)... Hopefully others can see the goal with the update (or probably even before that). – Alexei Levenkov Sep 17 '21 at 19:32
  • Usually with Linq, we have _deferred execution_ where you can begin processing the first elements of the source without waiting for the source iteration to complete. We can write a method that takes in the source `input` and the delegate (`Func<,>`) `predicate` and returns the two `IEnumerable<>` which you call `a` and `b`. But we cannot know anything about `b` before `a` is complete. How can this be made with Linq-like deferred execution? The use case you provided for @AlexeiLevenkov is not clear to me. – Jeppe Stig Nielsen Sep 17 '21 at 19:48
  • @GertArnold I think you confused me with Enigmativity – NetMage Sep 20 '21 at 20:50
  • Your use case doesn't show how you expect the new method to work, rather how you don't want it to work. How do you expect it to work? – NetMage Sep 20 '21 at 20:52
  • As you may have noticed, it's still not clear what you want. Do you always want to count/collect all records that TakeWhile returns? But what if a condition is entered that matches all records? You'll end up iterating over the entire collection after all. – Gert Arnold Sep 23 '21 at 19:53

4 Answers4

3

You could use ToLookup to split the source into two results.

var source = new[] { 1, 3, 5, 2, 4, 6, 7, 8, 9 };
Func<int, bool> criteria = x => x % 2 == 1;
bool stillGood = true;
Func<int, bool> takeWhileCriteria = x =>
  stillGood = stillGood && criteria(x);

var result = source.ToLookup(takeWhileCriteria);
var matches = result[true];
var nonMatches = result[false];
Amy B
  • 108,202
  • 21
  • 135
  • 185
  • Does this guarantee to retain the ordering? I've rarely used Lookup – Mr. Boy Sep 17 '21 at 20:25
  • Although I've listed ToLookup as not retaining order https://stackoverflow.com/questions/204505/preserving-order-with-linq/204777#204777 that was about the order of the result groups. Here we know the true group contains the first items (if any) and the false group contains the rest of the items. The items in these groups retain the order of the source. – Amy B Sep 17 '21 at 20:32
  • 1
    @Mr.Boy Surely, the order is guaranteed. [The values within each group are in the same order as in `source`](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.tolookup). But what was meant by "rather than pre-bucket-ing" in the edit of your question? `ToLookup` does what you may call pre-bucket-ing. – Jeppe Stig Nielsen Sep 17 '21 at 20:34
  • Thanks @JeppeStigNielsen I hadn't seen that in the docs – Mr. Boy Sep 17 '21 at 20:38
  • This is just `GroupBy(x=> x %2)`, just written more explicitly... I don't think it fits "rather than pre-bucket-ing" requirement of the update question so... – Alexei Levenkov Sep 17 '21 at 21:56
  • 1
    @AlexeiLevenkov No, that would put `7` in the wrong bucket. But my objection would be that it uses external state inside a LINQ statement, which defeats LINQ's functional programming model. And it still requires multiple passes to get information from both sequences. – Gert Arnold Sep 19 '21 at 09:13
1

The simplest way to split the sequence in one iteration, and streaming, is to return a tuple of each item and a bool whether it's "in" or not.

public static IEnumerable<(T Entity, bool IsIn)> MarkWhile<T>(this IEnumerable<T> sequence, 
    Func<T,bool> predicate)
{
    var isIn = true;
    using var etor = sequence.GetEnumerator();
    while (etor.MoveNext())
    {
        var current = etor.Current;
        isIn &= predicate(current);
        yield return (current, isIn);
    }
}

This allows you iterate over a large collection without exhausting it and to determine when the condition "flips". But you'll need a foreach loop to do this in one pass.

It would be possible to create a method that only exhausts the "in" part of the sequence and even return its count (we can do anything when returning tuples) and stream the tail of the sequence, but I would settle with a simple foreach. Nothing wrong with that. Also, there may be cases where all items meet the condition while you still only want to return a limited number of items.

Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
1

You can create something like what you want, but only in very limited circumstances:

public static class IEnumerableExt {
    public static IEnumerable<T> ToIEnumerable<T>(this IEnumerator<T> e) {
        while (e.MoveNext())
            yield return e.Current;
    }

    public static (IEnumerable<T> first, IEnumerable<T> rest) FirstRest<T>(this IEnumerable<T> src, Func<T,bool> InFirstFn) {
        var e = src.GetEnumerator();
        var first = new List<T>();
        while (e.MoveNext() && InFirstFn(e.Current))
            first.Add(e.Current);

        return (first, e.ToIEnumerable());
    }
}

Note that this has to iterate over and buffer first before it can return (what if you tried to enumerate rest before first?) and you can't call Reset on rest and expect anything reasonable. Fixing these issues would involve a lot more code.

I can dimly see in the distance some type of extended LINQ where you pass Actions and Funcs and do something like continuations (the rest of the IEnumerable) to process, but I am not sure it is worth it. Something like:

public static IEnumerable<T> DoWhile<T>(this IEnumerable<T> src, Func<T,bool> whileFn, Action<T> doFn) {
        var e = src.GetEnumerator();
        while (e.MoveNext() && whileFn(e.Current))
            doFn(e.Current);
            
        return e.ToIEnumerable();
    }

while you could use like:

while (data.Any()) {
    var y = data.First().Year;

    var ct = 0;
    data = data.DoWhile(d => d.Year == y, d => ++ct);
    
    Console.WriteLine($"{ct} items in {y}");
}

The best answer is to stop using the IEnumerable<T> automatic enumeration and manually enumerate:

for (var e = data.GetEnumerator(); e.MoveNext();) {
    var y = e.Current.Year;

    var ct = 0;
    while (e.Current.Year == y)
        ++ct;

    Console.WriteLine($"{ct} items in {y}");
}

Once you are doing manual enumeration, you can handle most any circumstance without losing efficiency to buffering, or delegate calls for your specific needs.

PS: Note that testing data.Count() against 0 is very inefficient, you should always be using data.Any(). Depending on data, data.Count() may never return, or may be very expensive however even data.Any() may lose data.First().

PPS: A more efficient version of ToIEnumerable would return a custom class that just returns the IEnumerator to GetEnumerator but would have all the caveats and possibly more. The sample ToEnumerable creates daisy chains of while loops.

NetMage
  • 26,163
  • 3
  • 34
  • 55
  • Buffering the head of the sequence might be a problem, it may turn out to be the entire sequence. – Gert Arnold Sep 22 '21 at 17:29
  • @GertArnold Why would that be a problem? – NetMage Sep 23 '21 at 19:12
  • Well, it may swallow up the entire sequence if the `InFirstFn` condition is never met. I think OP clearly wants to avoid that. – Gert Arnold Sep 23 '21 at 19:15
  • @GertArnold Actually it is just the opposite - it will swallow the entire stream if the `InFirstFn` is met by every item. I think it is unclear what the OP wants exactly, but you can't avoid buffering if you want to be able to re-process the first chunk like the sample code does. – NetMage Sep 23 '21 at 19:38
  • Yeah, I meant when the condition is always met. But I don't think they even always want to get the first chunk. It seems more important to receive a limited number of items. But I agree, OP really needs to give some clarification. – Gert Arnold Sep 23 '21 at 19:41
0

You can use Except extension of LINQ, as the linq operators always create new collections leaving the first untouched:

var list = Enumerable.Range(1,10);
var lowerFive = list.TakeWhile(x => x < 5); // 1,2,3,4
var rest = list.Except(lowerFive); // 5,6,7,8,9,10

You could write a generic extension on IEnumerable which returns a tuple with the two lists in it:

public static class Extensions
{
    public static (IEnumerable<T> takeWhilePart, IEnumerable<T> rest) TakeWhileAndTheRest<T>(this IEnumerable<T> origin, Func<T, bool> predicate)
    {
        var takeWhile = origin.TakeWhile(predicate);
        var rest = origin.Except(takeWhile);
        return (takeWhile, rest);
    }
}

And use it like:

void Main()
{
    var list = Enumerable.Range(1,10);

    var collectionTuple = list.TakeWhileAndTheRest(x => x < 5);
    
    collectionTuple.takeWhilePart.Dump(); // 1,2,3,4
    collectionTuple.rest.Dump(); // 5,6,7,8,9,10
}
Javi Kroonenburg
  • 1,050
  • 8
  • 11