22

FINAL EDIT:

I've chosen Timothy's answer, but if you want a cuter implementation that leverages the C# yield statement, check Eamon's answer: https://stackoverflow.com/a/19825659/145757


By default LINQ queries are lazily streamed.

ToArray/ToList give full buffering, but they're eager, and with an infinite sequence they never complete.

Is there any way to combine both behaviors: streaming and buffering the values on the fly as they are generated, so that subsequent enumerations won't re-trigger generation of elements that have already been produced?

Here is a basic use-case:

static IEnumerable<int> Numbers
{
    get
    {
        int i = -1;

        while (true)
        {
            Console.WriteLine("Generating {0}.", i + 1);
            yield return ++i;
        }
    }
}

static void Main(string[] args)
{
    IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0);

    foreach (int n in evenNumbers)
    {
        Console.WriteLine("Reading {0}.", n);
        if (n == 10) break;
    }

    Console.WriteLine("==========");

    foreach (int n in evenNumbers)
    {
        Console.WriteLine("Reading {0}.", n);
        if (n == 10) break;
    }
}

Here is the output:

Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.
==========
Generating 0.
Reading 0.
Generating 1.
Generating 2.
Reading 2.
Generating 3.
Generating 4.
Reading 4.
Generating 5.
Generating 6.
Reading 6.
Generating 7.
Generating 8.
Reading 8.
Generating 9.
Generating 10.
Reading 10.

The generation code is triggered 22 times.

I'd like it to be triggered 11 times, the first time the enumerable is iterated.

Then the second iteration would benefit from the already generated values.

It would be something like:

IEnumerable<int> evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer();

For those familiar with Rx it's a behavior similar to a ReplaySubject.
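For comparison, a minimal Rx sketch of that behavior (this assumes the System.Reactive NuGet package, and a finite `Range` stands in for the infinite `Numbers` source):

```csharp
using System;
using System.Reactive.Linq;

class RxReplaySketch
{
    static void Main()
    {
        // Replay() returns a connectable observable that caches every item
        // and replays the cache to late subscribers - like a ReplaySubject.
        var evenNumbers = Observable.Range(0, 11)
                                    .Where(i => i % 2 == 0)
                                    .Replay();
        evenNumbers.Connect(); // run the source exactly once

        evenNumbers.Subscribe(n => Console.WriteLine("First reading {0}.", n));
        evenNumbers.Subscribe(n => Console.WriteLine("Second reading {0}.", n)); // served from the cache
    }
}
```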

Pragmateek
  • It's not really the LINQ that needs caching but the `IEnumerable`, and there are a few examples of that [already on the internet](http://wilsonhut.wordpress.com/2012/03/15/obvious-extension-methods-ienumerable-cache/). – Scott Chamberlain Nov 06 '13 at 23:29
  • This was on reddit yesterday ([here](http://twistedoakstudios.com/blog/Post7694_achieving-exponential-slowdown-by-enumerating-twice)) with this exact scenario. I'd rather not steal that author's solution. – Austin Salonen Nov 06 '13 at 23:30
  • @ScottChamberlain: thanks for the link, Google was not my friend on this one. – Pragmateek Nov 06 '13 at 23:38
  • @AustinSalonen: crazy coincidence and thanks for the link. :) – Pragmateek Nov 06 '13 at 23:38
  • The general term for this is "memoization". Note that many of the implementations here handle some of the simple cases, but don't handle multiple enumerators enumerating the result before one has finished completely, don't handle parallelized enumeration of different enumerators, don't dispose of the underlying enumerable if the whole sequence isn't iterated, etc. To handle these more complex issues you're best off using an existing library implementation. – Servy Nov 08 '13 at 15:49
  • @Servy: thanks for the info. Didn't know that the term memoization was applicable here. – Pragmateek Nov 08 '13 at 19:42
  • I think the correct term would be "caching", not "buffering". – jaraics Jan 19 '15 at 18:32

8 Answers

16

IEnumerable<T>.Buffer() extension method

public static class EnumerableExtensions
{
    public static BufferEnumerable<T> Buffer<T>(this IEnumerable<T> source)
    {
        return new BufferEnumerable<T>(source);
    }
}

public class BufferEnumerable<T> : IEnumerable<T>, IDisposable
{
    IEnumerator<T> source;
    List<T> buffer;
    public BufferEnumerable(IEnumerable<T> source)
    {
        this.source = source.GetEnumerator();
        this.buffer = new List<T>();
    }
    public IEnumerator<T> GetEnumerator()
    {
        return new BufferEnumerator<T>(source, buffer);
    }
    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
    public void Dispose()
    {
        source.Dispose();
    }
}

public class BufferEnumerator<T> : IEnumerator<T>
{
    IEnumerator<T> source;
    List<T> buffer;
    int i = -1;
    public BufferEnumerator(IEnumerator<T> source, List<T> buffer)
    {
        this.source = source;
        this.buffer = buffer;
    }
    public T Current
    {
        get { return buffer[i]; }
    }
    object System.Collections.IEnumerator.Current
    {
        get { return Current; }
    }
    public bool MoveNext()
    {
        i++;
        if (i < buffer.Count)
            return true;
        if (!source.MoveNext())
            return false;
        buffer.Add(source.Current);
        return true;
    }
    public void Reset()
    {
        i = -1;
    }
    public void Dispose()
    {
    }
}

Usage

using (var evenNumbers = Numbers.Where(i => i % 2 == 0).Buffer())
{
    ...
}

Comments

The key point here is that the IEnumerable<T> source given as input to the Buffer method only has GetEnumerator called once, regardless of how many times the result of Buffer is enumerated. All enumerators for the result of Buffer share the same source enumerator and internal list.

Timothy Shields
  • It immediately evaluates Numbers completely, even before `evenNumbers` is ever used – sinelaw Nov 06 '13 at 23:18
  • Well Timothy, as I said: on an infinite sequence `ToList` is quite long. ;) – Pragmateek Nov 06 '13 at 23:21
  • @sinelaw: as you say "completely", even if there is no completion ;) – Pragmateek Nov 06 '13 at 23:22
  • @Pragmateek I missed that point. I figured out what you want and have updated the answer. – Timothy Shields Nov 06 '13 at 23:27
  • @TimothyShields: thanks for your implementation. I really hoped there was a standard way of doing this but nothing is perfect. You get this one. :) – Pragmateek Nov 06 '13 at 23:43
  • @lazyberezovsky `BufferEnumerator` is an *enumerator*, so it should definitely be implementing `IEnumerator` as it does, not `IEnumerable`. – Timothy Shields Nov 06 '13 at 23:59
  • @TimothyShields then why are you returning it from a method which has an `IEnumerable` return type? – Sergey Berezovskiy Nov 07 '13 at 00:00
  • @lazyberezovsky Because that was obviously a typo. ;) There's a reason for the "browser code" disclaimer. haha – Timothy Shields Nov 07 '13 at 00:01
  • @TimothyShields you're still missing `Skip(i)`, and not all paths from `MoveNext` return a value – Sergey Berezovskiy Nov 07 '13 at 00:03
  • @lazyberezovsky I think you're mistaken about the `Skip(i)`. Where are you suggesting that should be? Also, the `MoveNext` method should have `return true` at the end. – Timothy Shields Nov 07 '13 at 00:06
  • @TimothyShields yep, looks like you don't need Skip here, now +1 – Sergey Berezovskiy Nov 07 '13 at 00:11
  • 1) This never disposes the underlying enumerator ever 2) This doesn't handle different threads, each with an enumerator from the same enumerable. 3) This doesn't handle the case where one enumerator partially enumerates the sequence and then another moves past it. – Servy Nov 08 '13 at 15:53
  • @Servy It actually does handle 3, look again. It wasn't designed to handle 2. It would be simple to make it threadsafe though. I skipped 1 because it would require making the BufferEnumerable IDisposable, though in retrospect I guess that's not too bad. I'll edit it. – Timothy Shields Nov 08 '13 at 17:11
  • Yeah, I was thinking about an improper use case for 3, this does handle it. As for two, I answered a question on the topic a while ago for a threadsafe version of this. It's actually a rather hard problem, if you want to really make a robust implementation. There are a lot of odd ways that they can interact. – Servy Nov 08 '13 at 17:37
  • @Servy Go ahead and link your version with the threadsafe part. – Timothy Shields Nov 08 '13 at 17:48
  • [found it](http://stackoverflow.com/questions/12427097/is-there-an-ienumerable-implementation-that-only-iterates-over-its-source-e-g/12427251#12427251) – Servy Nov 08 '13 at 17:51
  • @Servy I guess you could dispose the enumerator if you've finished enumerating; but otherwise you can't really do much - another iteration could always come along later and ask for that little bit more. Thread-safety comes at a high price here because you'd need lots of small locks (which are very slow compared to the rest of the code); you could even suggest it's not really generally possible since some enumerables may have thread-affinity. – Eamon Nerbonne Nov 11 '13 at 12:43
8

You can use the Microsoft.FSharp.Collections.LazyList<> type from the F# power pack (yep, from C# without F# installed - no problem!) for this. It's in Nuget package FSPowerPack.Core.Community.

In particular, you want to call LazyListModule.ofSeq(...) which returns a LazyList<T> that implements IEnumerable<T> and is lazy and cached.

In your case, usage is just a matter of...

var evenNumbers = LazyListModule.ofSeq(Numbers.Where(i => i % 2 == 0));
var cachedEvenNumbers = LazyListModule.ofSeq(evenNumbers);

Though I personally prefer var in all such cases, note that this does mean the compile-time type will be more specific than just IEnumerable<> - not that this is likely to ever be a downside. Another advantage of the F# non-interface types is that they expose some efficient operations you can't do efficiently with plain IEnumerables, such as LazyListModule.skip.

I'm not sure whether LazyList is thread-safe, but I suspect it is.


Another alternative pointed out in the comments below (if you have F# installed) is SeqModule.Cache (namespace Microsoft.FSharp.Collections, it'll be in GACed assembly FSharp.Core.dll) which has the same effective behavior. Like other .NET enumerables, Seq.cache doesn't have a tail (or skip) operator you can efficiently chain.

Thread-safe: unlike other solutions to this question Seq.cache is thread-safe in the sense that you can have multiple enumerators running in parallel (each enumerator is not thread safe).

Performance: I did a quick benchmark, and the LazyList enumerable has at least 4 times more overhead than the SeqModule.Cache variant, which in turn has at least 3 times more overhead than the custom-implementation answers. So while the F# variants work, they're not quite as fast. Note that being 3-12 times slower still isn't very slow compared to an enumerable that does (say) I/O or any non-trivial computation, so this probably won't matter most of the time, but it's good to keep in mind.

TL;DR If you need an efficient, thread-safe cached enumerable, just use SeqModule.Cache.
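For reference, calling it from C# looks like this (assuming a reference to FSharp.Core; F#'s `Seq.cache` is exposed to C# as `SeqModule.Cache`):

```csharp
using System.Linq;
using Microsoft.FSharp.Collections;

// The result is an IEnumerable<T> that evaluates the source lazily
// and caches each element the first time it is produced.
var evenNumbers = SeqModule.Cache(Numbers.Where(i => i % 2 == 0));
```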

Eamon Nerbonne
7

I hope this answer combines the brevity and clarity of sinelaw's answer and the support for multiple enumerations of Timothy's answer:

public static IEnumerable<T> Cached<T>(this IEnumerable<T> enumerable) {
    return CachedImpl(enumerable.GetEnumerator(), new List<T>());
}

static IEnumerable<T> CachedImpl<T>(IEnumerator<T> source, List<T> buffer) {
    int pos=0;
    while(true) {
        if(pos == buffer.Count) 
            if (source.MoveNext()) 
                buffer.Add(source.Current); 
            else 
                yield break;
        yield return buffer[pos++];
    }
}

The key ideas are to use the yield return syntax to make for a short enumerable implementation, but you still need a state machine to decide whether you can get the next element from the buffer, or whether you need to check the underlying enumerator.
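For instance, the simultaneous-enumeration case from the comments works against this implementation, since all enumerators share one buffer:

```csharp
var cached = Numbers.Where(i => i % 2 == 0).Cached();
// Two enumerators advance over the same shared buffer at different positions:
var pairs = cached.Zip(cached.Skip(1), Tuple.Create).Take(3).ToList();
// pairs: (0, 2), (2, 4), (4, 6) - and each number was generated just once
```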

Limitations: This makes no attempt to be thread-safe, nor does it dispose of the underlying enumerator (which is, in general, quite tricky to do, as the underlying uncached enumerator must remain undisposed as long as any cached enumerable might still be used).

Eamon Nerbonne
  • Nice. It also passes the Zip test. – sinelaw Nov 07 '13 at 00:27
  • Yeah. Shame that it needs a pointless wrapper method as you point out, but still nicer than all that manual interface implementation stuff. – Eamon Nerbonne Nov 07 '13 at 00:55
  • I've added [another solution](http://stackoverflow.com/a/19828954/562906) that's longer but uses a general pattern for simulating anonymous iterators, so a bit more pretty. – sinelaw Nov 07 '13 at 05:57
  • It's generally a good idea to use braces around your `if` when you have a dangling `else`, as you have here. – Servy Nov 08 '13 at 15:57
  • @Sevvy that's a matter of taste. I contend that with automatic indenting, the purpose of those braces is almost nil - the indent, if you're used to it, makes it immediately obvious which `if` the `else` applies to (and the auto-indent means that making a mistake here is extremely unlikely): I've been programming .NET almost since day 1 and I've never *ever* seen this get confusingly messed up. Additionally: in an online forum like this I find it's helpful to have only the critical long blurbs of code; the extra visual distraction of extra braces isn't ideal. – Eamon Nerbonne Nov 08 '13 at 22:20
  • However, the test + body on the same line makes it a little less skimmable, perhaps - I've edited the formatting, is this clearer? – Eamon Nerbonne Nov 08 '13 at 22:21
  • @servy I meant to direct the previous comments at you, but misspelled your username :-) – Eamon Nerbonne Nov 10 '13 at 17:45
  • @Pragmateek I notice you've accepted the manual `BufferEnumerator` implementation. Is there a particular reason you prefer it? – Eamon Nerbonne Nov 11 '13 at 12:46
  • @EamonNerbonne: because *Timothy* was first, put a lot of efforts in fixing his first try and his implementation is a good reminder of the use of IEnumerable from scratch. Yours is of course cuter and in production this is the one I'd used. This is the kind of dilemma that happens too often on SO, if I could I would have accepted both your answers. I'll update my question to give your implementation the visibility it deserves. :) – Pragmateek Nov 11 '13 at 12:59
  • @Pragmateek - I get it; but it's a bit of shame because this isn't visible; and unlike the bufferenumerable, this actually compiles and is easy (I hope) to understand. – Eamon Nerbonne Nov 11 '13 at 13:11
  • @EamonNerbonne: I've pushed my edit at the top of my question, that should be more visible. And yes yours is the best but I won't change my vote now, sorry. I hope I'll have plenty of chance to reward your skills with other questions I'll ask later. :) – Pragmateek Nov 11 '13 at 13:21
  • @Pragmateek Yeah, I understand your position. And it's not about the "reward" - it's just a little unfortunate that a future visitor will see a suboptimal implementation first, and probably not read any further. (I'm pretty sure most people don't read later answers). – Eamon Nerbonne Nov 11 '13 at 14:03
  • @EamonNerbonne That's just it; you've been used to using the language for a long time. Someone who's spent a lot of time using a language that handles the dangling if problem differently (namely uses the outer if rather than the inner) is likely to think that you just messed up your indentation and have an error. I too normally omit braces when they aren't needed, but having a dangling else is an exception I find generally worth making. I agree it's a matter of taste though. – Servy Nov 11 '13 at 15:02
  • @Servy sure, it depends on the team, and though I prefer omitting braces some like the consistency of always including braces or something in between. As a minor matter of trivia: I don't think there is any language that binds the else to the outer if. It's not actually ambiguous, it's just that the semantics require *context* to resolve, and that means that the simple, efficient CFG parsers can't deal with it without manual help. – Eamon Nerbonne Nov 11 '13 at 20:57
7

Building upon Eamon's answer above, here's another functional solution (no new types) that also works with simultaneous evaluation. This demonstrates that a general pattern (iteration with shared state) underlies this problem.

First we define a very general helper method, meant to allow us to simulate the missing feature of anonymous iterators in C#:

public static IEnumerable<T> Generate<T>(Func<Func<Tuple<T>>> generator)
{
    var tryGetNext = generator();
    while (true)
    {
        var result = tryGetNext();
        if (null == result)
        {
            yield break;
        }
        yield return result.Item1;
    }
}

Generate is like an aggregator with state. The generator function passed in is called once per enumeration and returns a tryGetNext function, which stands in for the anonymous iterator (the lambda with yield return in it) that C# doesn't allow: each call produces the next element wrapped in a one-tuple, or null when the sequence ends. State created inside the generator call is per-enumeration, while more global state (shared between all enumerations) can be maintained by the caller of Generate, e.g. in closure variables, as we'll show below.
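As a hypothetical example unrelated to caching, Generate can express a simple counter without declaring an iterator method or a new type:

```csharp
// Counts 0, 1, 2: per-enumeration state (i) lives in the inner closure,
// and a null result signals the end of the sequence.
var firstThree = Generate<int>(() =>
{
    int i = -1;
    return () => ++i < 3 ? new Tuple<int>(i) : null;
});
// firstThree enumerates as 0, 1, 2 (and restarts from 0 on each enumeration,
// because the outer generator call runs once per enumeration)
```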

Now we can use this for the "buffered Enumerable" problem:

public static IEnumerable<T> Cached<T>(IEnumerable<T> enumerable)
{
    var cache = new List<T>();
    var enumerator = enumerable.GetEnumerator();

    return Generate<T>(() =>
    {
        int pos = -1;
        return () => {
            pos += 1;
            if (pos < cache.Count)
            {
                return new Tuple<T>(cache[pos]);
            }
            if (enumerator.MoveNext())
            {
                cache.Add(enumerator.Current);
                return new Tuple<T>(enumerator.Current);
            }
            return null;
        };
    });
}
sinelaw
  • Thanks for this one *sinelaw*. :) +1 – Pragmateek Nov 07 '13 at 10:02
  • The use of `Tuple` as an optional `T` is actually something I had never thought of before. A great trick for sure. +1 – Timothy Shields Nov 07 '13 at 22:49
  • @TimothyShields Hmm, I don't think that's such a good trick - it's somewhat misleading. If you want an optional value, why not make the (trivial) class `OptionalValue` or `OptionalReference` - well-chosen names help code maintainability. – Eamon Nerbonne Nov 08 '13 at 10:11
  • @sinelaw: I like the idea, but you're being unnecessarily creative with your parameter passing: you can avoid the "reference to int via array" trick using a closure; and you might want to name the concept of the generator state. – Eamon Nerbonne Nov 08 '13 at 10:17
  • @EamonNerbonne regarding Tuple vs. OptionalValue - Ideally .NET would supply a type for this – sinelaw Nov 08 '13 at 15:33
  • @EamonNerbonne thanks for the suggestion, I updated the code using a closure and got rid of the array. There is an advantage to passing an explicit initialize func - it allows you to decouple the generator function from the initial state – sinelaw Nov 08 '13 at 15:45
  • Right, but is that useful? You can always do `()=>var x=decoupled();return ()=>{...}`, and almost always the initializer and generator will be intrinsically coupled. Many generators won't even need an initializer beyond trivial parameterization. – Eamon Nerbonne Nov 08 '13 at 22:10
  • Whoops, meant `()=>{var x=decoupled();return ()=>{...}}` – Eamon Nerbonne Nov 08 '13 at 22:23
  • Back to the `OptionalType`, it would be nice if .NET included such a type - but I still think it's more readable to include a trivial type with obvious meaning rather than use a `Tuple` in an unusual way, especially because tuples in C# create so much line noise (`Item1`, all those `<T>`'s...), and especially because you're using it in a way that differs quite a lot from a normal functional-language tuple (which isn't typically nullable). Having said that, there's no denying that writing that kind of boilerplate in a demo like this isn't pretty, and tuple does avoid that :-). – Eamon Nerbonne Nov 08 '13 at 23:03
  • Somewhat amusingly, this answer concerning "temporary" types for return values that uses `Tuple` makes the same observation: http://stackoverflow.com/questions/9811398/is-there-a-built-in-type-in-c-sharp-that-contains-a-bool-and-a-string-for-static – Eamon Nerbonne Nov 08 '13 at 23:12
  • Finally, note that if you did use a special-purpose type like `struct OptionalValue {public bool HasValue; public T Value;}` you could even gain a little efficiency by avoiding GC pressure, albeit minor. – Eamon Nerbonne Nov 08 '13 at 23:13
  • I must be coming across as a deranged nitpicker, so for some nuance here: I like your answer, I'm just trying to find ways to make functional approaches like this readable and maintainable in C#; it's just not always as easy as you might expect despite LINQ and related improvements. – Eamon Nerbonne Nov 08 '13 at 23:15
  • @EamonNerbonne, thanks for all the input - I really appreciate it! I agree about the OptionalValue - this is demo code as you say. You're also right that a closure can be used to wrap an initialization function, which is why I already changed the code to get rid of the two functions. Incidentally the one-tuple doesn't make much sense anyway (is there *any* sensible reason to use it?) – sinelaw Nov 11 '13 at 01:29
  • I sure can't think of one in normal code. It does have a certain symmetry which may be useful if you're generating code or somehow generically working on an arbitrary (but fixed) number of variables. Of course, C# doesn't itself provide support for that, and in any case, then I'd expect the empty tuple to be present to, for completeness. So really, no, it doesn't make much sense. – Eamon Nerbonne Nov 11 '13 at 12:26
  • Nice answer, thanks. I started to use this code as a jumping-off point, and was writing some tests for it. My test exposed the fact that 'MoveNext' is called on the original enumerator once for each re-use of the buffered results (when the 'end' is reached). This will almost never be a problem as you'd imagine most implementations of IEnumerator will have some state and know they are finished, but I'm not sure if that's guaranteed. If the intention is to replay *exactly* what happened the first time then there should arguably be another state variable in the closure, e.g. `bool completed` – OlduwanSteve Jan 02 '15 at 13:08
  • Actually.. setting `enumerator` to null on completion can serve as both an indicator of completeness and free the original enumerator for garbage collection. You'd obviously need to check for that null : `if (enumerator != null && enumerator.MoveNext())` – OlduwanSteve Jan 02 '15 at 13:12
4

As far as I know there is no built-in way to do this, which, now that you mention it, is slightly surprising. My guess is that, given how often one would want this option, it was probably not deemed worth the effort needed to analyse the code and make sure the generator gives the exact same sequence every time.

You can however implement it yourself. The easy way would be on the call-site, as

var evenNumbers = Numbers.Where(i => i % 2 == 0);
var startOfList = evenNumbers.Take(10).ToList();

// use startOfList instead of evenNumbers in the loop.

More generally and accurately, you could do it in the generator: create a List<int> cache and every time you generate a new number add it to the cache before you yield return it. Then when you loop through again, first serve up all the cached numbers. E.g.

List<int> cachedEvenNumbers = new List<int>();
IEnumerable<int> EvenNumbers
{
  get
  {
    int i = -1;

    foreach(int cached in cachedEvenNumbers)
    {
      i = cached;
      yield return cached;
    }

    // Note: this while loop now starts from the last cached value
    while (true) 
    {
        Console.WriteLine("Generating {0}.", i + 1);
        yield return ++i;
    }
  }
}

I guess if you think about this long enough you could come up with a general implementation of an IEnumerable<T>.Buffered() extension method - again, the requirement is that the enumeration doesn't change between calls, and the question is whether it is worth it.

CompuChip
3

Here's an incomplete yet compact 'functional' implementation (no new types defined).

The bug is that it does not allow simultaneous enumeration.


Original description: The first function should have been an anonymous lambda inside the second, but C# does not allow yield in anonymous lambdas:

// put these in some extensions class

private static IEnumerable<T> EnumerateAndCache<T>(IEnumerator<T> enumerator, List<T> cache)
{
    while (enumerator.MoveNext())
    {
        var current = enumerator.Current;
        cache.Add(current);
        yield return current;
    }
}
public static IEnumerable<T> ToCachedEnumerable<T>(this IEnumerable<T> enumerable)
{
    var enumerator = enumerable.GetEnumerator();
    var cache = new List<T>();
    return cache.Concat(EnumerateAndCache(enumerator, cache));
}

Usage:

var enumerable = Numbers.ToCachedEnumerable();
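As the comments point out, the problem shows up under simultaneous enumeration; an illustrative sketch of the failure mode (the exact failure can vary with interleaving):

```csharp
var cached = Numbers.ToCachedEnumerable();
// Both enumerators share the same source enumerator and List<T>: the first
// half of the Concat can be iterating the list while the second half appends
// to it (which can throw InvalidOperationException, "Collection was modified"),
// and the shared source enumerator lets the two consumers steal elements
// from each other, producing wrong pairs.
var pairs = cached.Zip(cached.Skip(1), Tuple.Create).Take(3).ToList();
```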
sinelaw
  • This is buggy: it doesn't support multiple simultaneous iterations. E.g. `cached.ZipWith(cached.Skip(1), Tuple.Create)` would crash - and note that that's a particularly interesting case to support because caching that simultaneously ensures the list is evaluated just once, but it's also lazy. – Eamon Nerbonne Nov 07 '13 at 00:02
  • Also, there's no need for the double-nested func's - you're evaluating em right away anyhow. – Eamon Nerbonne Nov 07 '13 at 00:04
  • Oops, that double anonymous lambda slipped through. Fixed. – sinelaw Nov 07 '13 at 00:11
  • You're also right about the bug. I'll leave this answer as a "how not to do it" – sinelaw Nov 07 '13 at 00:14
3

Full credit to Eamon Nerbonne and sinelaw for their answers, just a couple of tweaks! First, the enumerator is released when it has completed. Second, the underlying enumerator is protected with a lock so the enumerable can safely be used on multiple threads.

// This is just the same as @sinelaw's Generator but I didn't like the name
public static IEnumerable<T> AnonymousIterator<T>(Func<Func<Tuple<T>>> generator)
{
    var tryGetNext = generator();
    while (true)
    {
        var result = tryGetNext();
        if (null == result)
        {
            yield break;
        }
        yield return result.Item1;
    }
}

// Cached/Buffered/Replay behaviour
public static IEnumerable<T> Buffer<T>(this IEnumerable<T> self)
{
    // Rows are stored here when they've been fetched once
    var cache = new List<T>();

    // This counter is thread-safe in that it is incremented after the item has been added to the list,
    // hence it will never give a false positive. It may give a false negative, but that falls through
    // to the code which takes the lock so it's ok.
    var count = 0;

    // The enumerator is retained until it completes, then it is discarded.
    var enumerator = self.GetEnumerator();

    // This lock protects the enumerator only. The enumerable could be used on multiple threads
    // and the enumerator would then be shared among them, but enumerators are inherently not
    // thread-safe so a) we must protect that with a lock and b) we don't need to try and be
    // thread-safe in our own enumerator
    var lockObject = new object();

    return AnonymousIterator<T>(() =>
    {
        int pos = -1;
        return () =>
        {
            pos += 1;
            if (pos < count)
            {
                return new Tuple<T>(cache[pos]);
            }
            // Only take the lock when we need to
            lock (lockObject)
            {
                // The counter could have been updated between the check above and this one,
                // so now we have the lock we must check again
                if (pos < count)
                {
                    return new Tuple<T>(cache[pos]);
                }

                // Enumerator is set to null when it has completed
                if (enumerator != null)
                {
                    if (enumerator.MoveNext())
                    {
                        cache.Add(enumerator.Current);
                        count += 1;
                        return new Tuple<T>(enumerator.Current);
                    }
                    else
                    {
                        enumerator = null;
                    }
                }
            }
            return null;
        };
    });
}
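A hypothetical usage sketch exercising the thread-safety claim (assuming the extensions above are in scope):

```csharp
// Two threads enumerate the same buffered sequence concurrently;
// the lock ensures the underlying source is still enumerated only once.
var buffered = Numbers.Where(i => i % 2 == 0).Buffer();
var t1 = Task.Run(() => buffered.Take(100).Sum());
var t2 = Task.Run(() => buffered.Take(100).Sum());
Task.WaitAll(t1, t2);
// t1.Result == t2.Result: both threads observe the same first 100 even numbers
```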

OlduwanSteve
  • There is a race condition which keeps this code from being thread safe. Two threads try to get the last item in the list. Thread A checks `pos < count` to see if there's a cached result for it; there isn't. Thread B checks `pos < count` to see if there's a cached result for it; there isn't. Thread B moves to the last item and returns it. Thread B tries to get the next item, encounters the end of the list, and sets `enumerator=null`. Thread A checks `enumerator != null`, sees that it is `null` and `return null` instead of returning the last item. – Cirdec Jun 12 '15 at 00:27
  • You were right there was, thanks! I've edited the code to remove the outer check on the enumerator, which I think resolves the problem. Do you agree? – OlduwanSteve Jun 12 '15 at 09:00
0

I use the following extension method.

This way, the input is read at maximum speed, and the consumer processes at maximum speed.

public static IEnumerable<T> Buffer<T>(this IEnumerable<T> input)
{
    var blockingCollection = new BlockingCollection<T>();

    //read from the input
    Task.Factory.StartNew(() =>
    {
        foreach (var item in input)
        {
            blockingCollection.Add(item);
        }

        blockingCollection.CompleteAdding();
    });

    foreach (var item in blockingCollection.GetConsumingEnumerable())
    {
        yield return item;
    }
}

Example Usage

This example has a fast producer (find files), and a slow consumer (upload files).

long uploaded = 0;
long total = 0;

Directory
    .EnumerateFiles(inputFolder, "*.jpg", SearchOption.AllDirectories)
    .Select(filename =>
    {
        total++;
        return filename;
    })
    .Buffer()
    .ForEach(filename =>
    {
        //pretend to do something slow, like upload the file.
        Thread.Sleep(1000);
        uploaded++;

        Console.WriteLine($"Uploaded {uploaded:N0}/{total:N0}");
    });
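One note on the example: `ForEach` isn't a built-in `IEnumerable<T>` extension (`List<T>` has one, and libraries like MoreLINQ provide one); if you don't already have it in scope, a trivial version will do:

```csharp
public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
{
    foreach (var item in source)
    {
        action(item);
    }
}
```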


Fidel