Enumerable.Take
does in fact stream results out; it doesn't buffer up its source entirely and then return only the first N. Looking at your original solution though, the problem is that the input to where you would want to do a Take
is String.Split
. Unfortunately, this method doesn't use any sort of deferred execution; it eagerly creates an array of all the 'splits' and then returns it.
Consequently, the technique to get a streaming sequence of words from some text would be something like:
var words = src.StreamingSplit() // you'll have to implement that
.Take(1000);
However, I do note that the rest of your query is:
...
.GroupBy(str => str) // group words by the value
.Select(g => new
{
str = g.Key, // the value
count = g.Count() // the count of that value
});
Do note that GroupBy
is a buffering operation - you can expect that all of the 1,000 words from its source will end up getting stored somewhere in the process of the groups being piped out.
As I see it, the options are:
- If you don't mind going through all of the text for splitting purposes, then
src.Split().Take(1000)
is fine. The downside is wasted time (to continue splitting after it is no longer necesary) and wasted space (to store all of the words in an array even though only the first 1,000) will be needed. However, the rest of the query will not operate on any more words than necessary.
- If you can't afford to do (1) because of time / memory constraints, go with
src.StreamingSplit().Take(1000)
or equivalent. In this case, none of the original text will be processed after 1,000 words have been found.
Do note that those 1,000 words themselves will end up getting buffered by the GroupBy
clause in both cases.