I am trying to batch an IEnumerable<T> into equal-sized subsets and came across the following solutions:
The Batch method from the MoreLinq NuGet library, whose implementation is detailed here:
MoreLinq - Batch. The source code is pasted below:
    public static IEnumerable<TResult> Batch<TSource, TResult>(this IEnumerable<TSource> source, int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return BatchImpl(source, size, resultSelector);
    }

    private static IEnumerable<TResult> BatchImpl<TSource, TResult>(this IEnumerable<TSource> source, int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)
    {
        Debug.Assert(source != null);
        Debug.Assert(size > 0);
        Debug.Assert(resultSelector != null);

        TSource[] bucket = null;
        var count = 0;

        foreach (var item in source)
        {
            if (bucket == null)
            {
                bucket = new TSource[size];
            }

            bucket[count++] = item;

            // The bucket is fully buffered before it's yielded
            if (count != size)
            {
                continue;
            }

            // Select is necessary so bucket contents are streamed too
            yield return resultSelector(bucket);

            bucket = null;
            count = 0;
        }

        // Return the last bucket with all remaining elements
        if (bucket != null && count > 0)
        {
            Array.Resize(ref bucket, count);
            yield return resultSelector(bucket);
        }
    }
Another solution, described as more memory efficient, is available at the following link:
IEnumerable Batching. The source code is pasted below:
    public static class BatchLinq
    {
        public static IEnumerable<IEnumerable<T>> CustomBatch<T>(this IEnumerable<T> source, int size)
        {
            if (size <= 0)
                throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

            using (IEnumerator<T> enumerator = source.GetEnumerator())
                while (enumerator.MoveNext())
                    yield return TakeIEnumerator(enumerator, size);
        }

        private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
        {
            int i = 0;
            do
                yield return source.Current;
            while (++i < size && source.MoveNext());
        }
    }
Both solutions provide the end result as IEnumerable<IEnumerable<T>>.
I find a discrepancy in the following piece of code:
    var result = /* IEnumerable<IEnumerable<T>> from either method above */;
    result.Count();
This leads to different results for the two methods: the count is correct for MoreLinq's Batch but not for the other one, even though the batched contents are correct and the same for both.
Consider the following example:
IEnumerable<int> arr = new int[10] {1,2,3,4,5,6,7,8,9,10};
For a partition size of 3:
arr.Batch(3).Count() gives 4, which is correct
arr.CustomBatch(3).Count() gives 10, which is incorrect
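The discrepancy can be reproduced in isolation. The sketch below copies the CustomBatch logic from the post into a class named BatchRepro and adds two helpers, CountWithoutConsuming and CountWhileConsuming; those three names are made up for illustration and are not part of either library. It assumes the only difference between the two calls is whether each inner batch is enumerated before the outer sequence moves on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchRepro
{
    // Same logic as CustomBatch in the post: every inner sequence
    // reads from the one shared enumerator.
    public static IEnumerable<IEnumerable<T>> CustomBatch<T>(this IEnumerable<T> source, int size)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");

        using (IEnumerator<T> enumerator = source.GetEnumerator())
            while (enumerator.MoveNext())
                yield return TakeIEnumerator(enumerator, size);
    }

    private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
    {
        int i = 0;
        do
            yield return source.Current;
        while (++i < size && source.MoveNext());
    }

    // Hypothetical helper: counts the batches without enumerating any of them.
    public static int CountWithoutConsuming(IEnumerable<int> items, int size)
    {
        return items.CustomBatch(size).Count();
    }

    // Hypothetical helper: materializes each batch before asking for the next one.
    public static int CountWhileConsuming(IEnumerable<int> items, int size)
    {
        int batches = 0;
        foreach (IEnumerable<int> batch in items.CustomBatch(size))
        {
            List<int> materialized = batch.ToList(); // drains the batch, advancing the shared enumerator
            batches++;
        }
        return batches;
    }
}
```

For the ten-element array above and a size of 3, CountWithoutConsuming returns 10 and CountWhileConsuming returns 4. In the first case nothing enumerates the inner sequences, so each step of the outer iterator advances the shared enumerator by only one item.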
The batching result itself is correct when we call ToList(). Is the wrong count caused by the second method still streaming from the source, with no memory allocated for the batches? Even so, an incorrect result should not be possible. Any views or suggestions?
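One way to test the streaming hypothesis is to buffer each batch before yielding it, which is what MoreLinq's Batch does with its bucket array. A minimal sketch under that assumption follows; BufferedBatch and BufferedBatchSketch are hypothetical names, and a List<T> is swapped in for the array to keep the code short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferedBatchSketch
{
    // Hypothetical extension method: validates eagerly, then batches lazily.
    public static IEnumerable<IReadOnlyList<T>> BufferedBatch<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");
        return Iterator(source, size);
    }

    private static IEnumerable<IReadOnlyList<T>> Iterator<T>(IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (T item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                // The batch is complete and fully in memory before it is yielded.
                yield return bucket;
                bucket = new List<T>(size);
            }
        }

        // Yield the final, possibly shorter, batch.
        if (bucket.Count > 0)
            yield return bucket;
    }
}
```

With the ten-element array from the example, arr.BufferedBatch(3).Count() returns 4, because each step of the outer iterator consumes a whole batch of source items. The cost is that one batch is held in memory at a time.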