
I am trying to batch the IEnumerable<T> in equal subsets and came across following solutions:

  1. The Batch method from the MoreLinq NuGet library, whose implementation is linked here:

    MoreLinq - Batch, with the source code pasted below:

    public static IEnumerable<TResult> Batch<TSource, TResult>(this IEnumerable<TSource> source, int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return BatchImpl(source, size, resultSelector);
    }

    private static IEnumerable<TResult> BatchImpl<TSource, TResult>(this IEnumerable<TSource> source, int size,
        Func<IEnumerable<TSource>, TResult> resultSelector)
    {
        Debug.Assert(source != null);
        Debug.Assert(size > 0);
        Debug.Assert(resultSelector != null);

        TSource[] bucket = null;
        var count = 0;

        foreach (var item in source)
        {
            if (bucket == null)
            {
                bucket = new TSource[size];
            }

            bucket[count++] = item;

            // The bucket is fully buffered before it's yielded
            if (count != size)
            {
                continue;
            }

            // Select is necessary so bucket contents are streamed too
            yield return resultSelector(bucket);

            bucket = null;
            count = 0;
        }

        // Return the last bucket with all remaining elements
        if (bucket != null && count > 0)
        {
            Array.Resize(ref bucket, count);
            yield return resultSelector(bucket);
        }
    }
    
  2. Another solution, which is more memory efficient, is available at the following link:

    IEnumerable Batching, with the source code pasted below:

    using System;
    using System.Collections.Generic;

    public static class BatchLinq
    {
        public static IEnumerable<IEnumerable<T>> CustomBatch<T>(this IEnumerable<T> source, int size)
        {
            if (size <= 0)
                throw new ArgumentOutOfRangeException("size", "Must be greater than zero.");

            using (IEnumerator<T> enumerator = source.GetEnumerator())
                while (enumerator.MoveNext())
                    yield return TakeIEnumerator(enumerator, size);
        }

        private static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
        {
            int i = 0;
            do
                yield return source.Current;
            while (++i < size && source.MoveNext());
        }
    }
    

Both solutions produce an IEnumerable<IEnumerable<T>> as the end result.

I found a discrepancy in the following code:

var result = source.Batch(size);   // or source.CustomBatch(size); both give an IEnumerable<IEnumerable<T>>

result.Count() gives different results: it is correct for MoreLinq's Batch but not for the other method, even though the batches themselves are correct and identical for both.

Consider the following example:

IEnumerable<int> arr = new int[10] {1,2,3,4,5,6,7,8,9,10};

With a partition size of 3:

arr.Batch(3).Count() returns 4, which is correct.

arr.CustomBatch(3).Count() returns 10, which is incorrect.
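A minimal, self-contained repro of the discrepancy, written as a sketch that assumes C# 9+ top-level statements. It does not need MoreLinq: `CustomBatch` and `TakeIEnumerator` are copied from the second solution, but as local functions, so they are called as `CustomBatch(arr, 3)` rather than `arr.CustomBatch(3)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] arr = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

// Count() enumerates only the outer sequence. No inner batch is ever read, so
// the shared enumerator moves one element per outer step: 10 "batches".
int outerOnly = CustomBatch(arr, 3).Count();

// Materializing each batch before the outer sequence advances consumes the
// elements in groups of 3, which gives the expected 4 batches.
List<List<int>> batches = CustomBatch(arr, 3).Select(b => b.ToList()).ToList();

Console.WriteLine($"{outerOnly} vs {batches.Count}"); // prints "10 vs 4"

// CustomBatch/TakeIEnumerator from the question, rewritten as local functions
// (instead of extension methods) so this runs as a C# 9+ top-level program.
static IEnumerable<IEnumerable<T>> CustomBatch<T>(IEnumerable<T> source, int size)
{
    if (size <= 0)
        throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");

    using (IEnumerator<T> enumerator = source.GetEnumerator())
        while (enumerator.MoveNext())
            yield return TakeIEnumerator(enumerator, size);
}

// Each inner batch reads from the enumerator shared with the outer loop.
static IEnumerable<T> TakeIEnumerator<T>(IEnumerator<T> source, int size)
{
    int i = 0;
    do
        yield return source.Current;
    while (++i < size && source.MoveNext());
}
```

The second line is also why the result looks right after calling ToList() on each batch: consuming a batch in full advances the shared enumerator past that whole batch before the outer loop asks for the next one.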

The batches themselves are correct when I call ToList(). Is the error because the second method is still streaming from the source and no memory is allocated for the batches? Even so, Count() should not return an incorrect result. Any views or suggestions?

Mrinal Kamboj
  • I think you need to share the code you are executing. – Mar 26 '17 at 07:16
  • If you look at the question carefully, the code is right there, unless you would prefer that I copy the source code from the respective links too. What part do you think is missing or unclear? Both batching mechanisms are IEnumerable extensions. – Mrinal Kamboj Mar 26 '17 at 07:18
  • @Veverke No, batching works; that's the interesting part. As I mentioned, it shows the correct result on doing `ToList()`, but `Count()` is not correct. Also, the incorrect code is the Stack Overflow answer, not MoreLinq. I prefer that answer for its optimization, but I cannot pin down the reason for the issue. – Mrinal Kamboj Mar 26 '17 at 07:27
  • You have a block of pseudocode which you say is "right there". If you want help, show the C# code for creating the partitions. – Mar 26 '17 at 08:01
  • Points 1 and 2 in the question had two links pointing to the exact source code, which I have now pasted, if that helps. – Mrinal Kamboj Mar 26 '17 at 13:57
  • The second result returns Count=10 because it uses `while (enumerator.MoveNext())`, which will yield 10 times and, I assume, return 7 extra empty enumerables. In what form would you like to see the answer to this question? – Andrii Litvinov Mar 26 '17 at 14:19
  • @AndriiLitvinov That means it's incorrect, not in line with what Count() should do. – Mrinal Kamboj Mar 26 '17 at 14:31
  • Yes, that's correct, Count is wrong. What is the purpose of this question? – Andrii Litvinov Mar 26 '17 at 14:45
  • The purpose was to find out whether I can reliably use this code, which is more optimized but buggy. Thanks for the verification; you may answer the question and I will mark it. – Mrinal Kamboj Mar 26 '17 at 15:05

1 Answer


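The second method returns Count=10 because its outer loop uses while (enumerator.MoveNext()) and yields a new inner batch on every iteration. The inner batches (TakeIEnumerator) are lazy and share that single enumerator, so they only advance it when they are themselves enumerated. Count() never enumerates them, so the outer loop advances the source by just one element per batch, and the resulting enumerable contains 10 enumerables instead of 4. When each batch is consumed in full before moving on (for example with ToList()), the shared enumerator is first advanced through the whole batch, which is why the materialized result looks correct.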
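One way to make Count() reliable is to buffer each batch, as MoreLinq does with its array bucket, so the outer sequence no longer depends on the inner ones being enumerated. A minimal sketch, assuming a C# 9+ top-level program; `BufferedBatch` is a hypothetical name, not part of MoreLinq or of the linked answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] arr = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

Console.WriteLine(BufferedBatch(arr, 3).Count()); // prints 4
Console.WriteLine(string.Join(" | ", BufferedBatch(arr, 3).Select(b => string.Join(",", b))));
// prints 1,2,3 | 4,5,6 | 7,8,9 | 10

// BufferedBatch is a hypothetical helper. Arguments are validated eagerly; the
// iterator copies each batch into its own List<T> before yielding it, so the
// outer sequence never depends on the caller reading the inner ones.
static IEnumerable<IReadOnlyList<T>> BufferedBatch<T>(IEnumerable<T> source, int size)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size), "Must be greater than zero.");
    return Iterator();

    IEnumerable<IReadOnlyList<T>> Iterator()
    {
        var bucket = new List<T>(size);
        foreach (T item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size); // new list, so an already-yielded batch is never modified
            }
        }

        if (bucket.Count > 0)
            yield return bucket; // last, partial batch
    }
}
```

The trade-off is the one the question already points at: each batch holds up to `size` elements in memory, in exchange for a Count() that matches the number of batches. On .NET 6 and later, the built-in `Enumerable.Chunk(size)`, which returns `IEnumerable<T[]>`, also buffers each chunk.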

The higher-scored answer https://stackoverflow.com/a/13731854/2138959 in the referenced question provides a reasonable solution to the problem as well.

Andrii Litvinov