
I'm sure this has a simple fix. When I put a breakpoint on TaskList.Add(), I can see that different batchDeletionRecords are being passed to the method, but inside BatchProcess the same values keep arriving.

When I added the arg variable I was able to get the correct values for iBatch (visible in the screenshot below). It didn't work for the DataTable, though.

Batcher.Batch() is supposed to split deletionRecords into batch DataTables of 2 records each. (Its implementation can be ignored here.)

var iBatch = 0;
foreach (DataTable batchDeletionRecords in Batcher.Batch(deletionRecords, 2))
{
    iBatch++;
    var arg = new { _batchDeletionRecords = batchDeletionRecords, _iBatch = iBatch };
    TaskList.Add(Task.Run(new Action(() => BatchProcess(arg._batchDeletionRecords.Copy(), arg._iBatch))));
}

[screenshot of the debugger output]

BatchProcess() is a simple method that deletes the records and prints the values to the console.
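
Roughly, it has this shape (a simplified, hypothetical sketch; the actual delete call and the exact logging format are omitted here):

void BatchProcess(DataTable batchDeletionRecords, int iBatch)
{
    foreach (DataRow row in batchDeletionRecords.Rows)
    {
        // delete the record via the API (call omitted), then log what was processed
        Console.WriteLine($"Batch #{iBatch}: {row[0]}");
    }
}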

asked by Kunal Kene (edited by Peter Csala)
  • https://stackoverflow.com/questions/271440/captured-variable-in-a-loop-in-c-sharp ? – Caius Jard Mar 12 '21 at 07:02
  • But doesn’t batch #2 have different records? Are you certain batch #3 contains uniques? – Caius Jard Mar 12 '21 at 07:04
  • Does `BatchProcess` execute an I/O-bound operation? If so, then why don't you convert it to async? – Peter Csala Mar 12 '21 at 07:13
  • Yes, Batch #2, #3, and others have different records. 100% Certain that BatchProcess is working just as it should. – Kunal Kene Mar 12 '21 at 07:20
  • Most probably the `Batcher.Batch` method is buggy. You could consider using instead the [`Batch`](https://morelinq.github.io/2.6/ref/api/html/Overload_MoreLinq_MoreEnumerable_Batch.htm) operator from the [MoreLinq](https://github.com/morelinq/MoreLINQ) library ([package](https://www.nuget.org/packages/morelinq/)). – Theodor Zoulias Mar 12 '21 at 13:27
  • @TheodorZoulias The issue was with how the DataTables were being passed; Batcher.Batch() is used in multiple places and is a self-written class and method that is working as expected. – Kunal Kene Mar 12 '21 at 14:07
  • @KunalKene no it's not, otherwise you wouldn't have this problem. The safest code is code you don't have to write, code created, verified and supported by the .NET team (Partitioner) or developers like Jon Skeet (MoreLINQ). – Panagiotis Kanavos Mar 12 '21 at 16:38
  • If you explain the actual problem, people could point out that for API batching and throttling, you can use the Dataflow classes to process an infinite stream of messages, batch them using a [BatchBlock](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.batchblock-1?view=net-5.0), make the API calls using only 2 or 3 concurrent connections with e.g. a [TransformBlock](https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.dataflow.transformblock-2?view=net-5.0) with a DOP=3, and process the responses with an ActionBlock. – Panagiotis Kanavos Mar 12 '21 at 16:40
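
As a rough illustration of the dataflow pipeline described in the comment above, a sketch could look like the following (the batch size, the degree of parallelism, and the CallDeleteApiAsync helper are all assumptions, not part of the original code):

using System;
using System.Data;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Batch incoming rows in groups of 2, call the deletion API with limited
// concurrency, then print each returned status.
var batchBlock = new BatchBlock<DataRow>(2);

var deleteBlock = new TransformBlock<DataRow[], string>(
    rows => CallDeleteApiAsync(rows),   // hypothetical async wrapper around the proprietary deletion API
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 3 });

var printBlock = new ActionBlock<string>(status => Console.WriteLine(status));

var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
batchBlock.LinkTo(deleteBlock, linkOptions);
deleteBlock.LinkTo(printBlock, linkOptions);

foreach (DataRow row in deletionRecords.Rows)
    batchBlock.Post(row);

batchBlock.Complete();
await printBlock.Completion;
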

1 Answer


Okay, I was able to solve this by calling Copy() when building arg, rather than when passing the parameters inside Task.Run().

var iBatch = 0;
foreach (DataTable batchDeletionRecords in Batcher.Batch(deletionRecords, 2))
{
    iBatch++;
    var arg = new { _batchDeletionRecords = batchDeletionRecords.Copy(), _iBatch = iBatch };
    TaskList.Add(Task.Run(new Action(() => BatchProcess(arg._batchDeletionRecords, arg._iBatch))));
}
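
As the comments point out, the copy works around a DataTable instance being shared across iterations rather than fixing Batcher.Batch itself. One way to drop the custom batching entirely is the MoreLinq Batch operator linked in the comments; here is a sketch of that idea (it assumes deletionRecords is a DataTable and the MoreLinq NuGet package is referenced):

using System;
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using MoreLinq;

var taskList = new List<Task>();
var iBatch = 0;

foreach (var rows in deletionRecords.AsEnumerable().Batch(2))
{
    iBatch++;
    var batchNumber = iBatch;                 // per-iteration copy for the closure
    var batchTable = rows.CopyToDataTable();  // each batch is materialized into its own DataTable
    taskList.Add(Task.Run(() => BatchProcess(batchTable, batchNumber)));
}
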
answered by Kunal Kene
  • Solved *what*? You haven't provided the code for these methods, so it's unclear why copying is needed. Reading from global data that doesn't change is thread-safe. If you encounter problems, it's because the code you didn't post modifies the state. In fact, the entire code is probably not needed - you could use `Parallel.ForEach` to process lots of data. `Parallel.ForEach` takes care of partitioning itself. – Panagiotis Kanavos Mar 12 '21 at 07:29
  • On the other hand, trying to delete 100K records in parallel won't run faster than trying to delete them sequentially. You're talking to the same database, over the same network, using the same CPUs, the same disk. It's not *these* that cause delays when you try to delete 100K records, it's having to lock all of the records and write all changes to the transaction log. The fastest way to delete all data, by a factor of... almost infinity, is to use table partitioning and simply swap out or drop a partition. Swapping out a partition is a metadata operation taking milliseconds. – Panagiotis Kanavos Mar 12 '21 at 07:40
  • Also, I should clarify: I'm not deleting these records from a SQL or NoSQL DB; I'm using a proprietary API that takes in requests and returns the status of deletion. @Panagiotis, thanks for the suggestion anyway. – Kunal Kene Mar 12 '21 at 11:38
  • Solved the issue I was facing: "I can see that different batchDeletionRecords are being passed to the method, but inside BatchProcess the same values keep arriving." Now that a copy is being passed, I'm no longer facing a reference issue with the DataTable. – Kunal Kene Mar 12 '21 at 11:40
  • If you saw repetitive values, it means `Batcher.Batch(deletionRecords, 2)` had a bug. Most likely it's returning the same object with different values instead of a new object. To solve this, ensure `Batcher.Batch` returns a new instance each time. Or get rid of `Batcher` and use `Parallel.ForEach`. It will do the batching itself. Or use .NET's [Partitioner](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.partitioner.create?view=net-5.0) class. – Panagiotis Kanavos Mar 12 '21 at 12:13
  • @Panagiotis, please read the question before asking clarifying questions. I had specified that Batcher.Batch() is working as expected and the thread is resolved: "Batcher.Batch() is supposed to split deletionRecords into batch DataTables of 2 records each. (Its implementation can be ignored here.)" – Kunal Kene Mar 12 '21 at 14:05
  • And it doesn't do its job. The only way `Copy()` could help is if `Batch` returned the same instance over and over. `Copy` only covers up this bug, it doesn't fix it. You can avoid this bug and eliminate your custom code by using `var partitioner=Partitioner.Create(deletionRecords); var batches=partitioner.CreatePartitions(2);` – Panagiotis Kanavos Mar 12 '21 at 16:30
  • Another option is to use the [MoreLINQ](https://github.com/morelinq/MoreLINQ) library's [Batch](https://morelinq.github.io/3.2/ref/api/html/M_MoreLinq_MoreEnumerable_Batch__1.htm) method: `var batches=deletionRecords.Batch(10);` – Panagiotis Kanavos Mar 12 '21 at 16:36
  • @PanagiotisKanavos I am not a fan of the [`Partitioner`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.partitioner) class in general, except for doing some basic stuff with the `Parallel.ForEach` method or with PLINQ. I find its API non-intuitive, too unconventional and unidiomatic (returning an `IList` of `IEnumerator`s, really?). I would not consider using one for batching. MoreLinq's `Batch`, the `Buffer` operator from the System.Interactive package, or the dataflow `BatchBlock` are much preferable IMHO. – Theodor Zoulias Mar 12 '21 at 17:55
  • @TheodorZoulias `Parallel.ForEach` and PLINQ use it just fine. That class was specifically made to partition data for parallel processing. There's a reason it returns an `IList` - the number of partitions is already known and there's no reason to iterate to create workers for the partitions. MoreLINQ's `Batch` does something *very* different - it creates batches of specific sizes, not partitions. It's essentially the reverse of partitioning. – Panagiotis Kanavos Mar 12 '21 at 18:10
  • @TheodorZoulias besides, [Partitioner.CreateDynamicPartitions](https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.orderablepartitioner-1.getdynamicpartitions?view=net-5.0) does return an IEnumerable of IEnumerables, for cases where the number of partitions needs to change dynamically – Panagiotis Kanavos Mar 12 '21 at 18:16
  • @PanagiotisKanavos exactly, the `Partitioner` is not for batching. Although it has a `Create` overload that accepts an `int rangeSize` parameter, and can be used this way when your source is a list or array. Btw AFAIK the `Partitioner` class does not have a `CreatePartitions` method that you mentioned earlier, and this exemplifies how difficult it is to remember its API (and remember how to use it correctly). – Theodor Zoulias Mar 12 '21 at 18:24
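
For completeness, the `Parallel.ForEach` route suggested several times in these comments might look roughly like this (a sketch; `DeleteRecord` is a hypothetical stand-in for the proprietary deletion API, and the degree of parallelism is just an example):

using System;
using System.Data;
using System.Threading.Tasks;

// Parallel.ForEach does its own partitioning, so no manual batching,
// copying, or task list is needed.
Parallel.ForEach(
    deletionRecords.AsEnumerable(),
    new ParallelOptions { MaxDegreeOfParallelism = 3 },
    row =>
    {
        var status = DeleteRecord(row);   // hypothetical call to the proprietary deletion API
        Console.WriteLine(status);
    });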