
I've got a database entity type Entity, a long list of Thingy, and a method

private Task<Entity> MakeEntity(Thingy thingy) {
  ...
}

MakeEntity does lots of stuff and is CPU bound. I would like to convert all my thingies to entities and save them in a DbContext. Considering that

  • I do want to finish as fast as possible
  • The number of entities is large, and I want to use the database effectively, so I want to start saving changes early and wait for the remote database to do its thing

how can I do this performantly? What I would really like is to keep looping while waiting for the database to do its thing, and offer it all the newly made entities so far, until the database has processed them all. What's the best route there? I've run into SaveChanges throwing if it's called concurrently, so I can't do that. What I'd really like is to have a thread pool of eight threads (or rather, as many threads as I have cores) doing the CPU-bound work, and a single thread doing the SaveChanges().

Martijn
    Are you sure you *want* to multithread on ASP.NET? Remember, when a single request is using multiple threads, that will significantly impact your scalability since those threads can't be used for other requests. – Stephen Cleary Nov 14 '14 at 17:03
  • "a long list of Thingy". Are you invoking `MakeEntity` repeatedly with each `Thingy` instance? – Asad Saeeduddin Nov 14 '14 at 17:03
  • Also, is there no way to batch your CPU bound processing of a bunch of `Thingy`s and attach the resulting `Entity`s to the context, **then** `SaveChanges` for only one DB round trip? – Asad Saeeduddin Nov 14 '14 at 17:05
  • @Asad MakeEntity should be invoked for every Thingy, yes (but how is really part of my question). Calling SaveChanges once for the entire batch would be by far the best, but is not feasible due to the amount of work required. The main thing I want is to overlap the independent work of `MakeEntity` with waiting for the intermediate result of `SaveChanges`, while taking care that there are never multiple SaveChanges "in flight" simultaneously, as that's not supported by EF. – Martijn Nov 15 '14 at 14:22

2 Answers


You could set up a pipeline of N CPU workers feeding into a database worker. The database worker could batch items up.

Since MakeEntity is CPU bound, there is no need to use async and await there. await does not create tasks or threads (a common misconception).

var thingies = ...;

// MakeEntity is assumed synchronous here (it is CPU bound, so it
// should not be async).
var entities = thingies.AsParallel()
    .WithDegreeOfParallelism(8)
    .Select(MakeEntity);
var batches = CreateBatches(entities, batchSize: 100);

foreach (var batch in batches) {
    Insert(batch);
}

You need to provide a method that creates batches from an IEnumerable; such implementations are readily available on the web.
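A minimal sketch of such a batching helper, matching the `CreateBatches(entities, batchSize)` call above (the name and signature are taken from the code; the implementation is one common approach):

```csharp
using System.Collections.Generic;

static class Batching {
    // Lazily groups a sequence into lists of up to batchSize items.
    // Because it is lazy, the CPU workers keep producing while the
    // consumer is inserting the previous batch.
    public static IEnumerable<List<T>> CreateBatches<T>(
        IEnumerable<T> source, int batchSize) {
        var batch = new List<T>(batchSize);
        foreach (var item in source) {
            batch.Add(item);
            if (batch.Count == batchSize) {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }
        if (batch.Count > 0)
            yield return batch; // final partial batch
    }
}
```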

If you don't need batching for the database part you can delete that code.

For the database part you probably don't need async IO because it seems to be a low-frequency operation.

usr

This is a kind of "asynchronous stream", which is always a bit awkward.

In this case (assuming you really do want to multithread on ASP.NET, which is not recommended in general), I'd say TPL Dataflow is your best option. You can use a TransformBlock with MaxDegreeOfParallelism set to 8 (or unbounded, for that matter), and link it to an ActionBlock that does the SaveChanges.

Remember, use synchronous signatures (not async/await) for CPU-bound code, and asynchronous methods for I/O-bound code (i.e., SaveChangesAsync).
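Putting those two points together, here is a self-contained sketch of the Dataflow pipeline. The `Thingy`/`Entity` types are placeholders, and a simulated delay stands in for `db.SaveChangesAsync()`; it requires the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Placeholder types standing in for the question's Thingy and Entity.
record Thingy(int Id);
record Entity(int Id);

class Pipeline {
    // Stand-in for the CPU-bound MakeEntity: synchronous signature.
    static Entity MakeEntity(Thingy t) => new Entity(t.Id);

    static async Task Main() {
        // CPU-bound stage: synchronous delegate, bounded parallelism.
        var makeEntities = new TransformBlock<Thingy, Entity>(
            MakeEntity,
            new ExecutionDataflowBlockOptions {
                MaxDegreeOfParallelism = Environment.ProcessorCount });

        // I/O-bound stage: ActionBlock's default MaxDegreeOfParallelism
        // is 1, so only one "SaveChanges" is ever in flight. A real
        // version would Add to the DbContext and await SaveChangesAsync.
        var saved = new List<Entity>();
        var save = new ActionBlock<Entity>(async e => {
            await Task.Delay(1); // simulate database I/O
            saved.Add(e);
        });

        makeEntities.LinkTo(save,
            new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 100; i++)
            makeEntities.Post(new Thingy(i));
        makeEntities.Complete();
        await save.Completion;

        Console.WriteLine(saved.Count); // prints 100
    }
}
```

Note how backpressure and completion propagate automatically: calling `Complete()` on the first block drains into the second, and awaiting `save.Completion` waits until every entity has been saved.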

Stephen Cleary