
I have a lot of rows (300k+) to upsert into a SQL Server database in the shortest possible time, so the idea was to use parallelization: partition the data and pump it into SQL asynchronously, X threads at a time, 100 rows per context, with contexts being recycled to minimize tracking overhead. However, that means more than one connection is used in parallel, so CommittableTransaction/TransactionScope would escalate to a distributed transaction, and the parallelized transaction enlistment would throw the infamous "This platform does not support distributed transactions." exception.

I do need the ability to commit/rollback the entire set of upserts. It's part of a batch upload process, and any error should roll back the changes to the previously working/stable condition, application-wise.

What are my options, short of using one connection and no parallelization?

Note: The problem is not as simple as a batch of insert commands; if it were, I would just generate the inserts and run them on the server as a query, or indeed use SqlBulkCopy. About half of the rows are updates and half are inserts where new keys are generated by SQL Server; those keys need to be obtained and re-keyed onto the child objects that are inserted next, and the rows are spread over about a dozen tables in a 3-level hierarchy.

mmix
  • One big thing you can do to create options is to switch from IDENTITY to Sequences for key generation. Sequences support client-side key generation, and so you can key the entire object graph. Then you can bulk load or stage/merge the tables in foreign-key order (see the sketch after these comments). See https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-sequence-get-range-transact-sql?view=sql-server-ver15 – David Browne - Microsoft Apr 29 '20 at 13:05
  • Is there a way to do this with table keys and combine this with `IDENTITY_INSERT ON`? – mmix Apr 29 '20 at 13:33
  • EF Core supports sequences, but they are set as column defaults in SQL Server, not IDENTITY columns. So you can generate keys ahead of time if you want. There are a couple of problems with IDENTITY_INSERT, but the big one here is that there's no way to generate the key values ahead of time. See https://learn.microsoft.com/en-us/ef/core/modeling/sequences – David Browne - Microsoft Apr 29 '20 at 13:37
  • That's a good tip, thanks, I'll definitely keep this in mind for future databases. However, this one is not under our control; it's a production db and a schema change is a no-go, especially since there are other systems using it. Risk mgmt would chop my head off :) – mmix Apr 29 '20 at 14:13
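
For reference, a minimal sketch of the client-side key reservation suggested in the comments, assuming a bigint sequence named dbo.ItemSeq exists (the sequence name is an assumption, not part of the poster's schema):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// Reserve a contiguous block of sequence values in one round trip,
// then hand them out client-side to key the whole object graph.
static long ReserveKeyRange(SqlConnection conn, int count)
{
    using var cmd = new SqlCommand("sys.sp_sequence_get_range", conn)
    {
        CommandType = CommandType.StoredProcedure
    };
    cmd.Parameters.AddWithValue("@sequence_name", "dbo.ItemSeq"); // assumed name
    cmd.Parameters.AddWithValue("@range_size", (long)count);
    var first = cmd.Parameters.Add("@range_first_value", SqlDbType.Variant);
    first.Direction = ParameterDirection.Output;
    cmd.ExecuteNonQuery();
    return (long)first.Value; // values first .. first + count - 1 are now ours
}
```

With keys assigned ahead of time, parents and children can be bulk loaded in foreign-key order without any read-back step.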

1 Answer


Nope. Totally wrong approach. Do NOT use EF for that - bulk insert ETL is not what object-relational mappers are made for, and a lot of their design decisions are not productive for it. You would not use a small car instead of a truck to transport 20 tons of goods either.

300k rows are trivial if you use the SqlBulkCopy API in some form.
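
For illustration, a minimal sketch of the SqlBulkCopy route the answer points at, loading into a session-scoped staging table inside one transaction (the table and column names are assumptions):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// Assumes the staging table was created on this same connection, e.g.
// CREATE TABLE #ParentStage (ClientKey int, BusinessKey nvarchar(50), Name nvarchar(100));
static void BulkLoadParents(SqlConnection conn, SqlTransaction tx, DataTable rows)
{
    using var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tx)
    {
        DestinationTableName = "#ParentStage",
        BatchSize = 10_000
    };
    bulk.ColumnMappings.Add("ClientKey", "ClientKey");
    bulk.ColumnMappings.Add("BusinessKey", "BusinessKey");
    bulk.ColumnMappings.Add("Name", "Name");
    bulk.WriteToServer(rows);
}
```

Because the temp table lives on the session, the load and the later MERGE share one connection and one SqlTransaction, which avoids the distributed-transaction escalation from the question.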

TomTom
  • Agree. Know your toolset(s) and their limitations. EF, while great at many things, is not so great at massive dataset inserts/updates. Don't be scared to use a bit of actual SQL when needed. – Tommy Apr 29 '20 at 12:49
  • EF keeps me strongly typed; given what I have to endure, I'll take any help I can. The problem is not as simple as a batch of insert commands; if it were, I would just generate inserts and run them on the server as a query. About half of them are updates, half are inserts where new keys need to be obtained and re-keyed on child objects, and the rows are spread over about a dozen tables in a 3-level hierarchy. SqlBulkCopy doesn't help me much, it just adds a layer of magic-string complexity. – mmix Apr 29 '20 at 12:53
  • I wrote a strongly typed mapper around SqlBulkCopy in about half a day - so that is NOT an excuse. It is also trivial to load this into temp tables (which is what I do) before insert. NO magic strings, all actually based on reflection and the attributes EF uses ;) – TomTom Apr 29 '20 at 12:54
  • @mmix - That's one of the things EF is great for, keeping you strongly typed. But you are going to run into massive issues trying to get EF to do 300K rows in some 'performant' amount of time. Have you seen/investigated MERGE statements in SQL? https://learn.microsoft.com/en-us/sql/t-sql/statements/merge-transact-sql?view=sql-server-ver15 – Tommy Apr 29 '20 at 12:54
  • Also EF Core does batching. It's not SqlBulkCopy, but you should at least test the single-DbContext naive EF approach. Another option is to send the object graph to SQL as JSON and run INSERT/MERGE queries reading from the JSON (see the JSON sketch after these comments). But in any case running from multiple client sessions is unlikely to help. – David Browne - Microsoft Apr 29 '20 at 12:56
  • @TomTom, a strongly typed wrapper around SqlBulkCopy is just an obfuscation of the magic strings problem; it doesn't go away, and once the model (from database-first) is re-scaffolded it will break. `Add` allows me generalized access to the context. I do not have a problem investing the time, as long as it solves the problem. However, it does not for multi-level inserts with auto-gen keys. EF helps me by retrieving the keys back, which SqlBulkCopy does not support (https://stackoverflow.com/questions/21687536/how-to-retrieve-server-generated-identity-values-when-using-sqlbulkcopy). – mmix Apr 29 '20 at 13:02
  • Nope. Not unless you consider EF Core ALSO an obfuscation of magic strings. I use the same classes and metadata EF Core uses and dynamically generate the strings. I use a generic Bulk class that internally uses SqlBulkCopy - much like EF internally uses a non-typed class. INTERNALLY. – TomTom Apr 29 '20 at 13:10
  • EF Core is indeed an obfuscator of magic strings related to SQL names (properties are mapped to strings); it's unavoidable, but it's done by a scaffold tool in a consistent manner. As I said, I do not mistrust you or avoid magic strings on academic grounds, I just do not see it as applicable because of keygen. – mmix Apr 29 '20 at 13:13
  • Again, thank you, but unless the keygen aspect is covered, and it's not, it's a no-go. – mmix Apr 29 '20 at 13:32
  • Actually they are covered. You seem to think that the inserts or updates have to be done directly - this is not how ETL has been done for the last 60 years. Load into TEMPORARY tables and then MERGE into the target tables; see the sketch after these comments. – TomTom Apr 29 '20 at 13:50
  • And how do I link them to their children? And their children to their children? I would need to maintain a separate key (under my control) just for that, which would be in the model, but not in the model-model, in temp tables and not in the db itself. I want to see if there are other options before I start depending on SQL Server machinations for this. – mmix Apr 29 '20 at 14:22
  • Then write a tool? Use the standard approaches that ETL has developed over the last 50 years? – TomTom Apr 29 '20 at 14:24
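
To make the stage-then-MERGE suggestion concrete: a minimal sketch of the merge step, assuming parents were staged as in the answer's sketch with a client-side ClientKey surrogate (all table and column names remain assumptions). MERGE is the one DML statement whose OUTPUT clause may reference source columns, which is what lets each staged row's client key be paired with the server-generated IDENTITY value for re-keying children:

```csharp
using System.Collections.Generic;
using Microsoft.Data.SqlClient;

// Upsert staged parents and return a ClientKey -> server Id map.
static Dictionary<int, long> MergeParents(SqlConnection conn, SqlTransaction tx)
{
    const string sql = @"
MERGE dbo.Parent AS t
USING #ParentStage AS s
   ON t.BusinessKey = s.BusinessKey
WHEN MATCHED THEN
    UPDATE SET t.Name = s.Name
WHEN NOT MATCHED THEN
    INSERT (BusinessKey, Name) VALUES (s.BusinessKey, s.Name)
OUTPUT s.ClientKey, inserted.Id;";

    var map = new Dictionary<int, long>();
    using var cmd = new SqlCommand(sql, conn, tx);
    using var reader = cmd.ExecuteReader();
    while (reader.Read())
        map[reader.GetInt32(0)] = reader.GetInt64(1); // assumes bigint IDENTITY
    return map; // re-key child rows with this map, then stage/merge the children
}
```

Repeating the pattern level by level in foreign-key order covers a 3-level hierarchy while keeping a single connection and a single SqlTransaction for commit/rollback.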
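The JSON option mentioned in the comments can be sketched in the same spirit: serialize the batch client-side and let OPENJSON shred it server-side, so a single parameterized command carries the whole set (the row type, property paths, and table names are illustrative assumptions):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Text.Json;
using Microsoft.Data.SqlClient;

record ParentRow(string BusinessKey, string Name);

static void InsertFromJson(SqlConnection conn, SqlTransaction tx, IEnumerable<ParentRow> rows)
{
    // OPENJSON ... WITH projects the JSON array into a rowset the INSERT reads from.
    const string sql = @"
INSERT dbo.Parent (BusinessKey, Name)
SELECT j.BusinessKey, j.Name
FROM OPENJSON(@json)
WITH (BusinessKey nvarchar(50)  '$.BusinessKey',
      Name        nvarchar(100) '$.Name') AS j;";

    using var cmd = new SqlCommand(sql, conn, tx);
    cmd.Parameters.Add("@json", SqlDbType.NVarChar, -1).Value = JsonSerializer.Serialize(rows);
    cmd.ExecuteNonQuery();
}
```

A MERGE reading FROM OPENJSON(...) handles the update half the same way, again on one connection and one transaction.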