
I have to parse a big XML file and import (insert/update) its data into various tables with foreign key constraints.

So my first thought was to create a list of SQL insert/update statements and execute them all at once using SqlCommand.ExecuteNonQuery().

Another method I found was one shown by AMissico, where I would execute the SQL commands one by one. No one complained, so I think it's also a viable practice.

Then I found out about SqlBulkCopy, but it seems that I would have to create a DataTable with the data I want to upload, so it would be one SqlBulkCopy per table. For that I could create a DataSet.
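For reference, here is a minimal sketch of that SqlBulkCopy approach (the table name, columns, and connection string are made up for illustration). Note that SqlBulkCopy can also enlist in an existing SqlTransaction, which matters for the next point:

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkCopySketch
{
    static void Main()
    {
        // Hypothetical staging data; in practice this would be filled from the parsed XML.
        var table = new DataTable("Customers");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Alice");

        using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        {
            connection.Open();
            using (var transaction = connection.BeginTransaction())
            {
                // SqlBulkCopy can take part in an existing transaction.
                using (var bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
                {
                    bulkCopy.DestinationTableName = "dbo.Customers";
                    bulkCopy.WriteToServer(table);
                }
                transaction.Commit();
            }
        }
    }
}
```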

I think every option supports SqlTransaction. It's approximately 100–20,000 records per table.

Which option would you prefer and why?

Siggi

2 Answers


Don't do it from C# unless you have to; it adds a huge overhead, and SQL Server can do it much faster and better by itself.

Insert to table from XML file using INSERT INTO SELECT
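A rough sketch of what that can look like when the XML is shredded entirely inside SQL Server (the table, columns, and XPath expressions are invented for illustration); the client issues a single set-based statement:

```csharp
using System.Data;
using System.Data.SqlClient;

class XmlShredSketch
{
    static void Main()
    {
        // Hypothetical target table and XML layout; adjust the paths to your document.
        const string sql = @"
            INSERT INTO dbo.Customers (Id, Name)
            SELECT c.value('(Id/text())[1]', 'int'),
                   c.value('(Name/text())[1]', 'nvarchar(100)')
            FROM @xml.nodes('/Customers/Customer') AS t(c);";

        using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        using (var command = new SqlCommand(sql, connection))
        {
            command.Parameters.Add("@xml", SqlDbType.Xml).Value =
                "<Customers><Customer><Id>1</Id><Name>Alice</Name></Customer></Customers>";
            connection.Open();
            command.ExecuteNonQuery(); // one round trip, all set-based
        }
    }
}
```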

Tzah Mama
  • The xml document is actually stored in an xml column. I would read certain parts of the document, then have to reference other parts based on those, and so on; many references. Also, in some cases I have to decide where the data belongs based on the values of XML elements. It's not a simple structure like in your reference. – Siggi Jun 11 '14 at 09:43
  • Well, why can't you save your output xml then? – Tzah Mama Jun 11 '14 at 09:45
  • I don't need the xml structure, I need the data in it. – Siggi Jun 11 '14 at 10:04

You say that the XML is already in the database. First, decide whether you want to process it in C# or in T-SQL.

  1. C#: You'll have to ship all the data to the client and back once, but C# is a far better language for complex logic. Depending on what you do, it can be orders of magnitude faster.
  2. T-SQL: No need to copy the data to the client, but you have to live with the capabilities and performance profile of T-SQL.

Depending on your case, one might be far faster than the other, and it is not clear in advance which one.

If you want to compute in C#, use a single streaming SELECT to read the data and a single SqlBulkCopy to write it. If your writes are not insert-only, write to a temp table and execute as few DML statements as possible to update the target table(s) (maybe a single MERGE).
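A condensed sketch of that staging pattern (all table names and columns are invented; the transform step is where your XML logic would go):

```csharp
using System.Data;
using System.Data.SqlClient;

class StageAndMergeSketch
{
    static void Main()
    {
        using (var connection = new SqlConnection("Server=.;Database=MyDb;Integrated Security=true"))
        {
            connection.Open();

            // 1. Create a session-local staging table.
            using (var create = new SqlCommand(
                "CREATE TABLE #Staging (Id int PRIMARY KEY, Name nvarchar(100));", connection))
            {
                create.ExecuteNonQuery();
            }

            // 2. Transform in C#, then bulk-load the result in one go.
            var staged = new DataTable();
            staged.Columns.Add("Id", typeof(int));
            staged.Columns.Add("Name", typeof(string));
            staged.Rows.Add(1, "Alice"); // placeholder for the transformed rows

            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "#Staging";
                bulkCopy.WriteToServer(staged);
            }

            // 3. A single set-based MERGE from staging into the target table.
            const string merge = @"
                MERGE dbo.Customers AS target
                USING #Staging AS source ON target.Id = source.Id
                WHEN MATCHED THEN UPDATE SET target.Name = source.Name
                WHEN NOT MATCHED THEN INSERT (Id, Name) VALUES (source.Id, source.Name);";
            using (var command = new SqlCommand(merge, connection))
            {
                command.ExecuteNonQuery();
            }
        }
    }
}
```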

If you want to stay in T-SQL, minimize the number of statements executed and use set-based logic.

All of this is simplified/shortened. I left out many considerations because they would be too long for a Stack Overflow answer. Be aware that the best strategy depends on many factors. You can ask follow-up questions in the comments.

usr
  • I'll stay with C#. You said a single SELECT. Do you mean a single thread for reading, or just no sub-queries? Is there a problem with reading on multiple threads, sync problems aside? – Siggi Jun 11 '14 at 10:51
  • Ok, I did not make this point clear. What I meant was that you should avoid the anti-pattern of issuing one `SELECT` per row; I see that a lot. Just query all the data in one go. You can read on multiple threads as well if you can make it work. You'd need to partition the data somehow, maybe by ID range. PLINQ is very good for these things. – usr Jun 11 '14 at 10:53
  • Ok. That is my plan. Read as much in advance as possible/as it makes sense. – Siggi Jun 11 '14 at 10:58