1

I'm trying to upload XML to SQL Server database, I use Entity Framework. I have an XML of apx. 30MB with 10k records. It's been taking minutes just to upload a small portion of this data. I read in this post - The Best Way to shred XML data into SQL Server database columns - that using schema can drastically increase performance. I have a schema file available in xsd format, how would I go about applying it? Or perhaps there is some other way to improve the performance? The simplified version of my code is below.

public ActionResult XmlLoaded() {
            XDocument root = XDocument.Load("..path..");
            XElement feed = root.Element("feed");
            IEnumerable<XElement> items = feed.Elements("item");

            foreach (XElement item in items)
            {
                XElement sku = item.Element("sku");
                XElement brand = item.Element("brand");

                Product product = new Product();

                product.ProductId = (int)sku;
                product.Brand = (string)brand;

                _productRepo.Add(product);
                _productRepo.SaveChanges();
            }
Community
  • 1
  • 1
Turo
  • 1,537
  • 2
  • 21
  • 42

1 Answers1

3

I'd rather do the whole thing within SQL Server. You did not state the version of SQL Server you are using, but from 2005 up the XML support is there and since 2008 it's really great.

With a code like this you can load the XML in one gone:

Just try it: Paste this into an empty query window, set the right file path and execute. After a while you'll see the XML-result (click to open).

DECLARE @yourXML AS XML=
(
SELECT CONVERT(XML, BulkColumn,2) AS BulkColumn
FROM OPENROWSET(BULK 'PathToFile.xml', SINGLE_BLOB) AS x
);
SELECT @yourXML;

If you poste your XML (part of it), I will show you how to set an INSERT statement which will read, parse and insert your data into SQL table. This is - for sure - faster than any other approach... The full process will take not much more time than the simple read takes. If you try the SQL above you know roughly how long the full process will take.

Shnugo
  • 66,100
  • 9
  • 53
  • 114
  • Hi I tried this, I don't follow this syntax. I saw documentation on Microsoft, but it's not clear to me yet. – Turo Nov 04 '15 at 16:17
  • @Turo, does it work though? If not: Do you get any error message? In short: this `FROM OPENROWSET` allows to select data from a not defined source. The whole file is loaded as a 'SINGLE_BLOB'. After the load process the XML-file is completely taken into the variable `@yourXML` which is of type `XML`. In the last row, the `SELECT @xml` does nothing else then show to you the content. Clear? – Shnugo Nov 04 '15 at 16:21
  • It did work and I get the XML displayed. I'm just a bit sceptical to apply the solution that I cannot trace in docs, and still some fundamentals are missing in my mind. What is the BulkColumn? what is 2 standing for? and what is x ? All of this is not explained in the microsoft docs. Can you recommend some reading? Many thanks btw! – Turo Nov 04 '15 at 16:29
  • @Turo, you may start here (but you've found this yourself probably): https://msdn.microsoft.com/de-de/library/ms190312%28v=sql.120%29.aspx?f=255&MSPPError=-2147217396 Kind of tutorial is this: https://www.mssqltips.com/sqlservertip/1643/using-openrowset-to-read-large-files-into-sql-server/ The "2" is a parameter of `CONVERT` (read here (section "Binary Styles"): https://msdn.microsoft.com/en-us/library/ms187928.aspx The "x" is just because the thing must have a name ;-) – Shnugo Nov 04 '15 at 16:36
  • @Turo, I'm curious: How long does it take to load and display your data? – Shnugo Nov 04 '15 at 16:36
  • Ok :) Thank you Shnugo, I had some fight with this tonight. Finally I got to also parse this as well. It took just few seconds to execute your code, overall with parsing maybe 15-20secs. It is not EF solution, but definitely much faster and no other ideas around - so I gladly accept your answer. – Turo Nov 04 '15 at 22:38
  • @Turo, glad to read about your success! May I ask how you did the parsing? The jump from "few seconds" to "15-20 secs" might not be necessary. If you like you might poste your SELECT. Maybe I have some ideas to speed things up. One hint (don't know about your SQL Server Skills): You may create a Stored Procedure to call this from your application easily. – Shnugo Nov 05 '15 at 08:06
  • sorry for late reply. I did it this way below: My SQL Server skills are not yet there ;)but your post gave me a thought that Linq might not always be desirable, especially if performance is at stake. At this stage it works fine for me. INSERT INTO Products(ProductId,Brand) select a.value(N'(./productId)[1]', N'nvarchar(max)') as [ProductId], a.value(N'(./brand)[1]', 'nvarchar(max)') as [Brand] from @yourXML.nodes('./ feed/item') as r(a); – Turo Nov 11 '15 at 20:40