
I need to write some code to insert around 3 million rows of data.
At the same time I need to insert the same number of companion rows.

I.e. the schema looks like this:

Item
  - Id
  - Title

Property
  - Id
  - FK_Item
  - Value

My first attempt was something vaguely like this:

BaseDataContext db = new BaseDataContext();
foreach (var value in values)
{
    Item i = new Item() { Title = value["title"] };
    ItemProperty ip = new ItemProperty() { Item = i, Value = value["value"] };
    db.Items.InsertOnSubmit(i);
    db.ItemProperties.InsertOnSubmit(ip);
}
db.SubmitChanges(); // one call, but LINQ to SQL still issues one INSERT per row

Obviously this was terribly slow, so I'm now using something like this:

BaseDataContext db = new BaseDataContext();
DataTable dt = new DataTable("Item");
dt.Columns.Add("Title", typeof(string));
foreach (var value in values)
{
    DataRow item = dt.NewRow();
    item["Title"] = value["title"];
    dt.Rows.Add(item);
}
using (System.Data.SqlClient.SqlBulkCopy sb = new System.Data.SqlClient.SqlBulkCopy(db.Connection.ConnectionString))
{
    sb.DestinationTableName = "dbo.Item";
    sb.ColumnMappings.Add(new SqlBulkCopyColumnMapping("Title", "Title"));
    sb.WriteToServer(dt);
}

But this doesn't allow me to add the corresponding 'Property' rows.

I'm thinking the best solution might be to add a stored procedure that generically lets me do a bulk insert (or at least multiple inserts, though I can probably disable logging in the stored procedure somehow for performance) and then returns the corresponding ids.
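For illustration, a sketch of what such a procedure might look like, assuming the titles arrive in a table-valued parameter (the type and procedure names here are hypothetical) and using the OUTPUT clause to hand the generated ids back to the caller:

```sql
-- Hypothetical sketch: bulk-insert titles and return the new identity values.
-- Assumes a user-defined table type exists (table-valued parameters need
-- SQL Server 2008 or later):
--   CREATE TYPE dbo.TitleList AS TABLE (Title VARCHAR(256));
CREATE PROCEDURE dbo.InsertItems
    @titles dbo.TitleList READONLY
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Item (Title)
    OUTPUT inserted.Id, inserted.Title  -- id/title pairs go back as a result set
    SELECT Title FROM @titles;
END
```

One caveat: if Title is not unique, the OUTPUT rows cannot be matched back to the input rows by title alone, so some client-side correlation key would still be needed.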

Can anyone think of a better (i.e. more succinct, near equal performance) solution?

Matt Mitchell

4 Answers


The best way to move that much data into SQL Server is bcp. Assuming that the data starts in some sort of file, you'll need to write a small script to funnel the data into the two tables. Alternately you could use bcp to funnel the data into a single table and then use an SP to INSERT the data into the two tables.
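If the data file is reachable from the server, the T-SQL BULK INSERT statement does roughly what bcp does from the command line; a sketch with assumed file path, table name, and delimiters:

```sql
-- Sketch: load a flat file into a single staging table.
-- The path, table name, and terminators below are assumptions.
BULK INSERT dbo.ImportStage
FROM 'C:\data\items.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);
```

With TABLOCK and a simple/bulk-logged recovery model this can qualify for minimal logging, which is where most of the speed comes from.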

Jeff Hornby
  • The problem is, I can't "funnel" the second lot of data into a table without having a corresponding id to work with in the first table. I guess one solution is to assign a fake id and later remove this but that doesn't seem a very good way for doing this in the long term. – Matt Mitchell Sep 06 '09 at 13:32

To combine the previous best two answers and add in the missing piece for the IDs:

1) Use BCP to load the data into a temporary "staging" table defined like this

CREATE TABLE stage(Title VARCHAR(??), Value {whatever});

and you'll need the appropriate index for performance later:

CREATE INDEX ix_stage ON stage(Title);

2) Use SQL INSERT to load the Item table:

INSERT INTO Item(Title) SELECT Title FROM stage;

3) Finally load the Property table by joining stage with Item:

INSERT INTO Property(FK_Item, Value)
SELECT Item.Id, stage.Value
FROM stage
JOIN Item ON Item.Title = stage.Title;

RBarryYoung
  • Sure, but this assumes title is unique (which it may be, but I wouldn't rely on it). However, this is probably the best direction. – Matt Mitchell Sep 06 '09 at 15:12
  • I had to assume something, since you didn't provide the source table's information. You must have some way to uniquely correlate the information: relying on record order is invalid in SQL. – RBarryYoung Sep 06 '09 at 21:18
  • That said, you can instead put an ID on the staging table for the correlation (usually works even with order dependencies), but it gets more complicated. – RBarryYoung Sep 06 '09 at 21:19
  • Yeah I figured that because it was a single transaction I could rely on record order. Assigning the ID or relying on unique title is probably a better way to go though. – Matt Mitchell Sep 09 '09 at 02:00
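The staging-table-ID variant mentioned in the comments could be sketched like this (column and table names are assumptions). A plain INSERT ... OUTPUT cannot reference the source table's columns, so MERGE (SQL Server 2008+) is the usual workaround for capturing the source-row-to-new-id mapping:

```sql
-- Sketch: correlate by a surrogate key on the staging table instead of Title.
CREATE TABLE stage2(
    RowId INT IDENTITY(1,1) PRIMARY KEY,
    Title VARCHAR(256),
    Value VARCHAR(256));

DECLARE @map TABLE (RowId INT, ItemId INT);

-- The ON 1 = 0 predicate never matches, so every staged row is inserted,
-- and OUTPUT can pair the source RowId with the generated Item id.
MERGE INTO Item AS t
USING stage2 AS s
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (Title) VALUES (s.Title)
OUTPUT s.RowId, inserted.Id INTO @map (RowId, ItemId);

INSERT INTO Property (FK_Item, Value)
SELECT m.ItemId, s.Value
FROM stage2 AS s
JOIN @map AS m ON m.RowId = s.RowId;
```

This avoids any reliance on Title being unique, at the cost of the extra MERGE machinery.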

Bulk copy the data into a temporary table, and then call a stored proc that splits the data into the two tables you need to populate.

Joe

You can bulk copy in code as well, using the .NET SqlBulkCopy class.

David Andres
  • I am using the .NET SqlBulkCopy class, can you point to where I should look? – Matt Mitchell Sep 06 '09 at 15:10
  • If you have total control over the data, perhaps you can leverage the KeepIdentity option of the SqlBulkCopyOptions parameter of the constructor. This way, you can pull the highest identity value currently available, and work from there. – David Andres Sep 06 '09 at 15:20
  • Thanks David ... this just helped me out. – John Sep 09 '09 at 21:25
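The KeepIdentity idea amounts to assigning the ids client-side: read the current maximum, number both the Item and Property rows yourself, then bulk copy both tables with SqlBulkCopyOptions.KeepIdentity so SQL Server keeps the supplied values. The server-side half of that is just a starting-point query (this sketch assumes no concurrent writers to Item while the bulk copy runs):

```sql
-- Sketch: find the first id the client can safely assign.
SELECT ISNULL(MAX(Id), 0) + 1 AS NextId FROM dbo.Item;
```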