2

I'm using C#.NET 4.0 (Visual Studio 2010), PostgreSQL 9.2, and Npgsql 2.0.12. I can't upgrade to Npgsql 3.

I'm needing to do a fast insert into a parent table and then, using the primary keys from that insert, do a fast insert into a child table.

The parent table has a column defined as "serial" that is the primary key.

The child table has an integer column that is a foreign key back to the parent table.

Not every parent record will have children. A parent can have 0, 1, or many children.

Currently I'm buffering the parent objects to a List. When 5000 parents have been buffered, spawn a new thread from the thread pool to write the records to the database. (A new List is created for the main thread to buffer the next set of parent objects.) The new thread calls NpgsqlConnection.BeginTransaction(), then inside a loop calls NpgsqlCommand.ExecuteScalar() with parameters to insert the parent record and get the primary key back. Then build the the parent's child object if any and save to another List. At the end of the loop commit the transaction of parents. But this methodology is BRUTALLY slow. Anywhere from 3 to 10 seconds to insert the 5000 records. Surely there's a better way.

After the parents are committed, I use BulkCopy described at http://codebetter.com/karlseguin/2009/10/25/postgresql-day-2/ (which uses NpgsqlCopyIn) to insert the child records. This works FANTASTIC. It inserts several thousand child records in less than half a second.

I'd love to use that BulkCopy for the parent records also. But I can't figure out how to get the primary key values back from the bulk insert.

So what's the trick to do a fast insert of parent and child records using C# and Npgsql? The answer is probably out there somewhere, but clearly I'm not using the right search engine parameters.

Thanks very much in advance.

TJH
  • 97
  • 1
  • 2
  • 10

3 Answers3

0

The answer for this kind of scenario is usually something like "hi-lo" key generation. In a nutshell, this means that instead of having the database generate IDs on each and every insert (forcing you do retrieve those IDs), you can preallocate a large number of IDs and specify them while inserting. This means you're setting the ID on each parent yourself rather than leaving it empty (and letting PostgreSQL do it).

Concretely, you would retrieve a batch of IDs from the sequence managing the parent table's ID - see this question and this article for more info. Then, once you have the IDs in your application, you'd bulk insert the parents with these IDs.

Community
  • 1
  • 1
Shay Rojansky
  • 15,357
  • 2
  • 40
  • 69
0

I would write the parents insert script to disk in a text file then run it through a regular command to get back all the parents primary key in one round trip to the database.

0

When you use the serial datatype, Postgres automatically generates and assigns a sequence. This is nice because you can hijack that sequence for other purposes, this one included.

Here is my suggestion.

Presuppose your objects look like this:

public Parent
{
    public long Id { get; set; }
    public string Description { get; set; }
    public List<Child> Children { get; set; }
}

public Child
{
    public long Id { get; set; }
    public long ParentId { get; set; }
    public string Description { get; set; }
}

Have your code assign each Parent an Id, based on the sequence. This should occur in the blink of an eye:

NpgsqlCommand cmd = new NpgsqlCommand("select nextval('schema.foo_id_seq')", conn);
foreach (Parent p in parentList.Where(x => x.Id == null && x.Id == 0))
{
    p.Id = Convert.ToInt64(cmd.ExecuteScalar());
    p.Children.ForEach(x => x.ParentId = p.Id);
}

The Where clause might not have been important if those records don't already exists... just something to think about.

From here, your NpgsqlCopyIn should rock out for both parents and children.

Hambone
  • 15,600
  • 8
  • 46
  • 69
  • Thanks @Hambone, and to everyone who offered input. This solution worked perfectly. I'm quit impressed at how fast the call to ExecuteScalar is when just doing 'select nextval...'. Cheers! – TJH Feb 05 '17 at 19:10