4

Using NHibernate 2.1.2.4000 against SQL Server 2008. The target table has no triggers or extraneous indexes. It is simply:

create table LogEntries (
    Id INT IDENTITY NOT NULL,
   HostName NVARCHAR(32) not null,
   UserName NVARCHAR(64) not null,
   LogName NVARCHAR(512) not null,
   Timestamp DATETIME not null,
   Level INT not null,
   Thread NVARCHAR(64) not null,
   Message NVARCHAR(MAX) not null,
   primary key (Id)
)

My entity mapping is:

<class name="LogEntry" table="LogEntries">
    <id name="Id" unsaved-value="0">
        <generator class="native"/>
    </id>
    <property name="HostName" length="32" not-null="true"/>
    <property name="UserName" length="64" not-null="true"/>
    <property name="LogName" length="512" not-null="true"/>
    <property name="Timestamp" type="utcdatetime" not-null="true"/>
    <property name="Level" not-null="true"/>
    <property name="Thread" length="64" not-null="true"/>
    <property name="Message">
        <column name="Message" sql-type="NVARCHAR(MAX)" not-null="true"/>
    </property>
</class>

Now, consider the following test case:

[Fact]
public void bulk_insert_test()
{
    var batchSize = 100;
    var numberItems = 10000;

    var configuration = new NHibernate.Cfg.Configuration().Configure();
    configuration.SetProperty("connection.connection_string", @"my_conn_string");
    configuration.SetProperty("adonet.batch_size", batchSize.ToString());
    var sessionFactory = configuration.BuildSessionFactory();

    var ts = this.WriteWithNH(sessionFactory, numberItems);
    ////var ts = this.WriteWithBC(sessionFactory, numberItems, batchSize);

    Console.WriteLine("Saving {0} items with batch size {1}: {2}", numberItems, batchSize, ts);
}

public TimeSpan WriteWithNH(ISessionFactory sessionFactory, int numberItems)
{
    using (var session = sessionFactory.OpenStatelessSession())
    using (var transaction = session.BeginTransaction())
    {
        session.Insert(new LogEntry()
        {
            HostName = "host",
            UserName = "user",
            LogName = "log",
            Level = 0,
            Thread = "thread",
            Timestamp = DateTime.UtcNow,
            Message = "Warm up"
        });

        transaction.Commit();
    }

    var sw = Stopwatch.StartNew();

    using (var session = sessionFactory.OpenStatelessSession())
    using (var transaction = session.BeginTransaction())
    {
        for (var i = 0; i < numberItems; ++i)
        {
            session.Insert(new LogEntry()
            {
                HostName = "host",
                UserName = "user",
                LogName = "log",
                Level = 0,
                Thread = "thread",
                Timestamp = DateTime.UtcNow,
                Message = "Message " + i
            });
        }

        transaction.Commit();
    }

    return sw.Elapsed;
}

public TimeSpan WriteWithBC(ISessionFactory sessionFactory, int numberItems, int batchSize)
{
    using (var session = sessionFactory.OpenStatelessSession())
    using (var bulkCopy = new SqlBulkCopy((SqlConnection)session.Connection))
    {
        bulkCopy.BatchSize = batchSize;
        bulkCopy.DestinationTableName = "LogEntries";
        var table = new DataTable("LogEntries");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("HostName", typeof(string));
        table.Columns.Add("UserName", typeof(string));
        table.Columns.Add("LogName", typeof(string));
        table.Columns.Add("Timestamp", typeof(DateTime));
        table.Columns.Add("Level", typeof(int));
        table.Columns.Add("Thread", typeof(string));
        table.Columns.Add("Message", typeof(string));

        var row = table.NewRow();
        row["HostName"] = "host";
        row["UserName"] = "user";
        row["LogName"] = "log";
        row["Timestamp"] = DateTime.UtcNow;
        row["Level"] = 0L;
        row["Thread"] = "thread";
        row["Message"] = "Warm up";
        table.Rows.Add(row);

        bulkCopy.WriteToServer(table);
    }

    var sw = Stopwatch.StartNew();

    using (var session = sessionFactory.OpenStatelessSession())
    using (var bulkCopy = new SqlBulkCopy((SqlConnection)session.Connection))
    {
        bulkCopy.BatchSize = batchSize;
        bulkCopy.DestinationTableName = "LogEntries";
        var table = new DataTable("LogEntries");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("HostName", typeof(string));
        table.Columns.Add("UserName", typeof(string));
        table.Columns.Add("LogName", typeof(string));
        table.Columns.Add("Timestamp", typeof(DateTime));
        table.Columns.Add("Level", typeof(int));
        table.Columns.Add("Thread", typeof(string));
        table.Columns.Add("Message", typeof(string));

        for (var i = 0; i < numberItems; ++i)
        {
            var row = table.NewRow();
            row["HostName"] = "host";
            row["UserName"] = "user";
            row["LogName"] = "log";
            row["Timestamp"] = DateTime.UtcNow;
            row["Level"] = 0;
            row["Thread"] = "thread";
            row["Message"] = "Message " + i;
            table.Rows.Add(row);
        }

        bulkCopy.WriteToServer(table);
    }

    return sw.Elapsed;
}

Here is some sample output when using NHibernate to perform the inserts:

Saving 10000 items with batch size 500: 00:00:12.3064936
Saving 10000 items with batch size 100: 00:00:12.3600981
Saving 10000 items with batch size 1: 00:00:12.8102670

As a point of comparison, you'll see I also implemented a BCP-based solution. Here is some sample output:

Saving 10000 items with batch size 500: 00:00:00.3142613
Saving 10000 items with batch size 100: 00:00:00.6757417
Saving 10000 items with batch size 1: 00:00:26.2509605

Clearly, the BCP solution is miles faster than the NH one. Also evident is that batching is affecting the speed of the BCP solution, but not the NH one. When using NHibernate to do inserts, NHProf shows the following:

alt text http://img9.imageshack.us/img9/8407/screenshotac.png

There are only INSERTs, no SELECTs. Interestingly, at no point does NHProf give me this warning.

I have tried specifying adonet.batch_size in both my config file and in code as per the test case above.

Now, I'm not expecting the NH solution to ever reach the speed of the BCP solution, but I'd at least like to know why batching isn't working. If it's good enough with batching enabled, then I may use the NH solution over the BCP just to keep the code base simpler.

Can anyone explain why NH is refusing to honor ADO.NET batching, and what I can do to fix it? All the scattered NH "documentation" I've read states that all you need to do is specify adonet.batch_size and (preferably) use a stateless session, but I'm doing both those things.

Thanks

Kent Boogaart
  • 175,602
  • 35
  • 392
  • 393
  • @Kent: Have you had a look at what's happening under the hood with NHProf? are you certain that the insert operations aren't being batched? – DanP Aug 04 '10 at 15:56
  • Yes, I've looked with NHProf and there is no difference between when running with/without batching. – Kent Boogaart Aug 04 '10 at 16:23
  • So the statements aren't being grouped into batches? A few thoughts: 1) Can you post your full config file? 2) Have you tried specifying the Nhibernate batch size directly in the Nhibernate config file rather than overriding in your test fixture? – DanP Aug 04 '10 at 16:25
  • Also, is your primary key mapped as native? Do you see the select statement for each new id after each insert operation? I wonder if that could be the cause of no batching occuring? – DanP Aug 04 '10 at 16:38
  • @DanP: I've updated my post to answer all your questions. – Kent Boogaart Aug 05 '10 at 08:44
  • Wow..that's pretty...unexpected; I'm guessing Diego has hit the nail on the head though; apparently this doesn't work as one would expect when using identity... – DanP Aug 05 '10 at 13:02

2 Answers2

12

Using identity breaks batching.

Explained by Fabio Maulo here

The best alternative is switching to a different generator (I always recommend hilo or guid.comb)

Diego Mijelshon
  • 52,548
  • 16
  • 116
  • 154
  • Thanks Diego, but Fabio's post only talks about using `ISession`, not `IStatelessSession`. As per my question, I am using the latter. There is no need to do the subsequent `SELECT` to obtain the ID, and nor does that happen. Therefore, it should be possible to batch the `INSERT`s. – Kent Boogaart Aug 05 '10 at 08:44
  • Also, section 17.6 of the NH documentation (http://www.nhforge.org/doc/nh/en/index.html#performance-batch-updates) lists limitations of batching, and it doesn't mention ID generators at all. Par for the course with NH, I know - but so incredibly frustrating. – Kent Boogaart Aug 05 '10 at 08:48
  • Kent, I don't think `IStatelessSession` supports batching at all. Have you tried using a regular `ISession`? In some cases, it performs better, even for batch scenarios. – Diego Mijelshon Aug 05 '10 at 13:45
  • it does, but I don't blame you for being unsure. I think the crux of the problem is the awful state of NH documentation. It seems that even with a stateless session, NH is still insisting on updating the entity ID, which precludes batching. I can't find a citation, least of all in the NH documentation, but I'm marking yours as answer. – Kent Boogaart Aug 05 '10 at 14:31
  • By the way, I meant to point out that I mistaken about saying that the `SELECT` was not happening, in case that wasn't obvious. – Kent Boogaart Aug 05 '10 at 14:40
  • @KentBoogaart, `IStatelessSession.Insert` signature (and stated in xml documentation) causes it to yield that id as returned value. Many people use `ISession.Save` and `IStatelessSession.Insert` without caring about they return values, but they have one and it is the id. – Frédéric Jun 13 '17 at 11:50
3

Neither ISession or IStatelessSession will be able to batch inserts if you are using identity as primary key.

When Insert is made Nhibernate will put the correct value in your Id property. But when you are using Identity the only place where this Id can be taken is the data base. Use hilo to make batch inserts.

Community
  • 1
  • 1
Sly
  • 15,046
  • 12
  • 60
  • 89
  • 1
    +1 That is the general consensus, but I was hoping for a citation. My use of a stateless session to me should be enough to forgo updating the entity ID. Failing that, a means for me to express my disinterest in the entity ID would be useful because then this would enable batching to occur. Perhaps this is a case of wrong tool for the job, but I was hoping to at least trial it. – Kent Boogaart Aug 05 '10 at 14:29