8

I'm developing an application that stores images and related metadata. I'm running into issues when performing a certain query using NHibernate. The query is taking prohibitively long (on my machine something like 31 seconds), although the same query takes only a fraction of a second when executed in SQL Server Management Studio.

I've reduced and extraced the problem to a small test application:

Entities:

Tag, consisting of Id (string, the tag value itself)

public class Tag
{
    public virtual string Id { get; set; }
}

Image, consisting of Id (int), Name (string) and Tags (many-to-many, set of Tag instances)

public class Image
{
    private Iesi.Collections.Generic.ISet<Tag> tags = new HashedSet<Tag>();

    public virtual int Id { get; set; }

    public virtual string Name { get; set; }

    public virtual IEnumerable<Tag> Tags
    {
        get { return tags; }
    }

    public virtual void AddTag(Tag tag)
    {
        tags.Add(tag);
    }
}

I'm using "mapping by code" with the following mappings:

public class TagMapping : ClassMapping<Tag>
{
    public TagMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Assigned));
    }
}

public class ImageMapping : ClassMapping<Image>
{
    public ImageMapping()
    {
        Id(x => x.Id, map => map.Generator(Generators.Native));
        Property(x => x.Name);
        Set(x => x.Tags, 
            map => map.Access(Accessor.Field),
            map => map.ManyToMany(m2m => { }));
    }
}

The NHibernate/database configuration looks like this:

  <hibernate-configuration xmlns="urn:nhibernate-configuration-2.2">
    <session-factory>
      <property name="dialect">NHibernate.Dialect.MsSql2008Dialect</property>
      <property name="connection.connection_string_name">PrimaryDatabase</property>
        <property name="format_sql">true</property>
    </session-factory>
  </hibernate-configuration>
  <connectionStrings>
    <add name="PrimaryDatabase" providerName="System.Data.SqlClient" connectionString="Data Source=.\SQLEXPRESS;Initial Catalog=PerfTest;Integrated Security=True" />
  </connectionStrings>

I want to achieve the following query: give me all images where the name contains a specific string or where any tag contains a specific string. To find the latter I use a subquery that gives me the Ids of all images with matching tags. So in the end the search criteria are: the image has a name containing a specific string or its ID is one of those returned by the subquery.

Here's the code that executes the query:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

The test database (DBMS: SQL Server 2008 Express R2) I run this query against was created specifically for this test and does not contain anything else. I have filled it with random data: 10.000 images (table Image), 4.000 tags (table Tag) and roughly 200.000 associations between images and tags (table Tags), ie. each image has about 20 associated tags. The database

The SQL NHibernate claims to use is:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

This looks reasonable given the query I'm creating.

If I run this query using NHibernate the query takes about 30+ seconds (NHibernate.AdoNet.AbstractBatcher - ExecuteReader took 32964 ms) and returns 98 entities.

However, if I execute an equivalent query directly inside Sql Server Management studio:

DECLARE @p0 nvarchar(4000)
DECLARE @p1 nvarchar(4000)

SET @p0 = '%abc%'
SET @p1 = '%abc%'    

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Name like @p0
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p1
        )
    );

The query takes much less than one second (and returns 98 results as well).

Further experiments:

If I only search by name or only by tags, ie.:

var qry = session.QueryOver<Image>()
    .Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
    .List();

or

var qry = session.QueryOver<Image>()
    .Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
    .List();

the queries are fast.

If I don't use like but an exact match in my subquery:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .Where(x => x.Id == term)
    .Select(x => x.Id);

the query is fast, too.

Changing the match mode for the name to Exact doesn't change anything.

When I debug the program and pause while the query is executing the top of the managed call stack looks like:

[Managed to Native Transition]   
System.Data.dll!SNINativeMethodWrapper.SNIReadSync(System.Runtime.InteropServices.SafeHandle pConn, ref System.IntPtr packet, int timeout) + 0x53 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadSni(System.Data.Common.DbAsyncResult asyncResult, System.Data.SqlClient.TdsParserStateObject stateObj) + 0xa3 bytes   
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadNetworkPacket() + 0x24 bytes  
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadBuffer() + 0x1f bytes     
System.Data.dll!System.Data.SqlClient.TdsParserStateObject.ReadByte() + 0x46 bytes   
System.Data.dll!System.Data.SqlClient.TdsParser.Run(System.Data.SqlClient.RunBehavior runBehavior, System.Data.SqlClient.SqlCommand cmdHandler, System.Data.SqlClient.SqlDataReader dataStream, System.Data.SqlClient.BulkCopySimpleResultSet bulkCopyHandler, System.Data.SqlClient.TdsParserStateObject stateObj) + 0x67 bytes     
System.Data.dll!System.Data.SqlClient.SqlDataReader.ConsumeMetaData() + 0x22 bytes   
System.Data.dll!System.Data.SqlClient.SqlDataReader.MetaData.get() + 0x57 bytes  
System.Data.dll!System.Data.SqlClient.SqlCommand.FinishExecuteReader(System.Data.SqlClient.SqlDataReader ds, System.Data.SqlClient.RunBehavior runBehavior, string resetOptionsString) + 0xe1 bytes  
...

So, my questions are:

  • Why does the query so much longer when performed by NHibernate even though the SQL used is the same?
  • How can I get rid of the difference? Is there a setting that can cause this behavior?

I know the query in general isn't the most efficient thing in the world, but what's striking me here is the difference between using NHibernate and manualy querying. There is definitively something strange going on here.

Sorry for the long post, but I wanted to include as much as possible about the issue. Thanks a lot in advance for your help!

Update 1: I've tested the application with NHProf without much added value: NHProf shows that the executed SQL is

SELECT this_.Id   as Id1_0_,
       this_.Name as Name1_0_
FROM   Image this_
WHERE  (this_.Name like '%abc%' /* @p0 */
         or this_.Id in (SELECT this_0_.Id as y0_
                         FROM   Image this_0_
                                inner join Tags tags3_
                                  on this_0_.Id = tags3_.image_key
                                inner join Tag tag1_
                                  on tags3_.elt = tag1_.Id
                         WHERE  tag1_.Id like '%abc%' /* @p1 */))

Which is exactly what I posted before (because that's what NHibernate wrote to its log in the first place).

Here's a screenshot of NHProf Screenshot of NHProf

The warnings are understandable but don't explain the behavior.

Update 2 @surfen sugested to pull the results of the sub query out of the DB first and stick them back into the main query:

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var ids = imagesWithMatchingTag.GetExecutableQueryOver(session).List<int>().ToArray();

var qry = session.QueryOver<Image>()
    .Where(
            Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
            Restrictions.On<Image>(x => x.Id).IsIn(ids))
    .List();

While this does indeed make the main query fast again, I'd rather not take this approach as it does not fit well with the intended usage in the real world application. It is interesting that this is so much faster, though. I'd expect the subquery approach to be equally fast given that it is not dependent on the outer query.

Update 3 This doesn't seem to be related to NHibernate. If I run the query using normal ADO.NET objects I get the same behavior:

var cmdText = @"SELECT this_.Id   as Id1_0_,
                        this_.Name as Name1_0_
                FROM   Image this_
                WHERE  (this_.Name like  @p0 
                            or this_.Id in 
                        (SELECT this_0_.Id as y0_
                        FROM   Image this_0_
                            inner join Tags tags3_
                                on this_0_.Id = tags3_.image_key
                            inner join Tag tag1_
                                on tags3_.elt = tag1_.Id
                        WHERE  tag1_.Id like  @p1 ));";

using (var con = new SqlConnection(ConfigurationManager.ConnectionStrings["PrimaryDatabase"].ConnectionString))
{
    con.Open();
    using (var txn = con.BeginTransaction())
    {
        using (var cmd = new SqlCommand(cmdText, con, txn))
        {
            cmd.CommandTimeout = 120;
            cmd.Parameters.AddWithValue("p0", "%abc%");
            cmd.Parameters.AddWithValue("p1", "%abc%");

            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    Console.WriteLine("Match");
                }
            }

        }
        txn.Commit();
    }
}

Update 4

Query-plans (click to zoom):

Slow query Slow plan

Fast query Fast plan

There definitively is a difference in the plan.

Update 5

As it indeed seems that Sql Server treats the subquery as being correlated I tried something different: I moved the criterion related to the name to a subquery by itself:

var term = "abc";
var mode = MatchMode.Anywhere;

var imagesWithMatchingTag = QueryOver.Of<Image>()
    .JoinQueryOver<Tag>(x => x.Tags)
    .WhereRestrictionOn(x => x.Id).IsLike(term, mode)
    .Select(x => x.Id);

var imagesWithMatchingName = QueryOver.Of<Image>()
    .WhereRestrictionOn(x => x.Name).IsLike(term, mode)
    .Select(x => x.Id);

var qry = session.QueryOver<Image>()
    .Where(
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingName) ||        
      Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag) 
    ).List();

Generated SQL:

SELECT
    this_.Id as Id1_0_,
    this_.Name as Name1_0_
FROM
    Image this_
WHERE
    (
        this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            inner join
                Tags tags3_
                    on this_0_.Id=tags3_.image_key
            inner join
                Tag tag1_
                    on tags3_.elt=tag1_.Id
            WHERE
                tag1_.Id like @p0
        )
        or this_.Id in (
            SELECT
                this_0_.Id as y0_
            FROM
                Image this_0_
            WHERE
                this_0_.Name like @p1
        )
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

This seems to break the correlation and as a result the query becomes "fast" again ("fast" as in "acceptable for the moment"). The query time went down from 30+ seconds to ~170ms. Still not a lightweight query, but at least will allow me to continue from here. I know that a "like '%foo%'" will never be super fast. If it comes to the worst I can still move to a specialized search server (Lucene, solr) or real full text searching.

Update 6 I was able to rewrite the query to NOT use subqueries at all:

var qry = session.QueryOver(() => img)
    .Left.JoinQueryOver(x => x.Tags, () => tag)
    .Where(
        Restrictions.Like(Projections.Property(() => img.Name), term, mode) ||
        Restrictions.Like(Projections.Property(() => tag.Id), term, mode))
    .TransformUsing(Transformers.DistinctRootEntity)
    .List();

SQL:

SELECT
    this_.Id as Id1_1_,
    this_.Name as Name1_1_,
    tags3_.image_key as image1_3_,
    tag1_.Id as elt3_,
    tag1_.Id as Id0_0_
FROM
    Image this_
left outer join
    Tags tags3_
        on this_.Id=tags3_.image_key
left outer join
    Tag tag1_
        on tags3_.elt=tag1_.Id
WHERE
    (
        this_.Name like @p0
        or tag1_.Id like @p1
    );
@p0 = '%abc%' [Type: String (4000)], @p1 = '%abc%' [Type: String (4000)]

However, the query performs now slightly worse than the version with subqueries. I'll investigate this further.

Andre Loker
  • 8,368
  • 1
  • 23
  • 36
  • Possible duplicate: http://dba.stackexchange.com/q/9167/724 – Rowland Shaw Mar 30 '12 at 13:11
  • @RowlandShaw not a duplicate as the issue could be in the C# side - which also makes it out of scope for dba – mmmmmm Mar 30 '12 at 13:14
  • 1
    @AndreLoker - have you run a profiler on this to see where the time is taken – mmmmmm Mar 30 '12 at 13:15
  • 3
    Did you check the query using SQL Profiler? There you should see the actual query and be able to pretty much cut'n'paste it into SQL Manager for comparison. I'm not sure where you're getting the "NHibernate claims to use" from, so unsure if it's the absolute truth :) – Joachim Isaksson Mar 30 '12 at 13:16
  • Install NHPRof and look at the actual query being generated, the post that here. ta – Rippo Mar 30 '12 at 13:27
  • I agree with @Joachim - You never know what SQL a ORM tool decides to use (and they usually generate pretty awful SQL to begin with). – N West Mar 30 '12 at 13:27
  • Ditto SQL profiler, but also compare the connection settings too in case there's something odd on NHibernate's connection. I can't see why else running identical SQL would go wrong :-/ – Rup Mar 30 '12 at 13:40
  • I checked against NHProf without much more insight, see update 1 in my question. I don't have SQL Profiler at hand. – Andre Loker Mar 30 '12 at 13:49
  • It's not in the standard client tools install that you installed to get Management Studio? I thought it was but could well be wrong! – Rup Mar 30 '12 at 13:55
  • Not in the Express version I'm afraid. – Andre Loker Mar 30 '12 at 13:56
  • The issue doesn't seem to be related to NH (see latest update), so I removed the NHibernate tag. – Andre Loker Mar 30 '12 at 13:59
  • Please post the execution plans for both the slow and fast versions. – Martin Smith Mar 30 '12 at 14:02
  • It might be some optimization flags that NH sets (or should set), so IMHO the NH tag is still applicable. – surfen Mar 30 '12 at 14:03
  • What's the best way to post the exec plan? Screenshot? XML? SHOWPLAN table? – Andre Loker Mar 30 '12 at 14:09
  • NHPRof can also show the query plan – Rippo Mar 30 '12 at 14:19
  • I've added the query plans. The plan of the slow version comes from NHProf, the plan for the fast version comes from Management Studio. – Andre Loker Mar 30 '12 at 14:37
  • BTW, in the meantime I have created the suggested missing index, but without any effect on the slow query. – Andre Loker Mar 30 '12 at 14:40
  • I've added new query plans, both from Management Studio (I found a way to capture the plan the the application uses). The sub tree related to matching the tags does differ indeed. The question is - how can I influence the query plan inside my app? – Andre Loker Mar 30 '12 at 15:22
  • 1
    Have you red this http://ayende.com/blog/2322/orm-and-when-query-plans-go-bad and this http://www.codewrecks.com/blog/index.php/2011/07/23/use-sql-server-query-hints-with-nhibernate-hql-and-icriteria/ ? Not sure if it will help you. – Rippo Mar 31 '12 at 05:25
  • @Rippo that sounds very interesting, maybe I can use this in this case. Thanks a lot! – Andre Loker Mar 31 '12 at 08:14

2 Answers2

2

My bet is that it's the second query that is slow:

var qry = session.QueryOver<Image>()
.Where( Restrictions.On<Image>(x => x.Name).IsLike(term, mode) ||
        Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

You provided SQL for only the first query. What about the second? Did you test it under SQL Management Studio? Use SQL Server Profiler as @JoachimIsaksson suggests to find out what queries exacly NHibernate executes server-side.

This looks like you're loading 97 image objects into memory. How large is each of them?

EDIT

Another bet is that your first query executes ad inner query for the second query. Try doing .List() on the first query to load the tags into memory.

EDIT 2

From the query plans it really looks like your query is being called as a Correlated subquery. You mentioned that these queries are fast:

var qry = session.QueryOver<Image>()
.Where( Subqueries.WhereProperty<Image>(x => x.Id).In(imagesWithMatchingTag))
.List();

or

var qry = session.QueryOver<Image>()
.Where(Restrictions.On<Image>(x => x.Name).IsLike(term, mode))
.List();

Just UNION them and you should get the same result as running both of them separately. Also make sure that all the join columns have indexes.

That's the catch with IS IN (query) - you can't be sure how the database executes it (unless you somehow force it to use a certain plan). Maybe you could change .In() it into JoinQueryOver() somehow?

surfen
  • 4,644
  • 3
  • 34
  • 46
  • There is only one query. `imagesWithMatchingTag` is a subquery. The SQL I posted is the all that is executed. If you look at my question, you'll see that `Image` objects are rather small: two primitive fields and one hash set with at most 20 entries. – Andre Loker Mar 30 '12 at 13:35
  • I see. See my edits. Try doing .List() on the first query to see if it helps. – surfen Mar 30 '12 at 13:36
  • Wouldn't that pull the first query out of the SQL and instead make it pass the list of IDs to and from the code? Won't that be worse? – Rup Mar 30 '12 at 13:39
  • Not if currently it runs as corellated sub-query. I should have added that it's my assumption there. – surfen Mar 30 '12 at 13:41
  • @surfen Retrieving the IDs out of the database first does indeed make the query go fast. It isn't really a suitable solution for the real world application. I don't want to put hundreds of parameters into the query. If I look at the SQL that NHibernate generates the subquery is not correlated. – Andre Loker Mar 30 '12 at 13:48
  • My first priority is not to improve the query design but how to explain (and remove) the difference between the execution speed when run through NHibernate vs. Management Studio. – Andre Loker Mar 30 '12 at 13:49
  • @AndreLoker Provide the query execution plan for the NHibernate query in your question. This will show which part of the query takes so long. If you had SQL Server Profiler, it would be possible to compare these query execution plans. – surfen Mar 30 '12 at 13:55
  • As per design, It's easier to maintain more simple queries than one complex one. And as this example shows, one complex query isn't always better than several small queries. If you really don't want to execute 2 queries you could perform all this logic on the server side using stored procedures. – surfen Mar 30 '12 at 16:00
  • Thanks already for your response. I can't do a (server side) union in NHibernate. The problem is that the query I showed is itself part of a bigger query used to fetch Images according to filtercriteria. The bigger query also needs to check access rights and needs to perform paging. So ideally at this point I want one query that I can stick (as a subquery itself) into my bigger query. But here's an interesting finding: if I move the part of the query that compares the name to a subquery itself (see updated question) the query is "fast" again. I'll update my question with more details. – Andre Loker Mar 30 '12 at 18:58
  • I'm glad I could help. I'm not really an expert in NHibernate so I suppose maybe someone else will propose a better approach. I hope you'll find satisfying solution. – surfen Mar 30 '12 at 19:06
  • @surfen I'll accept this answer. While I don't understand yet *why* SQL Server treats my original query as correlated, you answer and comments made me try alternative queries. Thanks for your help. – Andre Loker Apr 04 '12 at 14:39
0

Have you check for Parameter Sniffing Issue? I saw you did not mention data type when adding parameter to a Sql Command. Please See SQL Server - parameter sniffing