2

I am creating a website in ASP.NET MVC and use NHibernate as ORM. I have the following tables in my database:

  • Bookmarks
  • TagsBookmarks (junction table)
  • Tags

Mapping:

    public BookmarkMap()
    {
        Table("Bookmarks");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Link);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Tags).AsSet().Cascade.None().Table("TagsBookmarks").ParentKeyColumn("BookmarkId")
        .ChildKeyColumn("TagId");
    }

    public TagMap()
    {
        Table("Tags");
        Id(x => x.Id).Column("Id").GeneratedBy.Identity();
        Map(x => x.Title);
        Map(x => x.Description);
        Map(x => x.DateCreated);
        Map(x => x.DateModified);
        References(x => x.User, "UserId");
        HasManyToMany(x => x.Bookmarks).AsSet().Cascade.None().Inverse().Table("TagsBookmarks").ParentKeyColumn("TagId")
        .ChildKeyColumn("BookmarkId");
    }

I need the data from both the Bookmarks and Tags table. More specific: I need 20 bookmarks with their related tags. The first thing I do is select 20 bookmark ids from the Bookmarks table. I do this because paging doesn't work well on a cartesian product that I get in the second query.

First query:

IEnumerable<int> bookmarkIds = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>()
                                where b.User.Username == username
                                orderby b.DateCreated descending
                                select b.Id).Skip((page - 1) * pageSize).Take(pageSize).ToList<int>();

After that I select the bookmarks for these ids.

Second query:

IEnumerable<Bookmark> bookmarks = (from b in SessionFactory.GetCurrentSession().Query<Bookmark>().Fetch(t => t.Tags)
                                   where b.User.Username == username && bookmarkIds.Contains(b.Id)
                                   orderby b.DateCreated descending
                                   select b);

The reason I use fetch is because I want to avoid N+1 queries. This works but results in a cartesian product. I have read in some posts that you should avoid cartesian products, but I don't really know how to do this in my case.

I have also read something about setting a batch size for the N+1 queries. Is this really faster than this single query?

An user can add max 5 tags to a bookmark. I select 20 bookmarks per page so worst case scenario for this second query is: 5 * 20 = 100 rows.

Will this impact performance when I have lots of data in the Bookmarks and Tags tables? Should I do this differently?

2 Answers2

1

This is not a Cartesian product.

~ Figure A ~

Bookmarks -> Tags -> Tag

A Cartesian product is all of the possible combinations of two different sets. For example, suppose we had three tables: Customer, CustomerAddress, and CustomerEmail. Customers have many addresses, and they also have many email addresses.

~ Figure B ~

Customers -> Addresses
          -> Emails

If you wrote a query like...

select *
from
    Customer c
    left outer join CustomerAddress a
        on c.Id = a.Customer_id
    left outer join CustomerEmail e
        on c.Id = e.Customer_id
where c.Id = 12345

... and this customer had 5 addresses and 5 email addresses, you would wind up with 5 * 5 = 25 rows returned. You can see why this would be bad for performance. It is unnecessary data. Knowing every possible combination of Address and Email Address for a customer tells us nothing useful.

With your query, you are not returning any unnecessary rows. Every row in the result set corresponds directly to a row in one of the tables you're interested in, and vice-versa. There is no multiplication. Instead you have TagsBookmarksCount + BookmarksThatDontHaveTagsCount.

The key place to look for Cartesian products is when your query branches off into two separate unrelated collections. If you're just digging deeper and deeper into a single chain of child collections, as in Figure A, there is no Cartesian product. The number of rows your query returns will be limited by the number of rows returned by that deepest collection. As soon as you branch off to the side so that you now have two parallel, side-by-side collections in the query, as in Figure B, then you have a Cartesian product, and results will be unnecessarily multiplied.

To fix a Cartesian product, split the query into multiple queries so the number of rows are added, not multiplied. With NHibernate's Future methods, you can batch those separate queries together, so you still only have one round trip to the database. See one of my other answers for an example of how to fix a Cartesian product in NHibernate.

Community
  • 1
  • 1
Daniel Schilling
  • 4,829
  • 28
  • 60
  • So if I understand you correctly I do not have a cartesian product? If so, does the link you provided may help fixing this? Thank you very much for your answer! –  Nov 07 '13 at 21:23
  • Correct. You do not have a Cartesian product. Some other time in the future, if you do end up having a Cartesian product you need to deal with, that link gives an example of how to fix it. – Daniel Schilling Nov 07 '13 at 21:42
  • Oke great. I have one last question :) I am probably overly concerned but will my query perform well or should it be done differently? –  Nov 07 '13 at 21:55
  • I think it's a very nice query. Eager fetching, paging, no Cartesian prods. The only thing I see that could possibly be improved is how you have two round trips to the database. You could try combining this into one query. Don't evaluate `bookmarkIds` using `ToList()`, instead leave it as an `IQueryable`. That *should* translate to a subquery on the SQL side. It would be a more complex query, and it's possible that the additional complexity would make it harder for the database to execute it efficiently. You'd just have to test the two different versions of the method to see which was faster. – Daniel Schilling Nov 07 '13 at 22:07
0

Query<>.Fetch() is intended to ensure that eager loading is taking place, and when it's a one-to-many relationship, as this appears to be ( i.e. if Bookmark.Tags is a collection) then the two ways you are going about this are roughly equivalent. If Tags is lazy-loaded and only accessed rarely, then leaving it non-fetched may be the best way to go (as in your first query), because you will not always be accessing the Tags much. This depends on use case.

If, on the other hand, you know that you will always be getting all the tags, it may make more sense to break this off into another query, this time on the whatever the Tags type/table is, and look them up instead of using the NHibernate relations to do the job.

If Tag has a foreign key to Bookmarks, like BookmarkId, ToLookup can be useful in this case:

var tagLookup = (from t in SessionFactory.GetCurrentSession().Query<Tag>()
                 // limit query appropriately for all the bookmarks you need
                 // this should be done once, in this optimization
                 select new {key=t.BookmarkId, value=t} )
                 .ToLookup(x=>x.key, x=>x.value);

Will give you a lookup (ILookup<int, Tag>) where you can do something like:

IGrouping<Tag> thisBookmarksTags = tagLookup[bookmarkId];

Which will give you the tags you need for that bookmark. This separates it out into another query, thereby avoiding N+1.

This is making quite a few assumptions about your data model, and the mappings, but I hope it illustrates a pretty straight-forward optimization that you can use.

dwerner
  • 6,462
  • 4
  • 30
  • 44
  • Thank you for your answer. However I have a many to many relation between bookmarks and tags. This is mapped like this: HasManyToMany(x => x.Tags) so I do not have a bookmark id inside Tag. –  Nov 07 '13 at 20:38
  • So you have something like a BookmarkTags relationship table? Just query that then, building the lookup in the same way. – dwerner Nov 07 '13 at 20:40
  • So I need to map the junction table? –  Nov 07 '13 at 20:46
  • That's one approach, if you want to build a lookup to optimize for your situation. Keep in mind I'm just suggesting an approach to optimization, and I don't know the other factors in your situation. That said, I have regularly mapped relationship tables, because they are useful from this perspective. If you think of your tables as a graph of relations, you can picture eager fetching with `Fetch` as one way to embolden just the parts of the picture you need in a given query, or set of queries. – dwerner Nov 07 '13 at 20:56
  • I also make sure to map the ids of any relation, so that I am not in a situation where I MUST load that entity to determine another relationship. With NHibernate, this will then affect your update strategy, so I guess it's all my opinion on how to configure things. – dwerner Nov 07 '13 at 20:57
  • Oke nice this makes sense. Do you know if the query I use now may impact performance and when? It is not yet online and I have no idea how many people are going to use it :) –  Nov 07 '13 at 21:03
  • As it stands, it's either going to be N+1 at worst in the case of lazy loading (performance hit upon accessing the lazy loaded member), or N+1 at best in the case of the fetch (performance hit at time of query). – dwerner Nov 07 '13 at 21:09