
I have a reasonably complex query to extract the Id field of the results I am interested in based on parameters entered by the user.

After extracting the relevant Ids I am using the resulting set of Ids several times, in separate queries, to extract the actual output record sets I want (by joining to other tables, using aggregate functions, etc).

I would like to avoid running the initial query separately for every set of results I want to return. I imagine my situation is a common pattern so I am interested in what the best approach is.

The database is MS SQL Server and I am using .NET 3.5.

Flash
  • What I have understood so far: I assume you have a stored proc that extracts some IDs and sends them back to the front end; based on these IDs you then fetch some other information, repeating the first step every time, and you want to reuse the first step. Right? – Pankaj Mar 23 '12 at 19:38
  • I don't have a stored proc. I am running a series of queries separately from the front-end, each of which uses the same complex subquery. – Flash Mar 24 '12 at 05:06
  • Are all these queries executed within a single request from the front end to the DB and back? – Pankaj Mar 24 '12 at 17:49
  • @Pankaj No, separate requests. – Flash Mar 26 '12 at 06:39

8 Answers


If you are planning to cache the result set in your application code, ASP.NET has its built-in cache; a WinForms app can simply hold on to the object containing the data and reuse it.

If you are planning to do the same in SQL Server, you might consider using an indexed view to find the Ids; the view will be materialized, so you can get the results faster (see the sketch below). You might even consider using a staging table to hold the Ids temporarily.
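A minimal sketch of such an indexed view, assuming a hypothetical dbo.Orders table and a fixed, deterministic predicate (indexed views cannot be parameterized):

    -- Hypothetical example: indexed views require SCHEMABINDING and two-part names.
    CREATE VIEW dbo.vActiveOrderIds
    WITH SCHEMABINDING
    AS
    SELECT Id, CustomerId
    FROM dbo.Orders
    WHERE Status = 1;
    GO
    -- The unique clustered index is what actually materializes the view.
    CREATE UNIQUE CLUSTERED INDEX IX_vActiveOrderIds ON dbo.vActiveOrderIds (Id);

On editions other than Enterprise, reference the view with WITH (NOEXPAND) to make sure the index is actually used.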


It would definitely help if the question contained some measurements of the unoptimized solution (data sizes, timings). There is a variety of techniques that could be considered here, some listed in the other answers. I will assume that the reason why you do not want to run the same query repeatedly is performance.

If all the uses of the set of cached IDs consist of joins of the whole set to additional tables, the solution should definitely not involve caching the set of IDs outside of the database. Data should not travel there and back again if you can avoid it.

In some cases (when cursors or extremely complex SQL are not involved) it may be best (even if counterintuitive) to perform no caching and simply join the repetitive SQL into all desired queries. After all, each query has to be driven from one of the joined tables, and performance then depends largely on the availability of the indexes needed to join and evaluate all the remaining information quickly.

The most intuitive approach to "caching" the set of IDs within the database is a temporary table (if named #something, it is private to the connection and therefore usable by parallel independent clients; named ##something, it is global). If the table is going to have many records, indexes are necessary. For optimum performance, the index should be a clustered index (only one is allowed per table), and it should be created only after the set has been populated, when index creation is slightly faster.
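A minimal sketch of this pattern, with hypothetical table and column names:

    -- Run the complex filtering query once per connection, caching the Ids.
    SELECT o.Id
    INTO #MatchingIds
    FROM dbo.Orders AS o
    WHERE o.CustomerId = 42;    -- stands in for the complex user-driven filter

    -- Index after population, as described above.
    CREATE UNIQUE CLUSTERED INDEX IX_MatchingIds ON #MatchingIds (Id);

    -- Reuse the cached set in any number of follow-up queries on this connection.
    SELECT m.Id, COUNT(*) AS LineCount
    FROM #MatchingIds AS m
    JOIN dbo.OrderLines AS ol ON ol.OrderId = m.Id
    GROUP BY m.Id;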

Indexed views are clearly preferable to temporary tables, except when the underlying data is read only during the whole process, or when you can and want to ignore such updates to keep the whole set of reports consistent as far as the set goes. However, the ability of indexed views to always accurately project the underlying data comes at the cost of slowing down those updates.

One other answer to this question mentions stored procedures. That is largely a way of organizing your code. However, if you go this way, it is preferable to avoid using temporary tables, because references to a temporary table prevent pre-compilation of the stored procedure; go for views or indexed views if you can.

Regardless of the approach you choose, do not guess at the performance characteristics and query optimizer behavior. Learn to display query execution plans (within SQL Server Management Studio) and make sure that you see index accesses as opposed to nested loops combining multiple large sets of data; only add indexes that demonstrably and drastically change the performance of your queries. A well-chosen index can often change the performance of a query by a factor of 1000, so this is somewhat complex to learn but crucial for success.
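Alongside the graphical plan, per-session statistics output is a quick way to compare variants, for example:

    SET STATISTICS IO ON;    -- reports logical/physical reads per table
    SET STATISTICS TIME ON;  -- reports parse/compile and execution CPU time
    -- Run the query under test, then inspect the Messages tab in SSMS.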

And last but not least, make sure you use UPDATE STATISTICS when repopulating the database (and nightly in production), or your query optimizer will not be able to put the indexes you have created to their best uses.
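For example (the table name is hypothetical):

    UPDATE STATISTICS dbo.Orders;  -- refresh statistics for one table
    EXEC sp_updatestats;           -- or refresh stale statistics database-wide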

Jirka Hanika
  • Thanks for the thorough answer. At the moment I have repeated the subquery in each query with a `JOIN` as you suggest. It may end up being easier and faster to do this than anything else, I think. It just feels like a violation of DRY and a waste of processing. Each query may be better optimised this way though, like you said - and it seems DBMSs are notorious for giving unexpected performance characteristics anyway. – Flash Mar 22 '12 at 09:21
  • @Andrew - If DRY is the goal, the common part may be encapsulated into an ordinary VIEW. That has no (immediate) impact on performance either way, although with excessive use of views, irrelevant tables do tend to creep into the queries, making the query optimizer's job more difficult. So I would only recommend such encapsulation for JOINs that are big, convoluted, frequently modified, or expected to be called from more and more places. – Jirka Hanika Mar 22 '12 at 09:58

With SQL Server 2008 you can pass table-valued parameters to SQL. Just cache the IDs and then pass them as a table-valued parameter to the queries that fetch the data. The only caveat of this approach is that you have to predefine the table type as a UDT, as sketched below.

http://msdn.microsoft.com/en-us/library/bb510489.aspx
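A minimal sketch of the server-side pieces, with hypothetical names:

    -- Define the table type once (the UDT mentioned above).
    CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
    GO
    -- A procedure that reuses the cached Ids; table-valued parameters must be READONLY.
    CREATE PROCEDURE dbo.GetOrderLines
        @Ids dbo.IdList READONLY
    AS
    BEGIN
        SELECT ol.*
        FROM @Ids AS i
        JOIN dbo.OrderLines AS ol ON ol.OrderId = i.Id;
    END;

On the .NET side (3.5 SP1 or later), fill a DataTable with a single Id column and attach it to a SqlParameter with SqlDbType.Structured and TypeName = "dbo.IdList".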

AleksG

For SQL Server, Microsoft generally recommends using stored procedures whenever practical.

Here are a few of the advantages:

http://blog.sqlauthority.com/2007/04/13/sql-server-stored-procedures-advantages-and-best-advantage/

* Execution plan retention and reuse
* Query auto-parameterization
* Encapsulation of business rules and policies
* Application modularization
* Sharing of application logic between applications
* Access to database objects that is both secure and uniform
* Consistent, safe data modification
* Network bandwidth conservation
* Support for automatic execution at system start-up
* Enhanced hardware and software capabilities
* Improved security
* Reduced development cost and increased reliability
* Centralized security, administration, and maintenance for common routines

It's also worth noting that, unlike other RDBMS vendors (like Oracle, for example), MSSQL automatically caches all execution plans:

http://msdn.microsoft.com/en-us/library/ms973918.aspx

However, for the last couple of versions of SQL Server, execution plans are cached for all T-SQL batches, regardless of whether or not they are in a stored procedure.

paulsm4
  • How does using stored procedures help cache the result of my subquery? – Flash Jan 19 '12 at 23:03
  • I'm guessing this answer was in response to the question title, which says "caching initial SELECT statement for reuse", rather than caching the *results* of that query. – RickNZ Jan 20 '12 at 00:29

The best approach depends on how often the Ids change and how often you want to look them up again.

One technique is to simply store the result in the ASP.NET object cache, using the Cache object (also accessible from HttpRuntime.Cache). For example (from a page):

this.Cache["key"] = "value";

There are many possible variations on this theme.

RickNZ
  • Can I then re-use the cached recordset as a table for future queries? – Flash Jan 19 '12 at 23:04
  • Provided you use a detached data store, such as a DataTable, you can re-use it as a source of info for future queries, as long as you only read it and don't write it. – RickNZ Jan 20 '12 at 00:24
  • Would you say this is better than using table variables as in http://odetocode.com/code/365.aspx? – Flash Jan 20 '12 at 00:54
  • It depends what you're trying to do. Table variables are a good choice if you need to store a multi-row result from one query to another as part of a stored procedure. The ASP.NET object cache is a good choice if you want to carry the results over from one page request to another. You can use a DataTable alone to carry the results from one query to another within the same page request on the web server side. – RickNZ Jan 20 '12 at 03:32

You can use Memcached to cache values in memory. As far as I can see, there are some .NET ports.

ceth
  • Memcached or Velocity might be useful in a multi-server, load-balanced environment. For a typical single-server app, though, they're overkill and slow compared to an in-process solution. – RickNZ Jan 20 '12 at 00:26

How frequently does the data you'll be querying change? To me, this sounds like a perfect scenario for data warehousing, where you flatten the data for quicker retrieval and create the tables exactly as your 'DTO' wants to see the data. This method differs from an indexed view in that it is simply a table with quick seek operations, and it can be improved further if you set up the indexes properly on the columns that you plan to query.
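For illustration, a minimal sketch of a flattened reporting table (all names hypothetical), refreshed periodically:

    -- Flatten the joined data into the shape the application wants.
    SELECT o.Id, c.Name AS CustomerName, SUM(ol.Amount) AS Total
    INTO dbo.OrderReport
    FROM dbo.Orders AS o
    JOIN dbo.Customers AS c ON c.Id = o.CustomerId
    JOIN dbo.OrderLines AS ol ON ol.OrderId = o.Id
    GROUP BY o.Id, c.Name;

    -- Index the columns the reports actually seek on.
    CREATE CLUSTERED INDEX IX_OrderReport ON dbo.OrderReport (Id);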

Anthony Shaw

You can create a global temporary table on the fly, insert the records for the current request, and then access the table in your next request, joining to it for reusability, as sketched below.
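A minimal sketch with hypothetical names (note that a global temporary table is dropped once the creating session ends and no other session is referencing it, so this needs care across separate front-end requests):

    -- Request 1: cache the Ids where other connections can see them.
    SELECT o.Id
    INTO ##CachedIds
    FROM dbo.Orders AS o
    WHERE o.CustomerId = 42;    -- stands in for the complex filter

    -- Request 2 (possibly a different connection): reuse the cached set.
    SELECT ol.*
    FROM ##CachedIds AS c
    JOIN dbo.OrderLines AS ol ON ol.OrderId = c.Id;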

Pankaj