1

I'm implementing a user search function in SQL/C#. I'm having a bit of trouble with the logistics of the function itself and I'm looking for some guidance.

I've been using a triple-nested subquery, which I feel is running a bit slow (avg 280-300ms to pull results from 1000 records in a Users table).

I first want to search by attributes for users (a separate table), then search based on location. I then want distance calculated and sorted, then only the closest 10 records brought back at a time (paged results).

Am I using the correct approach using a triple nested subquery? Or is there a standard or guiding principle to do this sort (pun not intended) kind of thing?

Code sample:

SELECT * FROM
    (SELECT *, ROW_NUMBER()                 
        OVER (ORDER BY UsersSubquery.Distance ASC) as RowNumber  
        FROM

    (

        SELECT TOP 100 PERCENT UsersSubquery.*, 

         (((geography::Point(UsersLocationsSubquery.Latitude, UsersLocationsSubquery.Longitude, 4326)).STDistance(@a)) / 1000) AS Distance      

        FROM Users UsersSubquery 

            INNER JOIN UsersAndAttributes ON UsersSubquery.UserId = UserAndAttributes.UserId

            INNER JOIN UsersAndLocations AS UsersWithLocationsSubquery  ON UsersSubquery.UserId = UsersWithLocationsSubquery.UserId

        WHERE 
        ( 

            ((@Attribute1 = NULL) OR (UsersSubquery.Attribute1Id = @Attribute1)) 
            AND
            ((@Attribute2 = NULL) OR (UsersSubquery.Attribute2Id = @Attribute2)) 
            AND
            ((@Attribute3 = NULL) OR (UsersSubquery.Attribute3Id = @Attribute3)) 
            AND
            ...etc

        )


    )

    AS UsersSubquery )

Users

    INNER JOIN Pictures ON Users.UserId = Pictures.UserId

WHERE RowNumber >= @StartRow and RowNumber <= @EndRow 

Order by RowNumber
user111222333
  • 99
  • 1
  • 11
  • Can you show us the query you're using? – Mike D. Aug 12 '15 at 23:55
  • This is too vague to get any useful answers. We don't know the structure of your table, how it is indexed, what your query is, or much of anything else. Please post some code or we can't help you. – elixenide Aug 12 '15 at 23:56
  • Yes, I thought this would come up. Momentarily... – user111222333 Aug 12 '15 at 23:57
  • Perhaps I should ask, is a temporary View superior to multiple subqueries? – user111222333 Aug 12 '15 at 23:59
  • Code sample above. It just feels like a mess of subqueries. The inner most subquery pulls so much data... the outer one filters that, and the outer-outer-one sorts it... just seems off... – user111222333 Aug 13 '15 at 00:05
  • [Get the SQL Query plan](http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan) and post the raw XML for it in your question. If you want help improving the performance of your query we need to know where your query is performing poorly and the query plan will help us see that. – Scott Chamberlain Aug 13 '15 at 00:27
  • @ScottChamberlain I'll work on that. However, it seems as if selecting all matching attributes with 2 joins, then ordering them based upon a distance calculation is inefficient. I'm looking at this from a theoretical perspective of selecting, calculating, then sorting. – user111222333 Aug 13 '15 at 00:29
  • 1
    The SQL engine is free to reorder the parts of your query any way it wants as long as it does not affect the result set. It may not be doing things in the order you wrote your code in. The query plan will show exactly what order it is doing stuff and what steps are slow. – Scott Chamberlain Aug 13 '15 at 00:30
  • My general rule is to write my queries in a way that is easy to read and maintain and trust that the optimizer will work out the best way to get me the results I want. Only when I'm actually faced with performance problems do I open my bag of tricks to speed things up. – Mike D. Aug 13 '15 at 01:32

1 Answers1

2

Because you are wanting to page the data which means you need to filter on the ROW_NUMBER() value which needs to be ordered by the Distance calculation you are going to need the 3 nested sub-queries. However, I do think it can be done a bit more cleanly.

I'm not sure why you're taking the TOP 100 PERCENT from one of you your inner queries since that isn't actually going to do anything. I can only assume at one point you were ordering that inner query and using the TOP clause to make it not error out. That doesn't work and the order of the results is not guaranteed. So it is good it looks like you've gone a different way since then.

The way you've used aliases in your query is also a bit confusing. In the innermost query you've aliased the actual User table as UsersSubquery and then you've aliased a later subquery with the same name. Finally you've aliased the outer-most sub-query as Users which is the same name as the base table.

There is no reason not to do your join to the Pictures table at the same time you're doing the other joins. There isn't any additional filtering between that join and the innermost query (which has the WHERE clause that filters the user attributes) so doing the join there would be the same thing.

This is how I would write the query:

DECLARE @RowsToFetch INT = 10;

SELECT TOP (@RowsToFetch) *
FROM (
    SELECT *,
        ROW_NUMBER() OVER (ORDER BY Distance ASC) AS RowNumber
    FROM (
        SELECT Users.*,
            (((geography::Point(UsersAndLocations.Latitude, UsersAndLocations.Longitude, 4326)).STDistance(@a)) / 1000) AS Distance
        FROM Users
        INNER JOIN UsersAndAttributes ON Users.UserId=UsersAndAttributes.UserId
        INNER JOIN UsersAndLocations ON Users.UserId=UsersAndLocations.UserId
        INNER JOIN Pictures ON Users.UserId=Pictures.UserId
        WHERE (@Attribute1 = NULL OR UsersAndAttributes.Attribute1Id = @Attribute1)
            AND (@Attribute2 = NULL OR UsersAndAttributes.Attribute1Id = @Attribute2)
            ..etc
    ) AS UsersData
) AS UsersNumbered
WHERE RowNumber >= @StartRow
ORDER BY RowNumber

Finally, I would actually be using SELECT * anywhere in the query (especially the innermost sub-query). Rather, select only the specific columns you need. This will help reduce the amount of work SQL server needs to do and reduce the amount of data that needs to move over the network when returning the results. I didn't know which columns you needed though so I've re-used your SELECT *.

Mike D.
  • 4,034
  • 2
  • 26
  • 41
  • Ok, I'd like to awknowledge all of your individual points. The TOP 100 PERCENT is specficially to avoid errors, correct. I was not ordering the inner query, however. I was trying to avoid the `INNER JOIN` on the `Pictures` table in the innermost query because it would be pulling all picture data for all users that match the Attribute criteria. Why not just pull that data for the outer queries somehow? Is there another approach altogether that I can take with this? This query averages at about 280-300ms and I really feel like I can get this to run much faster. Is that long for a query? – user111222333 Aug 13 '15 at 08:48
  • 1
    You're thinking about it too procedurally. The optimizer will likely not execute the query exactly as you've laid it out. Instead it will try to find the most efficient way to get the result set that you're after. To that end I don't think that having the join to the picture table inside or outside will make much of a difference. On the other hand bringing back the picture data in your result set will slow things down simply because it needs to transfer all of that data over the network. I'd be curious how fast the query is without returning the binary image data. – Mike D. Aug 14 '15 at 02:52
  • 1
    What error are you trying to stop by including the TOP 100 PERCENT? The only work around I know that uses that is trying to order a subquery or view (and it doesn't work). Also, limiting the number of columns returned by your query is a good place to start if you're trying to speed it up. The less data you have to push across the network from the SQL server to the client the better. IMHO 1/3 of a second for a query result would be acceptable to me. Does your functional requirements really need it to go faster than that? – Mike D. Aug 14 '15 at 02:55