1

I need to retrieve ListingId from the below table based on the search condition. Kindly help the best way to retrive the query for the conditions below

Note : ListingId can have any number of ExtrafieldId, So Searching ListingId is based on the dynamic ExtrafieldId

If (ExtrafieldId = 1 and Value = 1) => OUTPUT - 20, 22
If (ExtrafieldId = 1 and Value = 1) and (ExtrafieldId = 2 and Value = 7) => OUTPUT - 21
If (ExtrafieldId =4and Value = 1999) => OUTPUT - 20, 21, 23

and so on...

ListingId   ExtraFieldId    Value      
20  1   1      
20  2   4      
20  3          
20  4   1990       
21  1   2      
21  2   7      
21  3          
21  4   1990       
22  1   1      
22  2   4      
22  3          
22  4   2000       
23  1   NULL       
23  2   NULL       
23  4   1999    
Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
m rajesh
  • 15
  • 6

3 Answers3

2

Use HAVING Instead of Self joins. It is much more efficient as requires no joins and only 1 table scan. It also means if there are multiple conditions it only requires an additional expression in the HAVING Clause rather than an additional join.

e.g. for your second example:

SELECT  ListingID
FROM    [YourTable]
GROUP BY ListingID
HAVING  COUNT(CASE WHEN ExtrafieldId = 1 AND Value = 1 THEN 1 END) > 0
AND     COUNT(CASE WHEN ExtrafieldId = 2 AND Value = 7 THEN 1 END) > 0

ADDENDUM

The above is just plain wrong. I think it is slightly easier on the eye, but the below is much more efficient.

SELECT  t1.ListingID
FROM    Listing AS t1 
        INNER JOIN Listing AS t2
            ON t2.ListingID = t1.ListingID 
        INNER JOIN Listing AS t3
            ON t3.ListingID = t1.ListingID 
        INNER JOIN Listing AS t4
            ON t4.ListingID = t1.ListingID 
WHERE   (t1.ExtraFieldID = 1 AND t1.Value = 1)
AND     (t2.ExtraFieldID = 2 AND t2.Value = 7)
AND     (t3.ExtraFieldID = 3 AND t3.Value = '')
AND     (t4.ExtraFieldID = 4 AND t4.Value = 1999)

To prove this I ran the following code to test it:

DECLARE @Iterations INT, @Listings INT
/*******************************************************************************************************
SET THE PARAMETERS FOR THE TEST HERE, @Listings IS THE NUMBER OF ListingIDs TO INSERT INTO THE SAMPLE
TABLE. EACH LISTING GETS 4 RECORDS SO 10,000 LISTINGS WILL GENERATE A SAMPLE OF 40,000 RECORDS ETC.
@Iterations IS THE NUMBER OF SELECTS TO PERFORM TO TEST THE PERFORMANCE OF EACH METHOD.
*******************************************************************************************************/
SET @Iterations = 500
SET @Listings = 1000000
/*******************************************************************************************************/
/*******************************************************************************************************/

IF EXISTS (SELECT * FROM TempDB.INFORMATION_SCHEMA.TABLES WHERE Table_Name LIKE '#Listing%')
    BEGIN
        DROP TABLE #Listing
    END

CREATE TABLE #Listing (ListingID INT NOT NULL, ExtraFieldID TINYINT NOT NULL, Value VARCHAR(4), PRIMARY KEY (ListingID, ExtraFieldID))

IF EXISTS (SELECT * FROM TempDB.INFORMATION_SCHEMA.TABLES WHERE Table_Name LIKE '#Results%')
    BEGIN
        DROP TABLE #Results
    END

CREATE TABLE #Results (GroupBy INT, SelfJoin INT)

DECLARE @i INT, @Time DATETIME, @Time2 DATETIME, @t INT
SET @i = ISNULL((SELECT MAX(ListingID) + 1 FROM #Listing), 0)
-- FILL LISTING TABLE WITH RANDOM VALUES
WHILE @i < @Listings
    BEGIN
        INSERT #Listing VALUES (@i, 1, ROUND(RAND() * 4, 0))
        INSERT #Listing VALUES (@i, 2, ROUND(RAND() * 20, 0))
        INSERT #Listing VALUES (@i, 3, CASE WHEN ROUND(RAND(), 0) = 0 THEN '' ELSE CONVERT(VARCHAR(4), ROUND(RAND(), 3) * 1000) END)
        INSERT #Listing VALUES (@i, 4, DATEPART(YEAR, DATEADD(YEAR, (RAND()-1) * 100, GETDATE())))

        SET @i = @i + 1
    END

CREATE NONCLUSTERED INDEX #IX_Listing_Value ON #Listing (Value) WITH FILLFACTOR = 100

SET @i = 0
-- PERFORM BOTH METHODS X NUMBER OF TIMES TO GET AN AVERAGE EXECUTION TIME
WHILE @i < @Iterations
    BEGIN
        SET @Time = GETDATE()

        SELECT  @t = COUNT(*)
        FROM    (   SELECT  ListingID
                    FROM    #Listing
                    GROUP BY ListingID
                    HAVING  COUNT(CASE WHEN ExtrafieldId = 1 AND Value = 1 THEN 1 END) > 0
                    AND     COUNT(CASE WHEN ExtrafieldId = 2 AND Value = 7 THEN 1 END) > 0
                    AND     COUNT(CASE WHEN ExtrafieldId = 3 AND Value = '' THEN 1 END) > 0
                    AND     COUNT(CASE WHEN ExtrafieldId = 4 AND Value = 1999 THEN 1 END) > 0
                ) D

        SET @Time2 = GETDATE()

        SELECT  @t = COUNT(*)
        FROM    (   SELECT  t1.ListingID
                    FROM    #Listing AS t1 
                            JOIN #Listing AS t2
                                ON t2.ListingID = t1.ListingID 
                            JOIN #Listing AS t3
                                ON t3.ListingID = t1.ListingID 
                            JOIN #Listing AS t4
                                ON t4.ListingID = t1.ListingID 
                    WHERE   (t1.ExtraFieldID = 1 AND t1.Value = 1)
                    AND     (t2.ExtraFieldID = 2 AND t2.Value = 7)
                    AND     (t3.ExtraFieldID = 3 AND t3.Value = '')
                    AND     (t4.ExtraFieldID = 4 AND t4.Value = 1999)
                ) D

        INSERT INTO #Results
        SELECT  DATEDIFF(MICROSECOND, @Time, @Time2) [GroupBy],
                DATEDIFF(MICROSECOND, @Time2, GETDATE()) [SelfJoin]

        SET @i = @i + 1
    END

IF NOT EXISTS (SELECT 1 FROM TempDB.INFORMATION_SCHEMA.TABLES WHERE Table_Name LIKE '#OverallResults%')
    BEGIN
        CREATE TABLE #OverallResults (GroupBy INT NOT NULL, SelfJoin INT NOT NULL, Iterations INT NOT NULL, Listings INT NOT NULL)
    END
INSERT INTO #OverallResults
SELECT  AVG(GroupBy) [Group By],
        AVG(SelfJoin) [Self Join],
        COUNT(*) [Iterations],
        @Listings
FROM    #Results

SELECT  AVG(GroupBy) [Group By],
        AVG(SelfJoin) [Self Join],
        COUNT(*) [Iterations],
        CONVERT(DECIMAL(5, 4), (AVG(GroupBy) - AVG(SelfJoin)) / 1000000.0) [Difference (Seconds)],
        CONVERT(DECIMAL(4, 2), 100 * (1 - (1.0 * AVG(SelfJoin) / AVG(GroupBy)))) [Percent Faster]
FROM    #Results

DROP TABLE #Listing
DROP TABLE #results

SELECT  Records,    
        Iterations,
        GroupBy [Group By],
        SelfJoin [Self Join],
        CONVERT(DECIMAL(5, 4), (GroupBy - SelfJoin) / 1000000.0) [Difference (Seconds)],
        CONVERT(DECIMAL(4, 2), 100 * (1 - (1.0 * SelfJoin / GroupBy))) [Percent Faster]
FROM    (   SELECT  Listings * 4 [Records], 
                    SUM(Iterations) [Iterations],
                    SUM(GroupBy * Iterations) / SUM(Iterations) [GroupBy],
                    SUM(SelfJoin * Iterations) / SUM(Iterations) [SelfJoin]
            FROM    #OverallResults
            GROUP BY Listings
        ) a

This can be run over and over with different variables. I ran this for 100, 1000, 10000, 100000 and 1000000 listings with 500 select statements on each to get an average execution time and this showed that self joining was on average about 60% faster up until 1,000,000 listings when it became 95% faster. The self join method is clearly the performance winner.

GarethD
  • 68,045
  • 10
  • 83
  • 123
  • Gareth, How to join the above ListingId to ListingId of Master Table [Listing] without using 'IN' clause OR some other best way? – m rajesh Feb 21 '12 at 12:29
  • You could use `SELECT * FROM ListingMasterTable a INNER JOIN ([MyAnswer]) b ON a.ListingID = b.ListingID`. The following would still work - `SELECT * FROM ListingMasterTable WHERE ListingID IN ([MyAnswer])`it just may not be the most efficient way. There are plenty of articles around discussing the merits of IN vs JOIN. http://stackoverflow.com/questions/2577174/join-vs-subquery – GarethD Feb 21 '12 at 12:47
  • This query will require a full table scan or some full index scans, and a `GROUP BY` counting. You may say that it's more efficient but in most ciscumstances, a query with many JOINs and without a GROUP BY, will be more efficient than this. Because it will require a few index searches (but not of the whole indexes, only the relevant parts, which may be tiny compared to the whole indexes.) – ypercubeᵀᴹ Feb 21 '12 at 12:59
  • See this similar question with a lot of ways to answer it - and benchmarks: [How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation) – ypercubeᵀᴹ Feb 21 '12 at 13:04
  • I stand massively corrected, have just done a load of testing on this and read through your link and it appears I am wrong. The self join is roughly 50% faster with no indexing and faster still when ListingID and ExtraFieldID are a primary key and Value has a nonclustered index. With a small amount of data ( < 1,000,000 rows) and a high number of clauses I would still be inclined to use the GROUP BY as the difference is a fraction of a second and I think the SQL is easier to read, when dealing with large amounts of data in the future I will certainly change my habits to use a join. Thanks! – GarethD Feb 21 '12 at 14:02
  • Thanks ypercube and GarethD. My table will grow larger in future. So I decided to use self join. Gareth, what tool are you using for load testing? Can you please post link for referring and tutorial for load testing. – m rajesh Feb 22 '12 at 06:55
  • @mrajesh It was not load testing that I did, I did a load of testing. I've added the script I used to test it to the answers, and the net result is that the self join method is much, much faster and gets faster still the more rows there are. – GarethD Feb 22 '12 at 12:27
1
SELECT
      t1.ListingID
FROM 
      TableX AS t1 

  JOIN                                --- 2nd JOIN
      TableX AS t2
    ON 
      t2.ListingID = t1.ListingID 

  JOIN                                --- 3rd JOIN
      TableX AS t3
    ON 
      t3.ListingID = t1.ListingID 

WHERE 
      (t1.ExtraFieldID, t1.Value) = (@ExtraFieldID_search1, @Value_search1)

                        --- 2nd condition
  AND 
      (t2.ExtraFieldID, t2.Value) = (@ExtraFieldID_search2, @Value_search2)

                        --- 3rd condition
  AND 
      (t3.ExtraFieldID, t3.Value) = (@ExtraFieldID_search3, @Value_search3)

If you need 3 conditions, you'll need to join the table to itself one more time (so 3 times in total)

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Hi Thank you for the answer, Can you please give the same query for 3 or 4 tables. I am a beginner and couldnt write the query for joining the third table. Maximum I have 4 or 5 table to join like this. – m rajesh Feb 21 '12 at 11:39
  • +1 As commented in my answer this is a more efficient solution than the one I posted. – GarethD Feb 21 '12 at 14:02
0

You can use union and distinct quite easy for this. If you're using the ListingId as an input to another query using an IN-clause you don't have to mind the duplicates otherwise you can add

SELECT DISTINCT ListingId FROM (
  SELECT
    ListingId
  ... -- the rest from below
) AS Data

Here's the query to get the listing (with possible duplicates!):

SELECT
  ListingID
FROM
  TABLE_NAME
WHERE
  ExtrafieldId = 1 and Value = 1
UNION ALL
SELECT
  ListingID
FROM
  TABLE_NAME
WHERE
  ExtrafieldId = 1 AND Value = 1 AND ExtrafieldId = 2 and Value = 7
UNION ALL
SELECT
  ListingID
FROM
  TABLE_NAME
WHERE
  ExtrafieldId = 4 AND Value = 1999
Asken
  • 7,679
  • 10
  • 45
  • 77
  • 1
    The middle SELECT is pointless. `WHERE ExtrafieldId = 1 AND Value = 1 AND ExtrafieldId = 2 and Value = 7` can never return any results - because if Extrafield = 1 then it can't equal 2 so the condition is never satisfied. You are also using `UNION` to separate independant `WHERE` Clauses. `OR` and Parentheses work equally well for this with better performance. – GarethD Feb 21 '12 at 12:03