SQL how to limit results by field

Question

I have a query that I am trying to retrieve only 1 result (phone number field, can have multiple phone numbers stored) that has a value, however my results are duplicating the results for every phone number associated to a vendor. Here is an example of the result set:

As you can see from the above image, the results by NAME1 are being duplicated because the PHONE field has 3 distinct values. I only want to pull 1 phone number (doesnt matter which one, long as there are no nulls). Here is an example of my sql code, you can see the commented out sections where I tried to capture the MAX value in a derived query for C.PHONE.

SELECT DISTINCT A.NAME1, SUM( A.REMIT_AMT), A.REMIT_VENDOR, (C.PHONE), 
C.PHONE_TYPE
FROM PS_PAYMENT_TBL A, PS_VENDOR B, PS_VENDOR_ADDR_PHN C

WHERE A.PYMNT_DT >= '01-01-2017' 
 AND A.PYMNT_DT <= '12-31-2017' 
 AND A.REMIT_SETID = 'SHARE' 
 AND B.SETID = A.REMIT_SETID 
 AND B.VENDOR_ID = A.REMIT_VENDOR 
 AND B.VENDOR_CLASS <> 'E' 
 AND B.SETID = C.SETID 
 AND B.VENDOR_ID = C.VENDOR_ID
   --AND C.PHONE =
   --(SELECT MAX(C2.PHONE) FROM PS_VENDOR_ADDR_PHN C2)
 AND C.EFFDT = 
    (SELECT MAX(C_ED.EFFDT) FROM PS_VENDOR_ADDR_PHN C_ED 
    WHERE C.SETID = C_ED.SETID 
      AND C.VENDOR_ID = C_ED.VENDOR_ID 
      AND C.ADDRESS_SEQ_NUM = C_ED.ADDRESS_SEQ_NUM 
      AND C_ED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))

     GROUP BY  A.NAME1,  A.REMIT_VENDOR,  C.PHONE, C.PHONE_TYPE
     ORDER BY 2 DESC

I don't believe MS SQL Server support the LIMIT function as MySQL does, is there something similar I can use that works with MS SQL Server? Thanks!!

1/24 UPDATE:

SELECT DISTINCT A.NAME1, SUM( A.REMIT_AMT) As TOTAL_SPEND, A.REMIT_VENDOR, 
C.FIRST_PHONE, C.FIRST_PHONE_TYPE
FROM   
PS_PAYMENT_TBL A 

LEFT JOIN (

  SELECT DISTINCT VENDOR_ID,
    FIRST_VALUE(PHONE) OVER (
        PARTITION BY VENDOR_ID
        ORDER BY PHONE DESC
        ROWS UNBOUNDED PRECEDING
    ) AS FIRST_PHONE,
    FIRST_VALUE(PHONE_TYPE) OVER (
        PARTITION BY VENDOR_ID
        ORDER BY PHONE DESC
        ROWS UNBOUNDED PRECEDING
    ) AS FIRST_PHONE_TYPE
  FROM PS_VENDOR_ADDR_PHN C
  WHERE PHONE IS NOT NULL

) C ON A.REMIT_VENDOR = C.VENDOR_ID


, PS_VENDOR B , PS_VENDOR_ADDR_PHN CED

WHERE  A.PYMNT_DT >= '01-01-2017' 
 AND A.PYMNT_DT <= '12-31-2017' 
 AND A.REMIT_SETID = 'SHARE' 
 AND B.SETID = A.REMIT_SETID 
 AND B.VENDOR_ID = A.REMIT_VENDOR 
 AND B.VENDOR_CLASS <> 'E' 
 AND B.SETID = CED.SETID 
 AND B.VENDOR_ID = C.VENDOR_ID

 AND CED.EFFDT = 
    (SELECT MAX(CED.EFFDT) FROM PS_VENDOR_ADDR_PHN CED 
    WHERE CED.SETID = CED.SETID 
      AND CED.VENDOR_ID = CED.VENDOR_ID 
      AND CED.ADDRESS_SEQ_NUM = CED.ADDRESS_SEQ_NUM 
      AND CED.EFFDT <= SUBSTRING(CONVERT(CHAR,GETDATE(),121), 1, 10))

GROUP BY  A.NAME1,  A.REMIT_VENDOR,  C.FIRST_PHONE, C.FIRST_PHONE_TYPE
ORDER BY 2 DESC

You can use min or max, but you have to know which one you want. — Dan Bracuk, Jan 23 '18 at 17:24

Zev Spitz · Answer 1 · 2018-01-25T00:38:40.060

The OP's question is about LIMIT in SQL Server, when the actual problem is the duplicate records introduced by joining a single record in one table to other tables which have multiple records (a classic XY problem).

Let's try to pinpoint when the duplicate records are introduced. The following query has no introduced duplicate records:

-- Single table, no joins
SELECT A.NAME1, SUM( A.REMIT_AMT), A.REMIT_VENDOR
FROM PS_PAYMENT_TBL A
WHERE 
    A.PYMNT_DT >= '01-01-2017' 
    AND A.PYMNT_DT <= '12-31-2017' 
    AND A.REMIT_SETID = 'SHARE' 
GROUP BY A.NAME1, A.REMIT_VENDOR

Presumably, this will return the following:

NAME1                          (No column name) REMIT_VENDOR
------------------------------ ---------------- ------------
Owens & Minor Inc              11828775.730     ERSUT
Besse Medical Supply           11807640.040     BESMED
Medtronic                      11663552.040     MEDINB
Boston Scientific Corporation   8232800.420     SCIMED
Epic Systems Corporation        6841444.300     75128A

However, since each vendor may have multiple phone numbers in the PS_VENDOR_ADDR_PHN table, once we JOIN the two:

SELECT A.NAME1, SUM( A.REMIT_AMT), A.REMIT_VENDOR
FROM PS_PAYMENT_TBL A
INNER JOIN PS_VENDOR_ADDR_PHN C ON A.REMIT_VENDOR = C.VENDOR_ID
WHERE 
    A.PYMNT_DT >= '01-01-2017' 
    AND A.PYMNT_DT <= '12-31-2017' 
    AND A.REMIT_SETID = 'SHARE' 
GROUP BY A.NAME1, A.REMIT_VENDOR

we will get duplicate records. We may not notice it at first, so long as we are grouping on A.REMIT_VENDOR; but because of the duplicate records, the totals will also become messed up.

I suggest, instead of JOINing on the table which has the duplicate related records, JOIN on a subquery which only has a single record for each A.REMIT_VENDOR, and thus will not introduce duplicate records.

SELECT A.NAME1, SUM( A.REMIT_AMT), A.REMIT_VENDOR, C.FIRST_PHONE, C.FIRST_PHONE_TYPE
FROM PS_PAYMENT_TBL A
LEFT JOIN (

    -- This subquery returns the first PHONE and PHONE_TYPE, per VENDOR_ID
    -- if the records were ordered by the PHONE in DESC order
    -- FIRST_VALUE is a window function
    SELECT DISTINCT VENDOR_ID,
        FIRST_VALUE(PHONE) OVER (
            PARTITION BY VENDOR_ID
            ORDER BY PHONE DESC
            ROWS UNBOUNDED PRECEDING
        ) AS FIRST_PHONE,
        FIRST_VALUE(PHONE_TYPE) OVER (
            PARTITION BY VENDOR_ID
            ORDER BY PHONE DESC
            ROWS UNBOUNDED PRECEDING
        ) AS FIRST_PHONE_TYPE
    FROM PS_VENDOR_ADDR_PHN
    WHERE PHONE IS NOT NULL

) C ON A.REMIT_VENDOR = C.VENDOR_ID
GROUP BY A.NAME1, A.REMIT_VENDOR

References:

Using `JOIN` instead of `WHERE`

Your query doesn't use JOINs to relate multiple sets of data together, but rather WHERE conditions applied to the Cartesian product of all the data. My own feeling is that it is better to use JOIN to express the relationship between two sets of data and use WHERE exclusively for excluding records from the sets of data; if only because it makes it easier to debug these kinds of "duplicate records in one table cause duplicate records in entire result" problems, when you can see how each dataset is related to the others. (See here and here.)

I suggest using a JOIN to combine data from the PS_VENDOR table:

SELECT A.NAME1, SUM( A.REMIT_AMT) AS TOTAL_SPEND, A.REMIT_VENDOR, 
  C.FIRST_PHONE, C.FIRST_PHONE_TYPE
FROM PS_PAYMENT_TBL A 

-- See the INNER JOIN here; it's now easier to understand how PS_PAYMENT_TABLE
-- and PS_VENDOR are related
INNER JOIN PS_VENDOR B
  ON A.REMIT_SETID = B.SETID 
  AND A.REMIT_VENDOR = B.VENDOR_ID

LEFT JOIN (
  SELECT DISTINCT VENDOR_ID,
    FIRST_VALUE(PHONE) OVER (
        PARTITION BY VENDOR_ID
        ORDER BY PHONE DESC
        ROWS UNBOUNDED PRECEDING
    ) AS FIRST_PHONE,
    FIRST_VALUE(PHONE_TYPE) OVER (
        PARTITION BY VENDOR_ID
        ORDER BY PHONE DESC
        ROWS UNBOUNDED PRECEDING
    ) AS FIRST_PHONE_TYPE
  FROM PS_VENDOR_ADDR_PHN C
  WHERE PHONE IS NOT NULL
) C ON A.REMIT_VENDOR = C.VENDOR_ID,

WHERE
   A.PYMNT_DT >= '01-01-2017' 
  AND A.PYMNT_DT <= '12-31-2017' 
  AND A.REMIT_SETID = 'SHARE'
   -- with the JOIN, we can apply filtering conditions on data from the B table
  AND B.VENDOR_CLASS <> 'E'

GROUP BY A.NAME1, A.REMIT_VENDOR
ORDER BY 2 DESC

Combining the `EEFDT` field

(Still pending)

Original answer

The corresponding syntax for SQL Server is:

AND C.PHONE =
   (SELECT TOP 1 C2.PHONE FROM PS_VENDOR_ADDR_PHN C2)

This will return a single arbitrary PHONE. In order to return the max PHONE, sort the subquery records:

AND C.PHONE =
   (SELECT TOP 1 C2.PHONE FROM PS_VENDOR_ADDR_PHN C2 ORDER BY C2.PHONE DESC)

Reference -- TOP clause

I learned something new -- SQL Server supports a `WITH TIES` clause, such that if the records are ordered and multiple records "tie" for the top place based on the ordering, all such records will be returned. @Nick, thanks for this question. — Zev Spitz, Jan 23 '18 at 17:28
I do not get any results when I change the query to what your suggesting... — Nick, Jan 23 '18 at 17:56
@Nick What happens when you run the subquery on its own? Are there any results? Also, I may have misunderstood -- do you specifically want the `MAX(PHONE)`, or will any arbitrary `PHONE` do? — Zev Spitz, Jan 23 '18 at 18:04
I believe there is also a ROW_NUMBER function that I saw being used on another post (https://stackoverflow.com/questions/603724/how-to-implement-limit-with-microsoft-sql-server) not sure if this would work though... — Nick, Jan 23 '18 at 18:07
@Nick Is it possible that no records match the criteria of the outer query? — Zev Spitz, Jan 23 '18 at 18:08
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163739/discussion-between-zev-spitz-and-nick). — Zev Spitz, Jan 23 '18 at 18:12
Sorry chat is blocked for me (at work). When you comment out the subquery for C.PHONE there is definitely data being returned so I'm not sure what the issue is — Nick, Jan 23 '18 at 18:26
@Nick I've updated my answer with what I think is the real problem. — Zev Spitz, Jan 24 '18 at 12:51
I made the changes suggested above, and my query runs but the totals are not correct. I had to change some of the aliases for the C table (PS_VENDOR_ADDR_PHN) and I still need by B table (PS_VENDOR) included in the data so I added that back. In only some rows like Owns & Minor (Where there are multiple phone numbers I presume) I am getting the TOTAL_SPEND value multiplied 5 times than it should be (59143878.650) instead of the correct amount 11828775.730 ... Here is the query: — Nick, Jan 24 '18 at 12:54
@Nick Can you add the current version of your query to your question? — Zev Spitz, Jan 24 '18 at 12:57
@Nick I'm not clear exactly what you're trying to do in the `AND CED.EFFDT = ... ` part of the `WHERE`. Could you clarify? — Zev Spitz, Jan 24 '18 at 13:39
That is logic for Effective Dating and Effective Sequencing, which the database uses as part of the key values. Basically the sub-select is getting the max effective date on a record from the PS_VENDOR_ADDR_PHN table where the SETID, VENDOR_ID, ADDRESS_SEQ_NUM, and EFFDT are the same. Hope this helps... — Nick, Jan 24 '18 at 16:41
That is logic for Effective Dating and Effective Sequencing, which the database uses as part of the key values. Basically the sub-select is getting the max effective date on a record from the PS_VENDOR_ADDR_PHN table where the SETID, VENDOR_ID, ADDRESS_SEQ_NUM, and EFFDT are the same. Without that logic, I'm getting extra rows in the results which have the wrong Sum amounts. Hope this helps... — Nick, Jan 24 '18 at 16:46
@Nick Can you check if the query in the section **Using JOIN instead of WHERE** returns the right sums? — Zev Spitz, Jan 25 '18 at 00:34

SQL how to limit results by field

1 Answers1

Using JOIN instead of WHERE

Combining the EEFDT field

Original answer

Using `JOIN` instead of `WHERE`

Combining the `EEFDT` field