4

Following @mdb's answer to apply pagination using SQL SERVER, I find it hard to retrieve distinct records when the main table is joined to other tables for a one-to-many relationship, i.e, A person has many addresses.

Use case, suppose I want to retrieve all persons which has an address in New York given tables #temp_person and #temp_addresses, I would join them on PersonID and OwnerID.

The problem arises when there are multiple addresses for a person, the result set contains duplicate records.

To make it clearer, here's a sample query with data:

Sample Data:

create table #temp_person (
    PersonID int not null,
    FullName varchar(max) not null
)

create table #temp_addresses(
    AddressID int identity not null,
    OwnerID int not null,
    Address1 varchar(max),
    City varchar(max)
)

insert into #temp_person
values 
    (1, 'Sample One'),
    (2, 'Sample Two'),
    (3, 'Sample Three')

insert into #temp_addresses (OwnerID,  Address1, City)
values
(1, 'Somewhere East Of', 'New York'),
(1, 'Somewhere West Of', 'New York'),
(2, 'blah blah blah', 'Atlantis'),
(2, 'Address2 Of Sample Two', 'New York'),
(2, 'Address3 Of Sample Two', 'Nowhere City'),
(3, 'Address1 Of Sample Three', 'New York'),
(3, 'Address2 Of Sample Three', 'Seattle')

--drop table #temp_addresses, #temp_person

Pagination Query:

 SELECT
    (
        CAST( RowNum as varchar(MAX) ) 
        + '/' 
        + CAST(TotalCount as varchar(MAX))
     ) as ResultPosition
    , PersonID
    , FullName
FROM (
SELECT DISTINCT
        ROW_NUMBER() OVER(ORDER BY p.FullName ASC) as RowNum 
        , p.PersonID
        , p.FullName
        , Count(1) OVER() as TotalCount
    FROM #temp_person p
    LEFT JOIN #temp_addresses a
        ON p.PersonID = a.OwnerID
    WHERE City = 'New York'
) as RowConstrainedResult
WHERE RowNum > 0 AND RowNum <= 3
ORDER BY RowNum

Expected Results:

ResultPosition  PersonID    FullName
   1/3           1         Sample One
   2/3           2         Sample Two
   3/3           3        Sample Three

Actual Results:

ResultPosition  PersonID    FullName
   1/4            1        Sample One
   2/4            1        Sample One
   3/4            3       Sample Three

As you can see, the inner query is returning multiple records due to the join with #temp_addresses.

Is there a way we could only return unique records by PersonID?

UPDATE:

Actual use case is for an "Advanced Search" functionality where the user can search using different filters, i.e, name, firstname, last name, birthdate, address, etc.. The <WHERE_CLAUSE> and <JOIN_STATEMENTS> in the query are added dynamically so GROUP BY is not applicable here.

Also, please address the "Pagination" scheme for this question. That is, I want to retrieve only N number of results from Start while also retrieving the total count of the results as if they are not paged. i.e, I retrieve only 25 rows out of a total of 500 results.

xGeo
  • 2,149
  • 2
  • 18
  • 39

3 Answers3

0

Just do group by PersonID and no need to use subquery

SELECT 
         cast(row_number() over (order by (select 1)) as varchar(max)) +'/'+
         cast(Count(1) OVER() as varchar(max)) ResultPosition,  
         p.PersonID,
         max(p.FullName) FullName
FROM #temp_person p
LEFT JOIN #temp_addresses a ON p.PersonID = a.OwnerID
WHERE City = 'New York'
group by p.PersonID

EDIT : I would use CTE for the pagination

;with cte as
(
    SELECT 
         row_number() over(order by (select 1)) rn,
         cast(row_number() over (order by (select 1)) as varchar(max)) +'/'+
         cast(Count(1) OVER() as varchar(max)) ResultPosition,  
         p.PersonID,
         max(p.FullName) FullName
    FROM #temp_person p
    LEFT JOIN #temp_addresses a ON p.PersonID = a.OwnerID
    WHERE City = 'New York'
    group by p.PersonID
) 
select * from cte 
where rn > 0  and rn <= 2

Result:

ResultPosition  PersonID    FullName
1/3             1           Sample One
2/3             2           Sample Two
3/3             3           Sample Three
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • I want to retrieve paged results, not the whole query. Also, while your answer is applicable for the above sample test-case, I'm afraid I cannot use `GROUP BY` in my actual project. – xGeo Jan 31 '18 at 13:44
0

You need to have distinct rows before using ROW_NUMBER().

If you will filter by City, there are no need to use LEFT JOIN. Use INNER JOIN instead.

select ResultPosition = cast(row_number() over (order by (r.PersonID)) as varchar(max)) +'/'+ cast(Count(r.PersonID) OVER() as varchar(max)), *
from(
    SELECT distinct p.PersonID,
                    p.FullName
    FROM #temp_person p
        JOIN #temp_addresses a ON 
            p.PersonID = a.OwnerID
    WHERE City = 'New York') r

EDIT: Considering pagination

declare @page int =1, @rowsPage int = 25

select distinct position, ResultPosition = cast(position as varchar(10)) + '/' + cast(count(*) OVER() as varchar(10)), *
from(
    SELECT  position = DENSE_RANK () over (order by p.PersonID),
            p.PersonID,
            p.FullName
    FROM #temp_person p
        LEFT JOIN #temp_addresses a ON 
            p.PersonID = a.OwnerID
    WHERE City = 'New York'
    ) r
where position between @rowsPage*(@page-1)+1 and @rowsPage*@page
Marta B
  • 438
  • 2
  • 9
  • doesn't even address the pagination funtionality i mentioned. – xGeo Jan 31 '18 at 13:44
  • `DISTINCT` would'nt be reliable in my use-case as the JOINS and WHERE statements are dynamically added. Please see my question again. I edited it. – xGeo Jan 31 '18 at 14:55
  • edited again. I saw your edit. So, if you can not use DISTINCT, remove it. DENSE_RANK() does not need DISTINCT. Move DISTINCT to the exterior SELECT. – Marta B Jan 31 '18 at 15:19
0

Geoman Yabes, Check if this help... Gives results expected in your example and you can have pagination using RowNum:-

SELECT *
FROM
(SELECT ROW_NUMBER() OVER(ORDER BY RowConstrainedResult.PersonId ASC) As RowNum,
        Count(1) OVER() As TotalRows,
        RowConstrainedResult.PersonId,
        RowConstrainedResult.FullName
 FROM (
    SELECT 
          RANK() OVER(PARTITION BY p.PersonId ORDER BY a.Address1 ASC) as Ranking 
        , p.PersonID
        , p.FullName
    FROM #temp_person p
    INNER JOIN #temp_addresses a ON p.PersonID = a.OwnerID WHERE City = 'New York'
) as RowConstrainedResult WHERE Ranking = 1) Filtered
Where RowNum > 0 And RowNum <= 4

Sample Data:

insert into #temp_person
values 
    (1, 'Sample One'),  
    (2, 'Sample Two'),  
    (3, 'Sample Three'),  
    (4, 'Sample 4'),  
    (5, 'Sample 5'),  
    (6, 'Sample 6')  

insert into #temp_addresses (OwnerID,  Address1, City)
values
(1, 'Somewhere East Of', 'New York'),  
(1, 'Somewhere West Of', 'New York'),  
(2, 'blah blah blah', 'Atlantis'),  
(2, 'Address2 Of Sample Two', 'New York'),  
(2, 'Address3 Of Sample Two', 'Nowhere City'),  
(3, 'Address1 Of Sample Three', 'New York'),  
(3, 'Address2 Of Sample Three', 'Seattle'),  
(4, 'Address1 Of Sample 4', 'New York'),  
(4, 'Address1 Of Sample 4', 'New York 2'),  
(5, 'Address1 Of Sample 5', 'New York'),  
(6, 'Address1 Of Sample 6', 'New York')  

enter image description here

xGeo
  • 2,149
  • 2
  • 18
  • 39
Raska
  • 209
  • 2
  • 7