Dealing with paging when [partially] duplicate records exist

Question

I have a code which calls a single page using SQL like this


select
  <some fields>,
  count(*) over.. as _$total_records
From
  <some inline view>
ORDER BY ...
OFFSET ...
FETCH ...

This SQL might return duplicated ids due to a join. And this was OK in the most of the cases. But a new requirement is to suppress duplicates. However, I must not suppress certain fields that are coming from the join which brings duplicates. In other words I need result set like this

From

id	name	reference
1	A	55
1	A	66
2	B	11
3	C	77
3	C	88

To

id	name	reference
1	A	55
2	B	11
3	C	77

I was thinking about this logic (pseudo code)


while (resultModelList.Count < pageSize && recordsReturned)
{

    reader = CallDb(sql);
    LoadPage(reader, resultModelList, out recordsReturned);

}

In this case ^^ LoadPage will check if the id is already added into the list and skip adding this record. And operation will repeat until page size is satisfied or no more records returned by the reader.

But I see potential issues with skipped records. And also correct page number due to count() over. Is there a good way to deal with it?

Update: this question is not about SQL solution but about paging solution. SQL here is built dynamically. It can have 1,2,3 joins based on configuration and criteria values. All these items are added dynamically. The SELECT is built dynamically.

You need to decide which row you want to return. Presumably you need some kind of `ROW_NUMBER` solution, see https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group. If you use a deterministic ordering you won't have skipped rows because that, but `OFFSET FETCH` can still have issues with skipped rows for a different reason: inserts and deletes in between each query. This is one of the reasons why [Keyset Pagination](https://stackoverflow.com/a/70520457/14868997) is much better — Charlieface, Mar 09 '22 at 21:32
@Charlieface in this case it is not important which row is returned. The `order by` will be done by one of the columns and always `id`. Which `reference` is returned - don't matter. Or, we can, in fact, set it to "multiple" during `reader` parsing. I will look into your solution — T.S., Mar 10 '22 at 03:54
Are you wanting to page in SQL Server or in C#? Pick one and then clarify your question. As it exists much of your question refers to eliminating duplicate records. If thats not your question, then remove it. — Dale K, Apr 01 '22 at 05:14
**This question is not about SQL-only solution** as so-called duplicate is about. This question is about paging. And solution is not necessary SQL-only — T.S., Oct 07 '22 at 22:38

score 1 · Accepted Answer · answered Mar 09 '22 at 22:15

1

It can be done witn using a nested select with Row_Number and pick up the appropriate rows with Partition By, something like this:

select
  <some fields>,
  count(*) over.. as _$total_records
From
  (
  select
  <some fields>,
  ROW_NUMBER() OVER(PARTITION BY id ORDER BY reference) rw 
From
  <some inline view>
  ) tbl
WHERE tbl.rw=1
ORDER BY ...
OFFSET ...
FETCH ...

answered Mar 09 '22 at 22:15

sa-es-ir

3,722
2
13
31

1

I got to give you for this. Although my problem is more complex, due to use of Oracle, MySql, not only Sql Server. And MySql does not have full support. However, sql server works. We employed in-code solution, which wasn't perfect, and now we're back to the drawing board – T.S. Oct 07 '22 at 23:16

Dealing with paging when [partially] duplicate records exist

1 Answers1