1

I am working on a view that consists of the following highly-repetetive statements:

SELECT 10 as [TopN],
    (Select SUM(A) FROM (Select TOP 10 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 10 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 10 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 10 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 10 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
SELECT 100 as [TopN],
    (Select SUM(A) FROM (Select TOP 100 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 100 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 100 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 100 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 100 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
SELECT 1000 as [TopN],
    (Select SUM(A) FROM (Select TOP 1000 A FROM [dbo].TheTable ORDER BY A DESC) T) as A,
    (Select SUM(B) FROM (Select TOP 1000 B FROM [dbo].TheTable ORDER BY B DESC) T) as B,
    (Select SUM(C) FROM (Select TOP 1000 C FROM [dbo].TheTable ORDER BY C DESC) T) as C,
    (Select SUM(D) FROM (Select TOP 1000 D FROM [dbo].TheTable ORDER BY D DESC) T) as D,
    (Select SUM(E) FROM (Select TOP 1000 E FROM [dbo].TheTable ORDER BY E DESC) T) as E
UNION ALL
--etc...
--  The same 7 lines of code repeated dozens of times for different values of `TopN`

In order to produce a table of top-value summations for each column.

This is what the table looks like:

| TOPN |   A   |   B   |   C   |   D   |   E   |
|   10 |   234 |  ...
|  100 |   734 |  ...
| 1000 |  1298 |  ...
|  ... |   ... |  ...

WHY do I need this query?

In the real world, a summary report such as this answers questions like:

  • "What is the total income of the top 10 income earners", "What is the total income of the top 100 income earners", etc... in column "A"
  • "What is the total debt of the top 10 debt holders", etc.. in column "B",

And so on. Each column is a report based on the "standalone" ordering of that column. As such, the above table is an end-user deliverable.


WHAT am I looking for?

A version of the above query that is any of the following: simpler, deduplicated, more efficient, more maintainable.

The above query works fine, and produces the desired table that I have mocked out above. BUT it is clearly inefficient. For example, someone doing this by hand would be able to:

  • sort by each column once
  • Start summing until they reach the 10th sorted item, spit out the total
  • Continue summing until they hit the 100th sorted item, and spit out that total
  • etc... (e.g. without re-ordering or starting over at element 1 again).

The above query is also repetitious - for example, if this were a stored procedure, one could loop over a list of values (10, 100, 1000, etc..) and produce this table one row at a time using a single chunk of parameterized code (as in @Larnu's answer below). This approach is not supported by views though. Since the current implementation is as a View, it would be considered a regression if it were converted to a stored proc or function that must be executed differently (because all existing usages would have to be modified).

Therefore, my ask is simply whether there is any way to make this better.


My Ideas

Ideally, I could inline a list of values, for example:

Select * From (VALUES((10), (100), (1000), (5000), ...)) AS TOPN_VALUES(TOPN)

Or I'm happy to have those values captured in a simple 1-column table somewhere.

Either way, what remains is the need for (likely) some clever join or cross apply logic to generate all of the above table entries from that list of numbers, as opposed to having the numbers all hard-coded amongst dozens of chinks copy-pasted Select statements as they are in the original query.

One thing that's clear to me is that SELECT TOP X cannot be parameterized in a view, so at the very least, we will have to re-implement that logic in a different way. One potential solution is to rewrite:

Select SUM(A) FROM (Select TOP 10 A FROM [dbo].SomeTable ORDER BY A DESC)

As:

Select SUM(A) FROM (Select A, ROW_NUMBER() over (order by A desc) as [Rank])
WHERE [RANK] < 10

At which point the "10" above can be dynamically determined by a value joined in from another table. Still lots of work to be done here to get to a full solution though...


Thanks for your help

Alain
  • 26,663
  • 20
  • 114
  • 184
  • A `VIEW` cannot be parametrised. If you need to parametrise something you select from, use an inline table-value function. – Thom A Oct 16 '20 at 12:22
  • @Larnu, not looking to parameterize anything, just pre-define a list of 10 or so TOPN values and use them without hard coding the same query 10 times. – Alain Oct 16 '20 at 12:28
  • I don't follow what you're asking here then, @Alain . What are you actually trying to achieve here? – Thom A Oct 16 '20 at 12:29
  • @Larnu Just asking if there's a way to collapse the above query duplicated 3 times for 3 different TOPN values into something roughly 1/3rd the size. – Alain Oct 16 '20 at 12:30
  • So, like I said, you want to **parametrise** the `TOP`, no? If not, then you need to elaborate here. – Thom A Oct 16 '20 at 12:31
  • Right, parameterize the inner calls to `TOP`, but not the `VIEW` itself. – Alain Oct 16 '20 at 12:32
  • Why do you want the `SUM` of the top 10, 100 and 1000, from 5 different tables, in a single query anyway? This smells like an [XY Problem](http://xyproblem.info) if I am honest. – Thom A Oct 16 '20 at 12:32
  • To answer the question e.g. "What is the total income of the top 1% of citizens, What about the top 2%? 5%? 10%?." A pretty common line of questioning when summarizing data. Here "A" is "income". "B"..."E" could be other things, like expenses, mortgage, debt, etc... "What is the total debt of the top 1% largest debt holders?" – Alain Oct 16 '20 at 12:34
  • To be clear, all this data is coming from a single table. We're just sorting each column independently to generate this summary report. – Alain Oct 16 '20 at 12:41

1 Answers1

1

If I am reading between the lines correctly, then use an inline table-value function:

CREATE FUNCTION dbo.YourFunction (@Top int) 
RETURNS table
AS RETURN
    SELECT @Top AS [TopN],
           (SELECT SUM(A) FROM (Select TOP (@Top) A FROM [dbo].SomeTable ORDER BY A DESC) T) as A,
           (SELECT SUM(B) FROM (Select TOP (@Top) B FROM [dbo].SomeTable ORDER BY B DESC) T) as B,
           (SELECT SUM(C) FROM (Select TOP (@Top) C FROM [dbo].SomeTable ORDER BY C DESC) T) as C,
           (SELECT SUM(D) FROM (Select TOP (@Top) D FROM [dbo].SomeTable ORDER BY D DESC) T) as D,
           (SELECT SUM(E) FROM (Select TOP (@Top) E FROM [dbo].SomeTable ORDER BY E DESC) T) as E;
GO

Then you just call the function like below:

SELECT *
FROM dbo.YourFunction(10);
Thom A
  • 88,727
  • 11
  • 45
  • 75
  • Hmm, appreciate the thought, but I want to return a single table with multiple rows for each value of "TopN" in a list, not a table with a single row. Also pretty committed to doing this as a `VIEW` since the current solution is a `VIEW`. Just aiming to reduce code duplication. – Alain Oct 16 '20 at 12:39
  • Then, again, I don't understand your goal, and you need to elaborate here, @Alain as your question is unclear, at least to me. You can't make multiple calls to a query with different `TOP` values, so you really need to provide some kind of sample data and expected results. – Thom A Oct 16 '20 at 12:41
  • alright, I've updated the original question with detailed answers to your questions, and a clear statement of ask. – Alain Oct 16 '20 at 15:34