Distinctly sum a column on a joined table?

Question

This is a simple problem, and I'm not sure if its possible here. Here's the problem:

Explanation:

A ticket belongs to an attendee
An attendee has a revenue
I need to group the tickets by section and get the total revenue.
This double counts attendees because 2 tickets can belong to the same attendee, thus double counting it. I'd like to grab the sum of the revenue, but only count the attendees once.

In my sqlfiddle example, I'd like to see:

section | total_revenue
------------------------
A       | 40            <= 40 is correct, but I'm getting 50...
B       | null
C       | 40

I'd like to solve this without the use of sub queries. I need a scalable solution that will allow me to do this for multiple columns on different joins in a single query. So whatever allows me to accomplish this, I'm open to suggestions.

Thanks for your help.

I flagged this as duplicate, but found on a closer inspection that it is not. It is closely related, but different from [the previous question](http://stackoverflow.com/q/13169367/939860). Don't close it. — Erwin Brandstetter, Nov 01 '12 at 23:04
You appear to be operating under the misconception that subqueries are inherently slower or not scalable. That is not the case. Using a subquery in the SELECT list might require you to repeat it for each value and that's not ideal - but a subquery in `FROM` or in a CTE may well be the most efficient way to solve the problem both in code and performance terms. So I'd advise you to weaken that requirement to "without using a subquery in the SELECT list". — Craig Ringer, Nov 01 '12 at 23:50

score 1 · Answer 1 · answered Nov 01 '12 at 19:23

1

Here is a version using row_number():

select section,
  sum(revenue) Total
from 
(
  select t.section, a.revenue,
    row_number() over(partition by a.id, t.section order by a.id) rn
  from tickets t
  left join attendees a
    on t.attendee_id = a.id
) src
where rn = 1
group by section
order by section;

See SQL Fiddle with Demo

answered Nov 01 '12 at 19:23

Taryn

242,637
56
362
405

1

"without the use of sub queries" – RichardTheKiwi Nov 01 '12 at 19:49
@RichardTheKiwi I am aware of that requirement, I am showing a version with `row_number()` to show how it can be accomplished in the event they want to include additional fields. – Taryn Nov 01 '12 at 19:53
@RichardTheKiwi Based on the question, I suspect the intention was really *without subqueries in the SELECT list* because of the desire to avoid repeating the subquery for every value. – Craig Ringer Nov 01 '12 at 23:51

score 1 · Answer 2 · edited May 23 '17 at 12:13

Again, without subquery:

Key element is to add PARTITION BY to the window function(s):

SELECT DISTINCT
       t.section
--    ,sum(count(*))       OVER (PARTITION BY t.section) AS tickets_count
      ,sum(min(a.revenue)) OVER (PARTITION BY t.section) AS atendees_revenue
FROM   tickets t
LEFT   JOIN attendees a ON a.id = t.attendee_id
GROUP  BY t.attendee_id, t.section
ORDER  BY t.section;

-> sqlfiddle

Here, you GROUP BY t.attendee_id, t.section, before you run the result through the window function. And use PARTITION BY t.section in the window function as you want results partitioned by section this time.

Uncomment the second line if you want to get a count of tickets, too.

Otherwise, it works similar to my answer to your previous question. I.e., the rest of the explanation applies.

score 0 · Answer 3 · answered Nov 01 '12 at 19:20

0

You can do this:

select t.section, sum(d.revenue)
from 
(
  SELECT DISTINCT section, attendee_id FROM tickets
) t
left join attendees d on t.attendee_id = d.id
group by t.section
order by t.section;

answered Nov 01 '12 at 19:20

rs.

26,707
12
68
90

1

"without the use of sub queries". That, and the suspicion that the table structure is wrong anyway. I am guessing revenue is the total invoice value per attendee, i.e. for however many tickets the attendee bought ... – RichardTheKiwi Nov 01 '12 at 19:52

Distinctly sum a column on a joined table?

3 Answers3