0

I'm using RoR 4.2.4 trying to create an index page where every row shows some buyer info and three values summed from associated tables. I feel like this kind of query must happen all the time, and I'm just missing something really simple.

class Buyer < ActiveRecord::Base
  has_many :wins
  has_many :pledges
  has_many :payments

And wins, pledges and payments belong_to buyer.

This works:

  @buyers = Buyer.joins("LEFT OUTER JOIN pledges on buyers.id = pledges.buyer_id")
                 .where(event_id: @event.id)
                 .select("buyers.*, sum(pledges.amount) as pledges_total")
                 .group('buyers.id')
                 .order('buyers.last_name')

I get a list of Buyers and buyer.pledges_total gives me the sum of their pledges.

But in that same query, I also want to get sum(wins.amount) and sum(payments.amount).

This DOESN'T work:

  @buyers = Buyer.joins("LEFT OUTER JOIN wins on buyers.id = wins.buyer_id")
                 .joins("LEFT OUTER JOIN pledges on buyers.id = pledges.buyer_id")
                 .joins("LEFT OUTER JOIN payments on buyers.id = payments.buyer_id")
                 .where(event_id: @event.id)
                 .select("buyers.*, sum(wins.price) as wins_total, sum(pledges.amount) as pledges_total, sum(payments.amount) as payments_total")
                 .group('buyers.id')
                 .order('buyers.last_name')

I get weird incorrect values that make me think I'm summing after the join. But I'm really not sure, and I don't know how else to do it. I'm sure I'll be amazed at how simple this really is.

Thanks for the help.

HelloMundo
  • 15
  • 3

1 Answers1

1

The SUMs will not work correctly with the multiple JOINs you have. Rather than summing over the entire set of results try moving that operation into sub-queries.

@buyers = Buyer
  .where(event_id: @event.id)
  .select(<<-SELECT)
    buyers.*, 
    (SELECT SUM(wins.price) FROM wins WHERE wins.buyer_id = buyers.id) as wins_total, 
    (SELECT SUM(pledges.amount) FROM pledges WHERE pledges.buyer_id = buyers.id) as pledges_total, 
    (SELECT SUM(payments.amount) FROM payments WHERE payments.buyer_id = buyers.id) as payments_total
  SELECT
  .group('buyers.id')
  .order('buyers.last_name')

Note that the JOINs from your original query are no longer necessary.

Update

Here is an explanation of why your SUMs were incorrect when you had more than one JOIN in your query.

Suppose you have the following data:

# SELECT * FROM buyers;
 id  │ event_id
─────┼───────────
   1 │        1

# SELECT * FROM wins;
 id  │ buyer_id │ price
─────┼──────────┼───────
   2 │        1 │    10
   3 │        1 │    20

# SELECT * FROM pledges;
 id  │ buyer_id │ amount
─────┼──────────┼────────
   4 │        1 │     30
   5 │        1 │     40

SQL join returns the Cartesian product of the given record sets. That means that the results of a join may contain repeated values from individual tuples (rows). In the example below we can see that each tuple from wins and pledges is repeated twice. SQLFiddle

# SELECT buyers.id, wins.id AS wins_id, wins.price AS wins_price, pledges.id AS pledges_id, pledges.amount AS pledges_amount FROM buyers 
# INNER JOIN wins ON wins.buyer_id = buyers.id
# INNER JOIN pledges ON pledges.buyer_id = buyers.id;
 id  │ wins_id │ wins_price │ pledges_id │ pledges_amount
─────┼─────────┼────────────┼────────────┼────────────────
   1 │       2 │         10 │          4 │             30
   1 │       2 │         10 │          5 │             40
   1 │       3 │         20 │          4 │             30
   1 │       3 │         20 │          5 │             40

We can easily look back at the wins and pledges tables and see that the sum of win prices equals 30 and the sum of pledges equals 70. However, if we group by id (buyers.id) and performs the sums then we end up with incorrect values which are double the amount that they should be! SQLFiddle

# SELECT buyers.id, sum(wins.price) AS wins_total, sum(pledges.amount) AS pledges_total FROM buyers 
# INNER JOIN wins ON wins.buyer_id = buyers.id
# INNER JOIN pledges ON pledges.buyer_id = buyers.id
# GROUP BY buyers.id;
 id  │ wins_total │ pledges_total
─────┼────────────┼───────────────
   1 │         60 │           140

You can see that using sub-selects returns the correct results in this SQLFiddle.

Follow-up

Doesn't this sort of thing come up all the time where you need to sum up the values of an associated table?

Yes, this is a common problem to encounter.

And is this what people do?

I do it. :)

Or is there an entirely different approach that is smarter? The sub-query approach should work well for a fairly large amount of data as long as you have good indexes. You will probably encounter other areas of performance concern before sub-queries like this become a significant problem.

However, as an alternative to calculating the sum of the prices and amount with each query, you could cache the summed values for each buyer. A quick search turned up this SO question on that exact topic. Caching adds complexity and can be difficult at times. You will need to assess whether you truly need to cache values and whether or not it is worth the effort. The question I referred you to shows how to cache using ActiveRecord. It is also possible to set up triggers in the database which do the same thing (probably much more efficiently).

Community
  • 1
  • 1
Wizard of Ogz
  • 12,543
  • 2
  • 41
  • 43
  • Wow. Thanks @wizard for the super, super, informative answer! That wasn't as simple as I thought. Which leads me to believe that maybe I'm going about it all wrong? Doesn't this sort of thing come up all the time where you need to sum up the values of an associated table? And is this what people do? Or is there an entirely different approach that is smarter? – HelloMundo Feb 27 '17 at 07:59
  • @HelloMundo I've added to my response your questions in the answer under the "Follow-up" heading. – Wizard of Ogz Feb 27 '17 at 13:13