1

In my app, Invoice has_many item_numbers and Invoice has_many payments. Each invoice has a balance, which is the sum of the ItemNumber amount attributes, less the sum of the Payment amount attributes.

The balance is very easy to calculate in the invoice model, but I am trying to write a query that sorts invoices by balance and this is proving much harder to do in ActiveRecord/SQL.

I have successfully managed to order the invoices on the total of the item_numbers with the following query (thanks Daniel Rikowski):

Invoice.where(user_id: 1, deleted: false, status: 'Sent')
       .joins(:item_numbers)
       .select('invoices.*, sum(item_numbers.amount)')
       .group('invoices.id')
       .order('sum(item_numbers.amount) asc')
       .limit(20)

I have tried to extend this to order by the balance with the following;

Invoice.where(user_id: 1, deleted: false, status: 'Sent')
       .joins(:item_numbers)
       .joins("FULL OUTER JOIN payments ON payments.invoice_id = invoices.id")
       .select("invoices.*, sum(item_numbers.amount_with_gst) - COALESCE(sum(payments.amount), 0)")
       .group("invoices.id")
       .order("sum(item_numbers.amount_with_gst) - COALESCE(sum(payments.amount), 0) #{dir}")/

There are two problems with this query. First, it's horrendously ugly, and second, it doesn't work. I used the full outer join on the payments table as not all invoices have a payment and if I used just joins(:payments) any invoice without a payment was excluded from the results. The COALESCE was put there to deal with null amounts.

The query comes close, but say there are 3 item_numbers and 1 payment (a pretty typical scenario), the payment amount will be subtracted 3 times resulting in a balance much less than the actual amount (and usually a negative balance).

It's probably pretty clear how out of my depth I am. I've put a lot of effort into this query (about 4 hours of reading and failed attempts) and can't quite nail it. My database is PostgreSQL.

brad
  • 9,573
  • 12
  • 62
  • 89

2 Answers2

3

Not sure about the AR syntax, but a proper query would be:

SELECT i.*, COALESCE(n.total, 0) - COALESCE(p.total, 0) AS balance
FROM   invoices i
LEFT   JOIN (
    SELECT invoice_id, sum(amount) AS total
    FROM   payments
    GROUP  BY invoice_id 
    ) p ON p.invoice_id = i.id
LEFT   JOIN (
    SELECT invoice_id, sum(amount_with_gst) AS total
    FROM   item_numbers
    GROUP  BY invoice_id 
    ) n ON n.invoice_id = i.id
WHERE  i.user_id = 1
AND    i.deleted = false
AND    i.status = 'Sent'
ORDER  BY balance;

If you join two has_many tables to the base tables, the rows multiply each other leading to completely arbitrary results. You can solve that by aggregating the totals before you join to the base table.

Also, I did not see a join condition for item_numbers in your query. That would lead to a cross join - extremely expensive besides being extremely wrong. (Or is AR smart enough to derive a join condition from a foreign key relationship automatically? If so, why the join condition on the second table?) Assuming that item_numbers has an invoice_id column just like payments, I amended that.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thankyou very much. Your answer does indeed work if I put it into an Invoice.find_by_sql(). There was no join condition on item_numbers as AR does an inner join in this situation (Invoice.joins(:item_numbers) => SELECT "invoices".* FROM "invoices" INNER JOIN "item_numbers" ON "item_numbers"."invoice_id" = "invoices"."id"). I have selected the other answer as it included the ActiveRecord solution and it is what I have used in my code. Your answer is very educational however. – brad Jan 01 '13 at 22:12
2

Your problem is caused by columns multiplying. Imagine having one Payment and three Item_numbers belonging to a Invoice. The result of a regular join would be something like this:

| invoice.id | item_number.amount | payment.amount |
| 1          | 4                  | 5              |
| 1          | 7                  | 5              |
| 1          | 2                  | 5              |

Because of this, sum(payment.amount) will return 15 and not 5. To get the correct sum, you have to fetch the sum directly:

Invoice.select('invoices.id, (SELECT SUM(item_numbers.amount) from item_numbers WHERE item_numbers.invoice_id = invoices.id) - (SELECT COALESCE(SUM(payments.amount),0) from payments WHERE payments.invoice_id = invoices.id) AS balance').group('invoices.id')
  • This works perfectly and has the added advantage of being pleasingly comprehensible to me. I have even modified some of my existing search queries to sort in the same manner. The .group('invoices.id') does not appear to be necessary any more though (presumably because there is no join). Thankyou. – brad Jan 01 '13 at 22:08
  • @brad: Be aware that - for more than a few rows in the result - a correlated subquery is generally much slower than a (`LEFT`) `JOIN` like I demonstrate. Some even call that an anti-pattern. Test the difference with [`EXPLAIN ANALYZE`](http://www.postgresql.org/docs/current/interactive/sql-explain.html). – Erwin Brandstetter Jan 02 '13 at 23:03
  • Thanks for the heads up. I'll do some more investigating. My database is pretty small and my queries are pretty small too, so it shouldn't affect me too much (for now....). – brad Jan 03 '13 at 01:41