0

Why

Does this give incorrect results?

SELECT
    people.name,
    SUM(allorders.TOTAL),
    SUM(allorders.DISCOUNT),
    SUM(allorders.SERVICECHARGE),
    SUM(payments.AMOUNT)
FROM
    people
INNER JOIN
    allorders ON allorders.CUSTOMER = people.ID
INNER JOIN
    payments ON payments.CUSTOMER = people.ID
WHERE
    people.ID = 7 AND allorders.VOIDED = 0 AND payments.VOIDED = 0

Gives: (the name), 1644000, 1100000, 50000, 1485000

If I do it two tables at a time (INNER JOIN people ON allorders.CUSTOMER = people.ID) in separate queries, I get the correct results. I don't don't even know where the numbers I get come from. Like:

SELECT
    people.name,
    SUM(allorders.TOTAL),
    SUM(allorders.DISCOUNT),
    SUM(allorders.SERVICECHARGE)
FROM
    people
INNER JOIN
    allorders ON allorders.CUSTOMER = people.ID
WHERE people.ID = 7 AND allorders.VOIDED = 0

Gives: (the name), 822000, 550000, 25000

SELECT
    people.name,
    SUM(payments.AMOUNT)
FROM
    people
INNER JOIN payments ON payments.CUSTOMER = people.ID
WHERE people.ID = 7 AND payments.VOIDED = 0

Gives: (the name), 297000

It looks like it doubles, but I don't know why.

The odd thing is I have a similar query that does this sum correctly. I'll post it, but it's a bit complex. Here goes:

SELECT
    t1.IDENTIFIER,
    ifnull(t1.NAME,""),
    t1.PRICE,
    t1.GUESTS,
    t1.STATUS,
    ifnull(t1.NOTE,""),
    t1.LINK,
    ifnull(t1.EDITOR,""),
    concat(t2.FIRSTNAME,"",t2.LASTNAME),
    t2.ID,
    t3.ID,
    ifnull(t1.EMAIL,""),
    ifnull(t3.PHONE,""),
    ifnull(SUM(p1.AMOUNT),0),
    ifnull(SUM(o1.DISCOUNT),0),
    ifnull(SUM(o1.TOTAL),0),
    ifnull(SUM(o1.SERVICECHARGE),0)
FROM
    tables t1
INNER JOIN
    people t2 ON t1.SELLER = t2.ID
INNER JOIN
    people t3 ON t1.CUSTOMER = t3.ID
INNER JOIN
    orderpaymentinfo ON orderpaymentinfo.TABLEID = t1.IDENTIFIER
INNER JOIN
    payments p1 ON orderpaymentinfo.PAYMENTID = p1.PAYMENTID
INNER JOIN
     allorders o1 ON o1.ORDERID = orderpaymentinfo.ORDERID
WHERE
    p1.VOIDED = 0 AND o1.VOIDED = 0 AND t1.DATE = "2014-12-20"
GROUP BY t1.IDENTIFIER

The latter statement does the same join, only it uses an additional helper-table. I'm sorry it's a bit poorly formatted (I'm not great with SO's formatter), but if someone can tell me the difference between the logic in these two statements and how one can be completely wrong while the other right, I'd be very happy.

In response to answer:

Result 1:

Name - 5

Result 2:

Name - 2

Result 3:

Name - 10

Result 4 is truncated in phpMyAdmin - where would I get this easily?

Table structure for the three tables looks like:

SHOW create on the way.

nickdnk
  • 4,010
  • 4
  • 24
  • 43
  • Okay so it's pretty clear that you've got some unexpected many-to-many relationship stuff happening here, and you need to track down what is unclear. I'm gonna post a longer response in a moment as an answer, even though the answer itself will depend on some of the queries you need to try... one moment – Evan Volgas Dec 23 '14 at 19:39
  • Thanks to whoever formatted my question. Is it because I use two tables (t2 and t3) in the last query, that it works? – nickdnk Dec 23 '14 at 19:40
  • 2
    nickdnk the problem likey has to do with the one to many relationship between people orders and people payments, you may need to identify what payment is meant for what order on the join to reduce the artificial bloat on the sum. Or, you may need to sum each separately and then add the results. What's happening is a Cartesian is occurring that you don't expect. Eliminate the aggregate and look at a specific user, you'll find the problem if you truly understand the data and the joins. – xQbert Dec 23 '14 at 19:42
  • I realize there is a problem. But what I don't understand is how the problem is not present in the last query. This one sums everything correctly, which as me confused. – nickdnk Dec 23 '14 at 19:43
  • Probably because the lat one has Group By! – NoChance Dec 23 '14 at 19:44
  • @EmmadKareem - what would I group by when people.ID is defined? If I didn't define people.ID, I believe you're right. – nickdnk Dec 23 '14 at 19:44
  • Doesn't change anything. If I use group by I just get all the people in the table (not just #7) - all incorrectly summed. – nickdnk Dec 23 '14 at 19:45
  • 1
    @nickdnk the last one likely works because it does the missing join `INNER JOIN allorders o1 ON o1.ORDERID = orderpaymentinfo.ORDERID` which eliminates the aforementioned Cartesian or because of a where clause which limits the data to a specific order where the Cartesian doesn't exist. To learn this you really need to look at the data as a whole. and spend time understanding the data for a particular order and payment. I suspect there are a few places where one order had multiple payments which is throwing stuff off or that multiple orders were paid off by one payment.. – xQbert Dec 23 '14 at 19:46
  • @PatrickQ, you are correct. – NoChance Dec 23 '14 at 19:49
  • 2
    An [SQLFiddle](http://sqlfiddle.com/) with sample data would _really_ help here. – Patrick Q Dec 23 '14 at 19:50
  • Okay to be clear... the first query showing count star of five means that there are five records for which people.ID = 7 AND allorders.VOIDED = 0. The second query showing count star of two means there are two records for which people.ID = 7 and payments.VOIDED = 0. So when you join those two together, you'll be doing an aggregation over 10 records, instead of presumably the five you intended to sum over in the allorders table. That's why your results are doubling exactly. Your join is Cartesian. You need to dig into the schema. – Evan Volgas Dec 23 '14 at 19:58
  • 1
    You said, "if someone can tell me the difference between the logic in these two statements" `INNER JOIN allorders o1 ON o1.ORDERID = orderpaymentinfo.ORDERID` is a pretty big logic difference as is. `AND t1.DATE = "2014-12-20"` – xQbert Dec 23 '14 at 20:01
  • I realize the date is not present in the first. I could remove that and get proper results in the last statement if I did GROUP BY date - but I can't do GROUP BY people.ID - gives wrong results, so since first has defined people.ID I defined t.1 date – nickdnk Dec 23 '14 at 20:03
  • I would be very hesitant to go so far as to assert that the last statement you provided is correct. It may be correct. Or it may not be. If you know for certain that it is, then you should be able to use it to unwind why your join is Cartesian. If you aren't certain that the last one is correct, I wouldn't assume that it is... after all, you have a Cartesian join in the first and you don't know why it's happening. How are you so certain you don't have one in the last statement either? Maybe you are certain the last result is accurate, but you haven't provided any reason why you should be. – Evan Volgas Dec 23 '14 at 20:07
  • I am certain because it gives correct numbers in every situation. The entire system would deliver wrong numbers if the last statement didn't work - and it doesn't. Also, I can easily check by doing individual sums in separate queries, which I have done. – nickdnk Dec 23 '14 at 20:08
  • I'll read on Cartesian joins. You learn something every day, as they say. – nickdnk Dec 23 '14 at 20:13

1 Answers1

1

Okay, so I am pretty sure you've a join condition that's basically exploding your result set into something like a Cartesian product. Here's what I think you should try

First, run the following and share the output:

SELECT p.name,COUNT(*) 
FROM people  as p
INNER JOIN allorders  AS a
ON a.CUSTOMER = p.ID 
WHERE p.ID = 7 AND a.VOIDED = 0
GROUP BY p.name

Then run

SELECT p.name,COUNT(*) 
FROM people  AS p
INNER JOIN payments AS pay
ON pay.CUSTOMER = p.ID 
WHERE p.ID = 7 AND pay.VOIDED = 0 
GROUP BY p.name

Then run

 SELECT
        p.name,
        COUNT(*) 
   FROM
        people as p
    INNER JOIN
        allorders as a ON a.CUSTOMER = p.ID
    INNER JOIN
        payments as pay ON pay.CUSTOMER = p.ID
    WHERE
        p.ID = 7 AND a.VOIDED = 0 AND pay.VOIDED = 0
   GROUP BY p.name

Last run the following

SHOW CREATE TABLE people;
SHOW CREATE TABLE payments;
SHOW CREATE TABLE allorders;

The problem is that you don't have the correct understanding of your data. You need to give us a bit more info about the data and the relationships, and the output I've described here should help. Mine is not an answer. But if you run these queries and paste the output of them, you should be able to get an answer, either from me or someone else.

Based on the discussion and edits above, please try:

SELECT
    p.name,
    SUM(o.TOTAL),
    SUM(o.DISCOUNT),
    SUM(o.SERVICECHARGE),
    MAX(pay.amt) 
FROM
    people as p
INNER JOIN
    allorders AS o ON o.CUSTOMER = p.ID
INNER JOIN (SELECT customer, 
            SUM(amount) as amt 
            FROM payments 
            WHERE voided = 0 AND customer = 7 
            GROUP BY customer) AS pay
     ON pay.customer = p.id
WHERE
    p.ID = 7 AND o.VOIDED = 0 
GROUP BY p.name

You could also do a subquery in your SELECT statement, but it's pretty obnoxious imo. You could also do min(pay.amt) or avg or even just leave the aggregate out altogether. The above should work... even though there are cleaner ways. I'm providing this answer so you can reason about why you were getting the unexpected result... actually optimizing your query is a different question that you can dive into later, once you've had a chance to look over this

Evan Volgas
  • 2,900
  • 3
  • 19
  • 30
  • Well, person has made 5 orders but only 2 payments. How is that bad? The 5 orders are in the order table, linking to the people table, the 2 payments are in the payments table, again linking to the people table. While I appreciate the fact that you're telling me what I shouldn't do, I have a hard time deduction from that what I should do instead :/ – nickdnk Dec 23 '14 at 20:27
  • @nickdnk Assuming that payments are made to pay for orders, your `payments` table should have an `orderID` column on it. Use that column to join `payments` to `allorders` instead of joining both to `people` – Patrick Q Dec 23 '14 at 20:37
  • Look into Cartesian joins. This might help: http://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135 – Evan Volgas Dec 23 '14 at 20:38
  • Payments can be made by a customer to a table and by the same customer to an order instead. The payment and order table tell something about WHO made the payment or order, not what they cover. The system is a little odd this way, but it must be able to handle it, and it does work. – nickdnk Dec 23 '14 at 20:40
  • @evanv Is the solution you posted considered bad practice, or is this the way to easily get the result I asked for? I could also just do two queries. This is supposed to run in a for loop and print total paid, amount ordered, service charged and discount given for any given person. – nickdnk Dec 23 '14 at 21:01
  • It's not a clear cut bad practice. It is a straightforward way to get the result you want. It might not be optimal for a number of reasons (code maintainability, performance, etc). But that's a bigger topic, and you would probably want to beef up your DB knowledge a bit before even trying to tackle it. Optimization is a complex topic. My note was more to SO users who are very skilled in databases and might rightly object that that code isn't necessarily optimal than it was to suggest that it for sure is not. I can't know for sure either way, based on what you provided. That was my point. – Evan Volgas Dec 23 '14 at 21:20
  • And I do think there are cleaner ways... but that would also depend on a little bit more DB experience and knowledge than your question/comment seem to indicate. So I'd consider the above answer "good enough" although probably something you could improve if you really wanted to dig into it. – Evan Volgas Dec 23 '14 at 21:22