0

enter image description here

Hi! I'm new to SQL I'm having trouble correctly using the avg and sum function. I'm working with the database above. I need to find the average profit for each movie star but only output that average only if the sum of their profits is > 200.

 SELECT MovieStar.sname, avg(profit) From MovieStar, Movie
 GROUP BY sname
 HAVING sum(Movie.profit) > 200

I'm using SQL Fiddle to try and figure this out, but it seems to be returning the avg of the WHOLE profit column and not each actor but I'm not sure what I could be doing wrong. How can I approach this so I can get the avg of each actor and not the avg of the whole profit column? Here is the SQL Fiddle I made. Thanks in advance!

helloworld
  • 399
  • 3
  • 9
  • 21

1 Answers1

3

Looks like the query is doing a CROSS JOIN operation, a cartesian product, matching every movie to each moviestar. Seems like we would only want to match a moviestar to particular movies, probably using (including) the starsin table to get the matches.

I recommend you ditch the old-school comma operator the join operation. Use the newer JOIN keyword instead. And put the join predicates in the appropriate ON clause instead of the WHERE clause.

Also, best practice is to qualify all column references; even when not required to eliminate ambiguity, it prevents the query from breaking when new columns are added, and it aids the future reader... the poor soul who has to go look at the table definitions to figure out which columms come from which tables.

I think you want something like this:

 SELECT ms.sname
 --   , ms.sno
      , AVG(m.profit)  AS  avg_profit
 --   , SUM(m.profit)  AS  tot_profit 
   FROM MovieStar ms
   JOIN StarsIn si 
      ON si.sno = ms.sno
    JOIN Movie m
      ON m.mno = si.mno
   GROUP BY ms.sno, ms.sname
  HAVING SUM(m.profit) > 200
   ORDER BY AVG(m.profit) DESC
spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • In fact, this is exactly why the implicit join is to be avoided as you can [mistakenly run a cross join](https://stackoverflow.com/a/317465/1422451) as OP did. – Parfait Oct 18 '17 at 00:17
  • Thank you to both of you! I didn't know cross join operations could happen like that, but that makes sense. Can I ask why you chose to use ON instead of WHERE in this situation? In my head they seem like the same thing, or at least closely related. – helloworld Oct 18 '17 at 00:31
  • I prefer to put the conditions for the row "matching" into the ON clause, and put other conditions into the WHERE clause. This isn't a requirement, MySQL doesn't care. This is just a style convention, a preference. I think it's easier on the reader when the query joins eight tables, to have the conditions in the ON clause, rather than making the reader sift through all of the conditions in the WHERE clause. Having the conditions in the ON clause also makes it easier when we do OUTER JOINs. – spencer7593 Oct 18 '17 at 00:52
  • When I do want a cartesian product, I omit the ON clause and include the CROSS keyword e.g. `SELECT ... FROM movies CROSS JOIN moviestar WHERE ...`. This isn't a requirement in MySQL, it's just a style convention, I'm alerting the reader that the omission of the matching conditions is purposeful, I'm intending a cartesian product. The CROSS keyword tells the reader that the absence of the matching conditions isn't an oversight, it's intentional. – spencer7593 Oct 18 '17 at 00:55