MySQL avg() and count() in one statement with group by

Question

today I'm fighting with MySQL: I've got two tables, that contain records like that (actually there are more columns, but I don't think it's relevant):

Table Metering:

id, value
1000, 0.117
1000, 0.689
1001, 0.050
...

Table Res (there is no more than one record per id in this table):

id, number_residents
1001, 2
...

I try to get results in the following format:

number_residents, avg, count(id)
2, 0.1234, 456
3, 0.5678, 567
...

In words: I try to find out the average of the value-fields with the same number_residents. The id-field is the connection between the two tables. The count(id)-column should show how many ids have been found with that number_residents. The query I could come up with was the following:

select number_residents,count(distinct Metering.id),avg(value)
  from Metering, Res
  where Metering.id = Res.id
  group by number_residents;

The results look like what I searched for but when I tried to validate them I became insecure. I tried it without the distinct at first but that leads to too high values in the count-column of the results. Is my statement right to get what I want? I thought it might have to to something with the order of execution like asked here, but I actually can't find any official documentation on that...

Thanks for helping!

Your query is quite well-defined, so the order of execution has no bearing on the results. You should learn proper `join` syntax. But, the issue is: What results do you want? Are there duplicates in `Res`? You need to edit your question with *expected* results as well as sample data. The statement "the average value of all ids" is ambiguous, because you have multiple rows with the same id. How should those be handled? — Gordon Linoff, Mar 15 '15 at 18:27
Ok, I indeed didn't mention that there are no duplicates in Res. I edited the question. And thanks for the hint on the ambiguous value of all ids. My head was kind of too full yesterday. Edited the question too! — caligula, Mar 16 '15 at 09:53
But for the _expected_ results I'm not sure what you mean... The only real result is the avg column and what I expect is something in the range of 0 to 5 but I don't see why that should be important. — caligula, Mar 16 '15 at 10:23

Bohemian · Answer 1 · 2015-03-16T14:04:28.657

Judging by the table names, Res is the "parent" table and Metering us the "child" table - that is there are 0-n meterings for each residence.

You have use "old school" joins (and I mean old - the join syntax has been around for 25 years now), which are inner joins, meaning residences without meterings won't participate in the results.

Use an outer join:

select
    number_residents,
    count(distinct r.id) residences_count,
    avg(value) average_value
from Res r
left join Metering m on m.id = r.id
group by number_residents

Although meterings.id = res.id, with a left join counting them may produce different results: I've changed the count to count residences, which for a left join means residences that don't have meterings still count.

Now, nulls (which are what you get from a left-joined table that doesn't have a matching row) don't participate in avg() - either for the numerator or denominator, if you want residences without any meterings to count when calcukating the average (as if they have a single zero metering for the purposes of dividing the total value), use this query:

select
    number_residents,
    count(distinct r.id) residences_count,
    sum(value) / count(r.id) average_value
from Res r
left join Metering m on m.id = r.id
group by number_residents

Because res.id is never null, count(r.id) counts the number of meterings plus 1 for every residence without any meterings.

MySQL avg() and count() in one statement with group by

1 Answers1