0

I have the following table :

year: 2014, company_id : 1, customers_nb : 3
year: 2014, company_id : 2, customers_nb : 0
year: 2012, company_id : 2, customers_nb : 8
year: 2011, company_id : 2, customers_nb : 4

I want to write a query whiwh retrieves the customers_nb for each company, at the last year when it had at more than 0 customers. This means that company_id 1 would have 3 customers, and company_id 2 would have 8 customers (because 2012 is the most recent year when the company had more than 0 customers, and in 2012 it had 8 customers).

Result :

year: 2014, company_id : 1, customers_nb : 3
year: 2012, company_id : 2, customers_nb : 8

I thought about recursive, group by having, didn't find anything which really works.

tom redfern
  • 30,562
  • 14
  • 91
  • 126
Rayjax
  • 7,494
  • 11
  • 56
  • 82
  • This is a classic [tag:greatest-n-per-group] problem, as evidenced in questions [like this](http://stackoverflow.com/questions/1313120/retrieving-the-last-record-in-each-group). In your case you just have to knock out zero-customer rows first. – Clockwork-Muse Jul 08 '14 at 08:36

2 Answers2

3

You only want years where there were customers. So WHERE customers_nb > 0.

You want one record per company. So GROUP BY company_id.

You want the most recent year. So MAX(year).

Only problem: You want the number of customers in the found year. To retrieve that, we will have to find the year first and then join the table again to get the additional data.

select mytable.*
from mytable
inner join
(
  select company_id, max(year) as year
  from mytable
  where customers_nb > 0
  group by company_id
) theyear on theyear.company_id = mytable.company_id and theyear.year = mytable.year;

EDIT (after acceptence of above answer):

Here is another way of writing the query: Give me all records where no later record exists.

select mytable.*
from mytable
where customers_nb > 0
and not exists
(
  select *
  from mytable later
  where later.customers_nb > 0
  and later.company_id = mytable.company_id and later.year > mytable.year
);

And here is the same with an outer join. The trick is to outer join later entries and then remove records where later entries could be found. This looks slow, because it can be assumed to create a big intermediate result (all later entries joined with each record in the table) that are not actually needed. However, MySQL is known for preferring joins over other operations. And with many comapnies and few years the intermediate result wouldn't be that big.

select mytable.*
from mytable
left join mytable later on later.company_id = mytable.company_id and later.year > mytable.year and later.customers_nb > 0
where mytable.customers_nb > 0
and later.company_id is null;
Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • this works. I accept because of the detailed answer. Only problem it takes easily one minute (got 7000 rows) to run. I'll probably edit my structure in order to have a flag on the rows which have the most recent customers number > 0, it is going to be faster. But this query works very well, will be useful for smaller tables or for people without time constraints – Rayjax Jul 08 '14 at 09:15
  • @Rayjax: 7000 rows is not much at all. (Though it is a bit surprising considering your data. So there are many companies in the table? It can't be that many years, can it?) Anyway, this query should be fast. You can speed it up though with an index on company_id plus year. – Thorsten Kettner Jul 08 '14 at 09:28
  • @Rayjax: The link Clockwork-Muse has given in the comments to your request is worth reading. It seems MySQL has some difficulties with this query structure (which is great with other dbms, such as SQL Server and Oracle). So it may be worth re-writing it. – Thorsten Kettner Jul 08 '14 at 09:34
  • @Rayjax: I've edited my answer to give you alternatives. – Thorsten Kettner Jul 08 '14 at 09:50
  • Yes there are a lot of companies (and not much years, it starts at 2000). There are also some other fields which I didn't mention here because they are not relevant in this case. I will try all of your queries and benchmark them – Rayjax Jul 08 '14 at 13:06
1

What about

select max(year), company_id, customers_nb, max(customers_nb)
from mytable
where customers_nb > 0
group by company_id
paubo147
  • 748
  • 4
  • 8
  • This query gives you the number of customers of a random year and the maximum number of customers. It doesn't determine the number of customers in the most recent year. – Thorsten Kettner Jul 08 '14 at 08:14