2

I would like to find out the average number of days between orders grouping by account_id in the database.

Let's say I have the following table named 'orders' with this data.

id  account_id  account_name    order_date
1   555         Acme Fireworks  2015-06-15
2   342         Kent Brewery    2015-09-12
3   555         Acme Fireworks  2015-09-15
4   342         Kent Brewery    2015-10-12
5   342         Kent Brewery    2015-11-12
6   342         Kent Brewery    2015-12-12
7   555         Acme Fireworks  2015-12-15
8   900         Plastic Inc.    2015-12-20

I would like a query to produce the following results

account_id  account_name    average_days_between_orders
342         Kent Brewery    30.333
555         Acme Fireworks  91.5
900         Plastic Inc.    (unsure of what value would go here since there's 1 order only)

I checked the following questions to get an idea, but still couldn't figure out the problem:

Thanks!

Community
  • 1
  • 1

3 Answers3

0

You need a query that produces the difference between the previous purchase for a given (null if there is no previous purchase) and take the average of these values.

I would self-join the above table to get for each order the maximum order date of any previous order in a subquery. In the avg() function calculate the difference between the calculated date and the current order date:

SELECT o3.account_id, o3.account_name, avg(diff) as average_days_between_orders
FROM
    (select o1.id,
            o1.account_id,
            o1.account_name,
            datediff(o1.order_date, max(o2.order_date)) as diff
     from orders o1
     left join orders o2 on o1.account_id=o2.account_id and o1.id>o2.id
     group by o1.id, o1.account_id, o1.account_name, o1.order_date) o3
GROUP BY o3.account_id, o3.account_name

As an alternative to joins, you can use a user defined variable in the subquery or a correlated subquery in the select list to calculate the differences. You can check mysql running total solutions to get a hang of this solution, such as this SO topic. Specifically, check out the solution provided by Andomar.

If your orders table is huge, then the alternative aprroaches described in that topic may be better from a performance point of view.

Community
  • 1
  • 1
Shadow
  • 33,525
  • 10
  • 51
  • 64
  • Close. When I run that query, I get the following: Kent Brewery: 22.75, Acme Fireworks: 61, Plastic Inc.: 0. I did a group_concat on the o3.diff, and it displays: Kent Brewery: (30,30,0,31), Acme Fireworks: (0,91,92), Plastic Inc.: (0). Somehow an extra difference of 0 is thrown in to the mix which is skewing the averages. – superherogeek Apr 19 '16 at 17:57
  • In this case remove the conversion of null to 0 in the subquery. avg() will return null for those customers, who had only 1 order. See updated answer. – Shadow Apr 19 '16 at 18:04
  • Worked perfectly! The only change I had to make is the function "date_diff" in MySQL is "datediff" – superherogeek Apr 19 '16 at 18:08
0

Note: Please test it carefully and use it as you wish. I couldn't find an easy query for it. I don't guarantee to work for all cases :) If you just want the answer, the complete query is shown in the end.

The goal is that I'll try to get a table with start and end dates in one row, and then I'll simply calculate average difference between two dates. Something like this.

id | account_id | account_name    | start_date | end_date
------------------------------------------------------------
1  | 342        | Kent Brewery    | 2015-09-12 | 2015-10-12
2  | 342        | Kent Brewery    | 2015-10-12 | 2015-11-12
3  | 342        | Kent Brewery    | 2015-11-12 | 2015-12-12
4  | 555        | Acme Fireworks  | 2015-06-15 | 2015-09-15
5  | 555        | Acme Fireworks  | 2015-09-15 | 2015-12-15

I'll create few temporary tables to make it a bit more clear. First query for start_date:

QUERY:

create temporary table uniq_start_dates
select (@sid := @sid + 1) id, tmp_uniq_start_dates.*
    from
        (select distinct o1.account_id, o1.account_name, o1.order_date start_date 
         from orders o1 
         join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date 
         order by o1.account_id, o1.order_date) tmp_uniq_start_dates 
         join (select @sid := 0) AS sid_generator

OUTPUT: temporary table - uniq_start_dates

id | account_id | account_name    | start_date
-----------------------------------------------
1  | 342        | Kent Brewery    | 2015-09-12 
2  | 342        | Kent Brewery    | 2015-10-12 
3  | 342        | Kent Brewery    | 2015-11-12 
4  | 555        | Acme Fireworks  | 2015-06-15 
5  | 555        | Acme Fireworks  | 2015-09-15 

Do the same thing for end_date:

QUERY:

create temporary table uniq_end_dates
select (@eid := @eid + 1) id, tmp_uniq_end_dates.*
from
    (select distinct o2.account_id, o2.account_name, o2.order_date end_date 
     from orders o1 
     join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date 
     order by o2.account_id, o2.order_date) tmp_uniq_end_dates 
     join (select @eid := 0) AS eid_generator

OUTPUT: temporary table - uniq_end_dates

id | account_id | account_name    | end_date
-----------------------------------------------
1  | 342        | Kent Brewery    | 2015-10-12
2  | 342        | Kent Brewery    | 2015-11-12
3  | 342        | Kent Brewery    | 2015-12-12
4  | 555        | Acme Fireworks  | 2015-09-15
5  | 555        | Acme Fireworks  | 2015-12-15

If you notice, I created new auto id for each view so that I can join them back to one table (like the very first table). Let's join uniq_start_dates and uniq_end_dates.

QUERY:

create temporary table uniq_start_end_dates
select uniq_start_dates.*, uniq_end_dates.end_date
from uniq_start_dates
join uniq_end_dates using (id)

OUTPUT: temporary table - uniq_start_end_dates

(the same one as the first table)

Now it's an easy part. Just aggregate and get average date time difference.

QUERY: 

select account_id, account_name, avg(timestampdiff(day, start_date, end_date)) average_days 
from uniq_start_end_dates
group by account_id, account_name

OUTPUT:

account_id | account_name   | average_days
--------------------------------------------
342        | Kent Brewery   | 30.3333
555        | Acme Fireworks | 91.5000

If you may notice, Plastic Inc. is not in the result. If you care about "null" average_days. Here it is:

QUERY:

select all_accounts.account_id, all_accounts.account_name, accounts_with_average_days.average_days
from 
    (select distinct account_id, account_name from orders) all_accounts
left join
    (select account_id, account_name, avg(timestampdiff(day, start_date, end_date)) average_days
    from uniq_start_end_dates
    group by account_id, account_name) accounts_with_average_days
using (account_id, account_name)

OUTPUT:

account_id | account_name   | average_days
--------------------------------------------
342        | Kent Brewery   | 30.3333
555        | Acme Fireworks | 91.5000
900        | Plastic Inc.   | null

Here is a complete messy query:

select all_accounts.account_id, all_accounts.account_name, accounts_with_average_days.average_days
from 
    (select distinct account_id, account_name from orders) all_accounts
left join
    (select uniq_start_dates.account_id, uniq_start_dates.account_name, avg(timestampdiff(day, start_date, end_date)) average_days 
    from
        (select (@sid := @sid + 1) id, tmp_uniq_start_dates.*
        from
            (select distinct o1.account_id, o1.account_name, o1.order_date start_date from orders o1
            join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date order by o1.account_id, o1.order_date) tmp_uniq_start_dates join (select @sid := 0) AS sid_generator
            ) uniq_start_dates 
        join
        (select (@eid := @eid + 1) id, tmp_uniq_end_dates.*
        from
            (select distinct o2.account_id, o2.account_name, o2.order_date end_date from orders o1
            join orders o2 on o1.account_id=o2.account_id and o1.order_date < o2.order_date order by o2.account_id, o2.order_date) tmp_uniq_end_dates join (select @eid := 0) AS eid_generator
            ) uniq_end_dates 
        using (id)
    group by uniq_start_dates.account_id, uniq_start_dates.account_name) accounts_with_average_days
using (account_id, account_name)
Tin
  • 794
  • 5
  • 10
  • This seems to be way too complicated. If you are introducing user variables, then you could actually calculate the difference between the current record's order date and any previous ones in one go instead of this. – Shadow Apr 19 '16 at 17:49
0

You can take the date difference between the latest and the oldest order and divide it by the number of orders - 1. What you do with single orders would be up to you.

SELECT account_id, account_name, DATEDIFF(MAX(order_date), MIN(order_date)) / NULLIF(COUNT(*) - 1, 0) AS average_days_between_orders FROM table GROUP BY account_id;