- I am looking to calculate the average difference in days between those three dates.*
And by average difference we mean "take the average of the absolute value of the difference between all dates". That's 12 + 16 + 4 / 3
or 10.6667.
We need all combinations of dates. For this we need a self-join with no repeats. That's accomplished by picking a field and using on
with a <
or >
.
select t1.date, t2.date
from tests as t1
join tests as t2 on t1.id = t2.id and t1.date < t2.date
where t1.id = 1;
2018-03-01|2018-03-13
2018-03-01|2018-03-17
2018-03-13|2018-03-17
Now that we have all combinations, we can take the difference. But not by simply subtracting the dates, SQLite doesn't support that. First, convert them to Julian Days.
sqlite> select julianday(t1.date), julianday(t2.date) from tests as t1 join tests as t2 on t1.id = t2.id and t1.date < t2.date where t1.id = 1;
2458178.5|2458190.5
2458178.5|2458194.5
2458190.5|2458194.5
Now that we have numbers we can take the absolute value of the difference and do an average.
select avg(abs(julianday(t1.date) - julianday(t2.date)))
from tests as t1
join tests as t2 on t1.id = t2.id and t1.date < t2.date
where t1.id = 1;
UPDATE
What I want to calculate is the avg difference in days between test to test, so: Diff between first to second is 12 days. Diff between second to third is 4 days. Then (12+4)/2=8 which should be the result.
For this twist on the problem you want to compare each row with the next one. You want a table like this:
2018-03-01|2018-03-13
2018-03-13|2018-03-17
Other databases have features like window
or lag
to accomplish this. SQLite doesn't have that. Again, we'll use a self-join, but we have to do it per row. This is a correlated subquery.
select t1.date as date, (
select t2.date
from tests t2
where t1.id = t2.id and t2.date > t1.date
order by t2.date
limit 1
) as next
from tests t1
where id = 1
and next is not null
The subquery-as-column finds the next date for each row.
This is a bit unwieldy, so let's turn it into a view. Then we can use it as a table. Just take out the where id = 1
so it's generally useful.
create view test_and_next as
select t1.id, t1.date as date, (
select t2.date
from tests t2
where t1.id = t2.id and t2.date > t1.date
order by t2.date
limit 1
) as next
from tests t1
where next is not null
Now we can treat test_and_next
as a table with the columns id
, date
, and next
. Then it's the same as before: turn them into Julian Days, subtract, and take the average.
select avg(julianday(next) - julianday(date))
from test_and_next
where id = 1;
Note that this will go sideways when you have two rows with the same date: there's no way for SQL to know which is the "next" one. For example, if there were two tests for ID 1 on "2018-03-13" they'll both choose "2018-03-17" as the "next" one.
2018-03-01|2018-03-13
2018-03-13|2018-03-17
2018-03-13|2018-03-17
I'm not sure how to fix this.