I'm trying to find the # active users over time on a daily basis.
A user is active when he has made more than 10 requests per week for 4 consecutive weeks.
ie. On Oct 31, 2014, a user is active if he has made more than 10 requests in total per week between:
- Oct 24-Oct 30, 2014 AND
- Oct 17-Oct 23, 2014 AND
- Oct 10-Oct 16, 2014 AND
- Oct 3-Oct 9, 2014
I have a table of requests
:
CREATE TABLE requests (
id text PRIMARY KEY, -- id of the request
amount bigint, -- sum of requests made by accounts_id to recipient_id,
-- aggregated on a daily basis based on "date"
accounts_id text, -- id of the user
recipient_id text, -- id of the recipient
date timestamp -- date that the request was made in YYYY-MM-DD
);
Sample values:
INSERT INTO requests2
VALUES
('1', 19, 'a1', 'b1', '2014-10-05 00:00:00'),
('2', 19, 'a2', 'b2', '2014-10-06 00:00:00'),
('3', 85, 'a3', 'b3', '2014-10-07 00:00:00'),
('4', 11, 'a1', 'b4', '2014-10-13 00:00:00'),
('5', 2, 'a2', 'b5', '2014-10-14 00:00:00'),
('6', 50, 'a3', 'b5', '2014-10-15 00:00:00'),
('7', 787323, 'a1', 'b6', '2014-10-17 00:00:00'),
('8', 33, 'a2', 'b8', '2014-10-18 00:00:00'),
('9', 14, 'a3', 'b9', '2014-10-19 00:00:00'),
('10', 11, 'a4', 'b10', '2014-10-19 00:00:00'),
('11', 1628, 'a1', 'b11', '2014-10-25 00:00:00'),
('13', 101, 'a2', 'b11', '2014-10-25 00:00:00');
Example output:
Date | # Active users
-----------+---------------
10-01-2014 | 600
10-02-2014 | 703
10-03-2014 | 891
Here's what I tried to do to find the number of active users for a certain date (e.g. 10-01-2014):
SELECT count(*)
FROM
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '2 weeks' AND '2014-10-01'::date - interval '1 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_1
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '3 weeks' AND '2014-10-01'::date - interval '2 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_2 ON week_1.accounts_id = week_2.accounts_id
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '4 weeks' AND '2014-10-01'::date - interval '3 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_3 ON week_2.accounts_id = week_3.accounts_id
JOIN
(SELECT accounts_id
FROM requests
WHERE "date" BETWEEN '2014-10-01'::date - interval '5 weeks' AND '2014-10-01'::date - interval '4 week'
GROUP BY accounts_id HAVING sum(amount) > 10) week_4 ON week_3.accounts_id = week_4.accounts_id
Since this is just the query to get the number for 1 day, I need to get this number on a daily basis over time. I think the idea is to do a join to get the date so I tried to do something like this:
SELECT week_1."Date_series",
count(*)
FROM
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '2 weeks' AND requests.date::date - interval '1 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_1
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '3 weeks' AND requests.date::date - interval '2 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_2 ON week_1.accounts_id = week_2.accounts_id
AND week_1."Date_series" = week_2."Date_series"
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '4 weeks' AND requests.date::date - interval '3 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_3 ON week_2.accounts_id = week_3.accounts_id
AND week_2."Date_series" = week_3."Date_series"
JOIN
(SELECT to_char(DAY::date, 'YYYY-MM-DD') AS "Date_series",
accounts_id
FROM generate_series('2014-10-01'::date, CURRENT_DATE, '1 day') DAY, requests
WHERE to_char(DAY::date, 'YYYY-MM-DD')::date BETWEEN requests.date::date - interval '5 weeks' AND requests.date::date - interval '4 week'
GROUP BY "Date_series",
accounts_id HAVING sum(amount) > 10) week_4 ON week_3.accounts_id = week_4.accounts_id
AND week_3."Date_series" = week_4."Date_series"
GROUP BY week_1."Date_series"
However, I think I'm not getting the right answer and I'm not sure why. Any tips/ guidance/ pointers is much appreciated! :) :)
PS. I'm using Postgres 9.3