I have the following table:
CREATE TABLE tbl (
id int NOT NULL
, date date NOT NULL
, cid int NOT NULL
, birth_place text NOT NULL
, location text NOT NULL
);
INSERT INTO tbl VALUES
(1 , '2022-01-01', 1, 'France' , 'Germany')
, (2 , '2022-01-30', 1, 'France' , 'France')
, (3 , '2022-01-25', 2, 'Spain' , 'Spain')
, (4 , '2022-01-12', 3, 'France' , 'France')
, (5 , '2022-02-01', 4, 'England', 'Italy')
, (6 , '2022-02-12', 1, 'France' , 'France')
, (7 , '2022-03-05', 5, 'Spain' , 'England')
, (8 , '2022-03-08', 2, 'Spain' , 'Spain')
, (9 , '2022-03-15', 2, 'Spain' , 'Spain')
, (10, '2022-03-30', 5, 'Spain' , 'Italy')
, (11, '2022-03-22', 4, 'England', 'England')
, (12, '2022-03-22', 3, 'France' , 'England');
I need to count distinct customers (= cid
) per month and location - with a special twist:
If a customer they went back to their birth place (location = birth_place
) during any given month, give priority to that location. Else pick one location per month and customer arbitrarily.
My desired output:
date location count
2022-01-01 France 2
2022-01-01 Spain 1
2022-02-01 Italy 1
2022-02-01 France 1
2022-03-01 Spain 1
2022-03-01 England 3
cid
1 in 2022-01-01 had a location = birth_place
, and no other customer had Germany as location in that time period, hence there is no Germany in my desired output location.
This is my current query:
with
t as (
select id, date_trunc('month', date)::date AS date, cid, birth_place, location
from tbl),
t1 as (
select date, cid, location
from t
where birth_place = location),
t2 as (
select date, cid, location, row_number() over (partition by date, cid order by date) as row
from t
where birth_place <> location),
t3 as (
select t.*,
case
when t1.location is not null then t1.location
else t2.location
end as new_loc
from t
left join t1
on t.cid = t1.cid and t.date = t1.date
left join t2
on t.cid = t2.cid and t.date = t2.date and t2.row = 1)
select date, new_loc, count(distinct cid)
from t3
group by 1, 2
order by 1, 2
It works, but it seems inefficient for 100 million rows.
I am looking for a more efficient way.