0

I have a table where I store timeseries data:

customer_id transaction_type transaction_date transaction_value
1 buy 2022-12-04 100.0
1 sell 2022-12-04 80.0
2 buy 2022-12-04 120.0
2 sell 2022-12-03 120.0
1 buy 2022-12-02 90.0
1 sell 2022-12-02 70.0
2 buy 2022-12-01 110.0
2 sell 2022-12-01 110.0

Number of customers and transaction types is not limited. Currently there are over 10,000 customers and over 600 transaction types. Dates of transactions ~between customers can be unique and~ will not always align based on any criteria among a customer or transaction type (that's why I've tried using LATERAL JOIN — you'll see it later).

I want to filter those records to get customers IDs with the values of the transaction where any arbitrary condition is met. Number of those conditions in a query is not restricted to two — can be anything. For example:

Give me all customers who have a buy with value > $90 and a sale with value > 100$ as their latest transactions

The final query should return these two rows:

customer_id transaction_type transaction_date transaction_value
2 buy 2022-12-04 120$
2 sell 2022-12-03 120$

The closest I've came to what I need was by creating a materialized view cross-joining customer IDs and transaction_types:

customer_id transaction_type
1 buy
1 sell
2 buy
2 sell

And then running a LATERAL JOIN between table with transactions and customer_transactions materialized view:

SELECT *
  FROM customer_transactions
  JOIN LATERAL (
    SELECT *
      FROM transactions
     WHERE (transactions.customer_id = customer_transactions.customer_id)
       AND (transactions.transaction_type = customer_transactions.transaction_type)
       AND transactions.transaction_date <= '2022-12-04' -- this can change for filtering records back in time
     ORDER BY transactions.transaction_date DESC
     LIMIT 1
  ) transactions ON TRUE
 WHERE customer_transactions.transaction_type = 'buy'
   AND customer_transactions.transaction_value > 90

It seems to be working when one condition is specified. But as soon as subsequential conditions are introduced that's where things start falling apart for me; changing condition to:

 WHERE (customer_transactions.transaction_type = 'buy'
   AND customer_transactions.transaction_value > 90)
   AND (customer_transactions.transaction_type = 'sell'
   AND customer_transactions.transaction_value > 100)

is obviously not going to work as there is no row that satisfies both of these conditions.

Is it possible to achieve this using the aproach I took? If so what am I missing? Or maybe there is another way to solve that would be more appropriate?

Ruslan
  • 1,208
  • 3
  • 17
  • 28
  • Does "have a `buy` and a `sell` transaction as their last two transactions" imply a specific order of these two transactions? – SebDieBln Dec 04 '22 at 20:05
  • "Dates of transactions between customers can be unique" doesn't make sense to me. Do you mean "the combination of `customer` and `date` is guaranteed to be unique"? – SebDieBln Dec 04 '22 at 20:09
  • 1
    @SebDieBln no they don't. Also clarified uniqueness of date — what I meant is that is there is a buy transaction on a date it does not mean there will sell transaction on the same date. Or if there is a transaction for one customer it does not imply there will be a transaction on the same date for another – Ruslan Dec 05 '22 at 19:41

2 Answers2

0

You could use a CTE with row_number and chech out the last transactios

WITH CTE as (SELECT
"customer_id", "transaction_type", "transaction_date",
  "transaction_value",
ROW_NUMBER() OVER(PARTITION BY "customer_id", "transaction_type" ORDER BY  "transaction_date" DESC) rn
FROM tab1)
SELECT "customer_id", "transaction_type", "transaction_date",
  "transaction_value" FROM CTE
  WHERE rn = 1 
  AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90) 
WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100) 
ELSE FALSE END 
AND (SELECT COUNT(*) FROM CTE c1 
  WHERE c1."customer_id"= CTE."customer_id" and rn = 1
    AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90) 
WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100) 
ELSE FALSE END ) = 2
customer_id transaction_type transaction_date transaction_value
2 buy 2022-12-04 120.0
2 sell 2022-12-03 120.0
SELECT 2

fiddle

nbk
  • 45,398
  • 8
  • 30
  • 47
0

Use distinct on with custom order to select all the latest transactions per customer according to your several criteria (hence the OR) - latest CTE, then count the number of result records per user using count as a window function - latest_with_count CTE - and finally pick these that have a count equal to the number of criteria, i.e. all the criteria are honoured.
This may be a bit verbose and abstract template but hopefully would help with the generic problem. The idea would work for any number of conditions.

with t as
(
 /*
  your query here with several conditions in DISJUNCTION (OR) here, i.e.
  WHERE (customer_transactions.transaction_type = 'buy'  AND customer_transactions.transaction_value > 90)
     OR (customer_transactions.transaction_type = 'sell' AND customer_transactions.transaction_value > 100)
 */
),
latest as 
(
 select distinct on (customer_id, transaction_type) *
 from t
 -- pick the latest per customer & type
 order by customer_id, transaction_type, transaction_date desc
),
latest_with_count as
(
 select *, count(*) over (partition by customer_id) cnt
 from latest
)
select * 
from latest_with_count
where cnt = 2 -- the number of criteria
Stefanov.sm
  • 11,215
  • 2
  • 21
  • 21