I have a table with transactions between multiple parties and want to query it to create visualization and perform basic accounting. Table is like this:
+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
| id |transaction| amount | logged_at | buyer | seller | b_pos | s_pos |
+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
| 1 | 1 | 125000 | 2017-05-28 21:54:53.069000 | store2 | store1 | 4 | 5 |
| 2 | 1 | 109375 | 2017-05-28 21:54:53.069000 | store3 | store2 | 3 | 4 |
| 3 | 1 | 75000 | 2017-05-28 21:54:53.069000 | store4 | store3 | 2 | 3 |
| 4 | 1 | 100000 | 2017-05-28 21:54:53.069000 | store5 | store4 | 1 | 2 |
| 5 | 2 | 50000 | 2017-05-28 21:55:53.069000 | store5 | store3 | 1 | 2 |
So b_pos and s_pos is a position of a store in chain of transactions between those. So chain looks like store1 -> store2 -> store3 and so on.
So I am trying to do several things with my SQL.
- Create all distinct path between parties for visualization
- Calculate total amount of goods being sold between parties in those paths
- Total number of transactions was performed in these distinct paths.
Here's my SQL. The only problem is this query is getting pretty slow with table over 1mil records (30 sec) and my table could be as big as 700mil records. How should I approach this problem?
I can restrict queries to certain time intervals but it needs to be reasonably fast.
SELECT seller, CONCAT(seller, s_pos - offset) seller_id, buyer,
CONCAT(buyer, b_pos - offset) buyer_id, SUM(amount), cnt as transactions, amount/cnt as ROI
FROM transaction_table
JOIN (SELECT DISTINCT transaction,
CASE
WHEN seller = 'store3' THEN s_pos
WHEN buyer = 'store3' THEN b_pos
END AS offset
FROM
transaction_table
WHERE buyer = 'store3' OR seller = 'store3'
AND logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00'
) ck_offset
ON transaction_table.transaction = ck_offset.transaction
JOIN
(SELECT transaction, count(transaction) as cnt from (select * from transaction_table
WHERE buyer = 'store3' OR seller = 'store3'
AND logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00' group by transaction, logged_at) AS dist_chainkeys
group BY transaction) key_counts
ON key_counts.transaction = ck_offset.transaction
WHERE logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00'
GROUP BY seller, seller_id, buyer, buyer_id;