0

I have a table with transactions between multiple parties and want to query it to create visualization and perform basic accounting. Table is like this:

+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
| id |transaction| amount | logged_at                  | buyer       | seller     | b_pos | s_pos |
+----+-----------+--------+----------------------------+-------------+------------+-------+-------+
|  1 | 1         | 125000 | 2017-05-28 21:54:53.069000 | store2      | store1     |     4 |     5 |
|  2 | 1         | 109375 | 2017-05-28 21:54:53.069000 | store3      | store2     |     3 |     4 |
|  3 | 1         |  75000 | 2017-05-28 21:54:53.069000 | store4      | store3     |     2 |     3 |
|  4 | 1         | 100000 | 2017-05-28 21:54:53.069000 | store5      | store4     |     1 |     2 |
|  5 | 2         |  50000 | 2017-05-28 21:55:53.069000 | store5      | store3     |     1 |     2 |

So b_pos and s_pos is a position of a store in chain of transactions between those. So chain looks like store1 -> store2 -> store3 and so on.

So I am trying to do several things with my SQL.

  1. Create all distinct path between parties for visualization
  2. Calculate total amount of goods being sold between parties in those paths
  3. Total number of transactions was performed in these distinct paths.

Here's my SQL. The only problem is this query is getting pretty slow with table over 1mil records (30 sec) and my table could be as big as 700mil records. How should I approach this problem?

I can restrict queries to certain time intervals but it needs to be reasonably fast.

SELECT seller, CONCAT(seller, s_pos - offset) seller_id, buyer,
        CONCAT(buyer, b_pos - offset) buyer_id, SUM(amount), cnt as transactions, amount/cnt as ROI
FROM transaction_table
JOIN (SELECT DISTINCT transaction,
        CASE
            WHEN seller =  'store3' THEN s_pos
            WHEN buyer =  'store3' THEN b_pos
        END AS offset
        FROM
            transaction_table
        WHERE buyer =  'store3' OR seller =  'store3'
        AND logged_at >= '2014-06-23 17:34:20'
        AND logged_at <= '2018-06-23 17:34:00'
        ) ck_offset
    ON transaction_table.transaction = ck_offset.transaction
JOIN
        (SELECT transaction, count(transaction) as cnt from (select * from transaction_table
                             WHERE buyer =  'store3' OR seller =  'store3'
                             AND logged_at >= '2014-06-23 17:34:20'
                             AND logged_at <= '2018-06-23 17:34:00' group by transaction, logged_at) AS dist_chainkeys
                group BY transaction) key_counts
    ON key_counts.transaction = ck_offset.transaction
WHERE logged_at >= '2014-06-23 17:34:20'
AND logged_at <= '2018-06-23 17:34:00'
GROUP BY seller, seller_id, buyer, buyer_id;
Akim Akimov
  • 41
  • 1
  • 7

1 Answers1

0

Do your tables use indexes? Indexes are the way to boost the performance. Have a look to this:

What is an index in SQL?

jolumg
  • 714
  • 2
  • 15
  • 31