
The problem:

We're getting stock prices and trades from a provider, and to speed things up we cache the trades as they come in (1 trade per second per stock is not a lot). We've got around 2,000 stocks, so technically we're expecting as many as 120,000 trades per minute (2,000 * 60). Now, these prices are realtime, but to avoid paying licensing fees to show this data to the customer we need to show the prices delayed by 15 minutes. (We need the realtime prices internally, which is why we've bought and pay for them - they are NOT cheap!)
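
For reference, a rough sketch of the trades table (the column names come from the queries below; `price` and the exact types are assumptions):

CREATE TABLE `stocks_trades` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `stock_id` INT UNSIGNED NOT NULL,
    `price` DECIMAL(12, 4) NOT NULL,   -- assumed column name/precision
    `time` DATETIME NOT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB;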

I feel like I've tried everything, and I've run into an uncountable number of problems.

Things I've tried:

1:

Run a cronjob every 15 seconds that runs a query to find, for each stock, the latest trade that is more than 15 minutes old (so we have an ID to join on):

SELECT
    MAX(`time`) as `max_time`,
    `stock_id`
FROM
    `stocks_trades`
WHERE
    `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND
    `time` > '0000-00-00 00:00:00'
GROUP BY
    `stock_id`

This works very fast - 1.8 seconds with ~2,000,000 rows, but the following is very slow:

SELECT
    st.id,
    st.stock_id
FROM
    (
        SELECT
            MAX(`time`) as `max_time`,
            `stock_id`
        FROM
            `stocks_trades`
        WHERE
            `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
        AND
            `time` > '0000-00-00 00:00:00'
        GROUP BY
            `stock_id`
    ) as `tmp`
INNER JOIN
    `stocks_trades` as `st`
ON
    (tmp.max_time = st.time AND tmp.stock_id = st.stock_id)
GROUP BY
    `stock_id`

..that takes ~180-200 seconds, which is WAY too slow. There's an index on both `time` and `stock_id` (individually).
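
That is, the current indexes look something like this (index names are assumptions):

ALTER TABLE `stocks_trades` ADD INDEX `idx_time` (`time`);
ALTER TABLE `stocks_trades` ADD INDEX `idx_stock_id` (`stock_id`);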

2:

Switch between InnoDB/MyISAM. I'd think I need InnoDB (we're inserting A LOT of rows from multiple threads, and we don't want to block between inserts) - InnoDB seems faster at inserting, but WAY slower at reading (we obviously require both).
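
For reference, the switch itself is just an engine change, which rebuilds the whole table (so it's slow on a table this size):

ALTER TABLE `stocks_trades` ENGINE=MyISAM;
-- and back again:
ALTER TABLE `stocks_trades` ENGINE=InnoDB;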

3:

Optimize tables every day. Still slow.
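
That is, a nightly run along these lines (for InnoDB this rebuilds the table and refreshes index statistics):

OPTIMIZE TABLE `stocks_trades`;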

What I think might help:

  1. Using ints instead of DATETIME. Perhaps (since the markets are open from 9-22) keep a custom int time, which would be "seconds since 9 o'clock this morning", and use the same method as above (it seems to make some difference, albeit not a lot) - see the sketch after this list
  2. Use MEMORY instead of InnoDB - probably not the best idea with ~1,800,000 rows per 15 minutes, even though we have plenty of memory
  3. Save price/stockID/time in memory in our application receiving the prices (I don't see how this would be any different than using MEMORY, except my code probably will be worse than MySQL's own code)
  4. Keep deleting trades older than 15 minutes in hopes that it'll speed up the queries (also sketched after this list)
  5. Some magic query that I just haven't thought of that uses the indexes perfectly and does magical things
  6. Give up and kill oneself after spending ~12 hours trying to wrap my head around this and different solutions
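
Rough sketches of ideas 1 and 4 (the column name, type, and purge cut-off are assumptions):

-- Idea 1 (assumption): keep an integer "seconds since 09:00" column next to the DATETIME
-- (9-22 is at most 46,800 seconds, so SMALLINT UNSIGNED is enough)
ALTER TABLE `stocks_trades`
    ADD COLUMN `time_sec` SMALLINT UNSIGNED NOT NULL DEFAULT 0;
-- populated on insert, e.g. time_sec = TIME_TO_SEC(`time`) - TIME_TO_SEC('09:00:00')

-- Idea 4 (assumption): purge old trades, but keep a buffer past the 15-minute delay
-- so the latest pre-delay trade per stock isn't deleted (the 45-minute cut-off is a guess)
DELETE FROM `stocks_trades`
WHERE `time` < DATE_SUB(NOW(), INTERVAL 45 MINUTE);
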
h2ooooooo
  • What's the reason for the outer `GROUP BY stock_id`? You have no corresponding aggregate functions in the outer query. – Michael Berkowski Dec 11 '13 at 21:01
  • And have you attempted a compound index on `(stock_id, time)`? That may be the simple solution. `ALTER TABLE stocks_trades ADD INDEX idx_stock_id_time (stock_id, time)` – Michael Berkowski Dec 11 '13 at 21:03
  • @MichaelBerkowski The reason for the outer group is that a stock might have more trades in a single second. The composite index worked WONDERS by the way, and made the whole thing take 0.03 seconds instead of 180 seconds. Please add this as a response and I'll gladly accept it. Thank you so much for the help. – h2ooooooo Dec 12 '13 at 20:34
  • Ok, I converted it to an answer. – Michael Berkowski Dec 12 '13 at 20:38
  • The `GROUP BY` is a little misleading then. Either it should become a `DISTINCT` instead (I suspect not, without sample data), or just beware MySQL's behavior of having only one column from the `SELECT` in the `GROUP BY`. The value it returns for `id` is going to be indeterminate, and that might be fine for your needs, but it will cause problems if you ever port it to a different RDBMS where that's a compile error. – Michael Berkowski Dec 12 '13 at 20:45

3 Answers


Since you are joining against your subquery on two columns (stock_id, time), MySQL ought to be able to make use of a compound index across both of them, while it cannot make use of either of the individual single-column indexes you already have.

ALTER TABLE `stocks_trades` ADD INDEX `idx_stock_id_time` (`stock_id`, `time`)
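
To confirm the compound index is actually picked up, you can prefix the slow query from the question with EXPLAIN and check that `idx_stock_id_time` shows up in the `key` column for the `st` table (a quick sanity check, not required):

EXPLAIN
SELECT
    st.id,
    st.stock_id
FROM
    (
        SELECT
            MAX(`time`) as `max_time`,
            `stock_id`
        FROM
            `stocks_trades`
        WHERE
            `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
        AND
            `time` > '0000-00-00 00:00:00'
        GROUP BY
            `stock_id`
    ) as `tmp`
INNER JOIN
    `stocks_trades` as `st`
ON
    (tmp.max_time = st.time AND tmp.stock_id = st.stock_id)
GROUP BY
    `stock_id`;
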
Michael Berkowski

Assuming you have an auto-incrementing id as the primary key on `stocks_trades` (call it `stock_trade_id`), you could select `MAX(stock_trade_id)` as `last_id` in the inner query and then do an inner join on `last_id = stock_trade_id`, so you will be joining on your PK and have no date comparisons in your main join.

SELECT
    st.id,
    st.stock_id
FROM
    (
        SELECT
            MAX(`stock_trade_id`) as `last_id`,
            `stock_id`
        FROM
            `stocks_trades`
        WHERE
            `time` <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
        AND
            `time` > '0000-00-00 00:00:00'
        GROUP BY
            `stock_id`
    ) as `tmp`
INNER JOIN
    `stocks_trades` as `st`
ON
    (tmp.last_id = st.stock_trade_id)
GROUP BY
    `stock_id`
smurtagh

What happens if you run something like this? Try to change it to include the proper column name for the price if needed:

SELECT st.id, st.stock_id
FROM stocks_trades as st
WHERE time <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
AND time > DATE_SUB(NOW(), INTERVAL 45 MINUTE)
AND NOT EXISTS (
    SELECT 1 FROM stocks_trades as st2
    WHERE st2.time <= DATE_SUB(NOW(), INTERVAL 15 MINUTE)
    AND st2.stock_id = st.stock_id
    AND st2.time > st.time
)

hope it helps!

Sergio Ayestarán