1

My target: I have a list of stock_ids and want to get the last bids (its sorted by date) only one per stock_id.

For the picture, it means i want:

stock_id bid
3 663.91953
1 46.44281
2 9.02798

One problem is we have stocks like gazproms which are suspended, so one of the last quotes can be 2021-06-06 for example.

Take a where on quote_day = DATE(NOW()) would not work in this case.

I also need the same for the first lower date, which is not in the first query, this can be done over a second query.

My current solution with using PHP. This is working but the performance is not perfect like for 100 stocks it's take 5 seconds.

I'm able to use Redis, it would be also a option to save the bid somewhere.

Current:

select `quote_date`, 'stocks' as `type`, `bid`, `stock_id` as id
from ( 
  select t.*, row_number()
    over(partition by stock_id order by `quote_date` desc) as rn 
  from end_day_quotes_AVG t 
  where quote_date <= DATE({$date}) 
    AND stock_id in ({$val})
    and currency_id = {$c_id} 
) x where rn = 1

the day before:

select `quote_date`, 'stocks' as `type`, `bid`, `stock_id` as id
from ( 
  select t.*, row_number()
    over(partition by stock_id order by `quote_date` desc) as rn 
  from end_day_quotes_AVG t 
  where quote_date < DATE({$date})
    AND stock_id in ({$val})
    and currency_id = {$c_id}
) x where rn = 1 

Stock_id, quote_date, and currency_id are unique.

The Table I want data using server: 10.9.4-MariaDB-1:10.9.4

enter image description here

edit:

explained query:

id  select_type     table   type    possible_keys   key     key_len     ref     rows    Extra   
1   PRIMARY     <derived2>  ALL     NULL    NULL    NULL    NULL    220896  Using where
2   DERIVED     t   ALL     stock_id,quote_date     NULL    NULL    NULL    2173105     Using where; Using temporary

create Table:

CREATE TABLE `end_day_quotes_AVG` (
  `id` int(11) NOT NULL,
  `quote_date` date NOT NULL,
  `bid` decimal(15,5) NOT NULL,
  `stock_id` int(11) DEFAULT NULL,
  `etf_id` int(11) DEFAULT NULL,
  `crypto_id` int(11) DEFAULT NULL,
  `certificate_id` int(11) DEFAULT NULL,
  `currency_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

INSERT INTO `end_day_quotes_AVG` (`id`, `quote_date`, `bid`, `stock_id`, `etf_id`, `crypto_id`, `certificate_id`, `currency_id`) VALUES
(10537515, '2023-01-02', '16.48286', 40581, NULL, NULL, NULL, 2),
(10537514, '2023-01-02', '3.66786', 40569, NULL, NULL, NULL, 2),
(10537513, '2023-01-02', '9.38013', 40400, NULL, NULL, NULL, 2),
(10537512, '2023-01-02', '8.54444', 40396, NULL, NULL, NULL, 2),



ALTER TABLE `end_day_quotes_AVG`
  ADD PRIMARY KEY (`id`),
  ADD KEY `stock_id` (`stock_id`,`currency_id`),
  ADD KEY `etf_id` (`etf_id`,`currency_id`),
  ADD KEY `crypto_id` (`crypto_id`,`currency_id`),
  ADD KEY `certificate_id` (`certificate_id`,`currency_id`),
  ADD KEY `quote_date` (`quote_date`);


ALTER TABLE `end_day_quotes_AVG`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=10570526;

A generated filled query:

select `quote_date`, 'stocks' as `type`, `bid`, `stock_id` as id from 
( select t.*, row_number() over(partition by stock_id order by `quote_date` desc) as rn 
from end_day_quotes_AVG t where quote_date <= DATE('2023-01-02') AND stock_id in (2,23,19,41,40,26,9,43,22,
44,28,32,30,34,20,10,13,17,27,35,8,29,39,16,33,5,36589,25,18,6,38,37,3,45,7,21,46,15,4,24,31,36,38423,40313,
22561,36787,35770,36600,35766,42,22567,40581,40569,29528,22896,24760,40369,40396,40400,40374,36799,1,27863,
29659,40367,27821,24912,36654,21125,22569,22201,
23133,40373,36697,36718,26340,36653,47,34019,36847,36694) and currency_id = 2 ) x where rn = 1; 
StefanBD
  • 334
  • 3
  • 14
  • Please be clear about which brand of database software you're using. MariaDB is not MySQL. MariaDB started as a fork of MySQL in 2010, but both products have changed since then, so they are no longer compatible. An answer for one of these products is not necessarily correct for the other. I've made an edit to make it clear that you're using MariaDB. – Bill Karwin Jan 01 '23 at 20:45
  • @Bill Karwin thank you i was still think its more or less the same base(and MySql says more to people) and the things for MySQL will work also on maria and reverse, will remember it on my next things. – StefanBD Jan 01 '23 at 20:55
  • how many stock ids are you specifying in `AND stock_id in ({$val})`? – ysth Jan 02 '23 at 00:50
  • please edit your question to show (as text, not images) an actual query (with the parameters replaced with realistic sample values) and output of `explain select ...` for that query and output of `show create table end_day_quotes_AVG`. This is all essential information to help with optimizing a query. – ysth Jan 02 '23 at 00:51
  • not sure how suspended stocks change anything; do you just mean that you need to show the last bids even if it was long ago for some stocks? – ysth Jan 02 '23 at 00:52
  • @ysth the things in can be from 1 to n, there's basically no limit, I added my with around 90. I also added the create table and your wished explain. "do you just mean that you need to show the last bids even if it was long ago for some stocks" exact, like Stock 1 last quote is from 2023-01-03 and stock 50 can the last entry be from 2022-06-01. There's a little limitation, the latest entry per stock must not be the highest PRIMARY KEY (`id`) . – StefanBD Jan 02 '23 at 11:12
  • 1
    for future questions, please do show just the output of `show create table yourtablename`; it's much easier to see and know that nothing is left out than a series of create/alter statements. – ysth Jan 02 '23 at 19:26
  • I was assuming you were wanting the last two bids for each stock, but on rereading your question, maybe you are wanting the last bid as of today and the last bid as of yesterday instead? if so, you might to better to avoid the window function approach and do it (in two separate queries) like https://stackoverflow.com/a/15422121/17389 (though still using a values table constructor instead of IN) – ysth Jan 02 '23 at 19:49
  • yap last and the previous, u will check your link, thanks – StefanBD Jan 02 '23 at 19:52

3 Answers3

1

wI ant to get the last bids (its sorted by date) only one per stock_id.

I also need the same for the first lower date, which is not in the first query.

Are you looking for the two latest quotes of each bid as of a given date? If so, you can just modify the first query to allow row numbers 1 and 2:

select `quote_date`, 'stocks' as `type`, `bid`, `stock_id` as id 
from ( 
    select t.*, row_number() over(partition by stock_id order by quote_date desc) as rn f
    from end_day_quotes_AVG t 
    where quote_date <= DATE(?) AND stock_id in (?)  and currency_id = ? 
) x 
where rn <= 2  -- the latest two
GMB
  • 216,147
  • 25
  • 84
  • 135
  • yap exact but my problem is the performance, when i reduce the 2 querys to one i maybe save 50%? so i still have 2.5 Sec run time which is my problem. I need improve the performance or is this already the fasted method and i should work more with caching? – StefanBD Jan 01 '23 at 17:54
  • 1
    @StefanBD: (1) This is a pure sql solution, so it should perform better than your original php/sql code. (2) Just in case: you are aware that you can fetch all quotes at once (just changing the `where` clause)? No need for a php loop on top of the query – GMB Jan 01 '23 at 20:35
  • i already fetched all stock quotas, the only difference is that i used 2 query instant of your 1 so my performance should be reduced by 50% but is still too slow. Your query take 3.7 seconds which is exactly my problem – StefanBD Jan 01 '23 at 20:51
  • I have indexed stock_id + currency_id and one on the date, currently the table holds 2143962 lines – StefanBD Jan 01 '23 at 20:59
1

It looks like no indexes are being used as is, which can often be the case with large IN lists. Change to join from a values table constructor, create a (currency_id,stock_id,quote_date) index and it should be able to use that.

select `quote_date`, 'stocks' as `type`, `bid`, `stock_id` as id
from (
    select t.*, row_number() over(partition by currency_id,stock_id order by `quote_date` desc) as rn 
    from (
        select null stock_id where 0
        union all
        values (2),(23),(19),(41),(40),(26),(9),(43),
            (22),(44),(28),(32),(30),(34),(20),(10),
            (13),(17),(27),(35),(8),(29),(39),(16),
            (33),(5),(36589),(25),(18),(6),(38),(37),
            (3),(45),(7),(21),(46),(15),(4),(24),
            (31),(36),(38423),(40313),(22561),(36787),(35770),(36600),
            (35766),(42),(22567),(40581),(40569),(29528),(22896),(24760),
            (40369),(40396),(40400),(40374),(36799),(1),(27863),(29659),
            (40367),(27821),(24912),(36654),(21125),(22569),(22201),(23133),
            (40373),(36697),(36718),(26340),(36653),(47),(34019),(36847),
            (36694)
    ) as stock_ids
    join end_day_quotes_AVG t on t.currency_id=2 and t.stock_id=stock_ids.stock_id and t.quote_date <= date('2023-01-02')
) x where rn = 1

(The select where 0/union are just to give the column a useful name, since mariadb's default is extremely unhelpful.)

ysth
  • 96,171
  • 6
  • 121
  • 214
  • reduce time from 3.6 seconds to 1.55 seconds. Maybe its better to cache the data(or single stock quotes) over Redis i will also read the link u posted, maybe i find here a solution who is faster. – StefanBD Jan 02 '23 at 19:53
  • did you add a (currency_id,stock_id,quote_date) index? did it use it? if so, what did key_len in the explain line for t show? – ysth Jan 02 '23 at 20:03
  • i have a key on this 3 – StefanBD Jan 02 '23 at 20:11
  • the explain of the statment https://i.imgur.com/MZsmwRx.jpeg – StefanBD Jan 02 '23 at 20:13
  • not sure what "have a key on this 3" means; a composite index on all three columns is very different than individual indexes on the 3. – ysth Jan 02 '23 at 20:17
  • i place one key over the 3 fields, sorry i should write more clear – StefanBD Jan 02 '23 at 20:19
  • could you show output of `show create table end_day_quotes_AVG;` after adding the index? – ysth Jan 02 '23 at 20:20
  • https://pastebin.com/EVMcVjVh paste it here dont work – StefanBD Jan 02 '23 at 20:22
  • ah, order is important. since you have fixed values for currency_id and stock_id, but a range of values for quote_date that you want to look for, quote_date should come last in the index. Try it with `(currency_id,stock_id,quote_date)` – ysth Jan 02 '23 at 20:25
  • still around 1.6s – StefanBD Jan 02 '23 at 20:29
  • any change to the explain? did it show using the new index? if so, what was key_len? – ysth Jan 02 '23 at 20:31
  • quote_date_2,quote_date_3,quote_date_4,currency_id currency_id 9 const,stock_ids.stock_id 6 Using index condition or here the whole https://pastebin.com/Br6n160e – StefanBD Jan 02 '23 at 20:33
  • so the 9 is 4 for the currency_id plus 5 for the stock_id (5, not 4, because it can be null). so it isn't using date at all. that's too bad. you can almost certainly get it to use the date part of the index too if you were just interested in the last date, but previous to last is harder. – ysth Jan 02 '23 at 20:40
  • puh, well i think its maybe smarter to work here with a cache system who holds the current often used one(Redis with stockid+currency_id as key) and use the MariaDB only as long term Storage. Thank you very much for your help and even your answer got me a 50% better performance. – StefanBD Jan 02 '23 at 20:48
  • I'm thinking about it still. There's only ever one entry per stock per day? – ysth Jan 02 '23 at 23:32
  • Yap one per entry per day when you mean stockid, currencyid and quote date but there can be exist the same for etfs, and there can be a hole in the days like 2023-01-03 and previous is 2022-12-12 or just the latest is from 2022-12-20. – StefanBD Jan 03 '23 at 04:31
  • the same what for etfs? different currency ids with the same stockid and quote date? – ysth Jan 03 '23 at 04:33
  • oh, rows with eft_id non-null have null stock_id? – ysth Jan 03 '23 at 04:39
  • A row with etf id have stock id null(this ca be the same with their other _id fields like crYpto_id have stock and etf id null) , im currently still in development so i can easy change the structure if its need. – StefanBD Jan 03 '23 at 04:58
  • And yes you can have the same stock for same date with different currency_ids like usd or eur. Sorry i miss this in the last answer – StefanBD Jan 03 '23 at 04:59
  • is the query in https://dbfiddle.uk/jDkzXNou any faster? – ysth Jan 03 '23 at 05:00
  • yap 1. run is 1.2s, 2. run is 0.74, wow – StefanBD Jan 03 '23 at 05:09
  • updated the query in the answer to add currency_id to the partition; just noticed it was missing and that could be enough to make it use the composite index correclty – ysth Jan 03 '23 at 15:24
  • I'm putting the query in the fiddle in a new answer, since it's a very different approach – ysth Jan 03 '23 at 15:25
1

To get the last bid (before a certain date) and second to last bid for each currency/stock in a single query and efficiently use an index on currency_id,stock_id,quote_date, you can do it incrementally: first find the maximum date for each currency/stock (which will use the index), then find the previous date (again, in a way that uses the index), and then look up the actual bids:

with stock_ids(stock_id) as (
    values (2),(23),(19),(41),(40),(26),(9),(43),
           (22),(44),(28),(32),(30),(34),(20),(10),
           (13),(17),(27),(35),(8),(29),(39),(16),
           (33),(5),(36589),(25),(18),(6),(38),(37),
           (3),(45),(7),(21),(46),(15),(4),(24),
           (31),(36),(38423),(40313),(22561),(36787),(35770),(36600),
           (35766),(42),(22567),(40581),(40569),(29528),(22896),(24760),
           (40369),(40396),(40400),(40374),(36799),(1),(27863),(29659),
           (40367),(27821),(24912),(36654),(21125),(22569),(22201),(23133),
           (40373),(36697),(36718),(26340),(36653),(47),(34019),(36847),
           (36694)
),
last_dates as (
    select t.currency_id, t.stock_id, max(t.quote_date) as quote_date
    from stock_ids
    join end_day_quotes_AVG t on
        t.currency_id=2 and
        t.stock_id=stock_ids.stock_id and
        t.quote_date <= '2023-01-31'
    group by t.currency_id,t.stock_id
),
next_to_last_dates as (
    select t.currency_id, t.stock_id, max(t.quote_date) as quote_date
    from last_dates l
    join end_day_quotes_AVG t on
        t.currency_id=l.currency_id and
        t.stock_id=l.stock_id and
        t.quote_date < l.quote_date
    group by t.currency_id,t.stock_id
)
select 'last' as 'when', currency_id, stock_id, quote_date, bid
from last_dates
join end_day_quotes_AVG using (currency_id, stock_id, quote_date)
union all
select 'next-to-last', currency_id, stock_id, quote_date, bid
from next_to_last_dates
join end_day_quotes_AVG using (currency_id, stock_id, quote_date)

If you wanted more than just the two most recent dates for each stock, you likely could replace last_dates/next_to_last_dates with a recursive cte that included a day number (limited to however many days you want to gather).

fiddle

ysth
  • 96,171
  • 6
  • 121
  • 214