3

I'm still working on the same project as when I asked my previous question on Stack Overflow. My SQL tables are fully described in that question, and I would ask you to read this to understand my new problem.

The difference now is that the two tables trades and events are no longer synchronized : now, the time doesn't exactly corresponds between the two tables. However, I know there is still a one-to-one correspondance between the two tables, meaning that each trade has a corresponding event, but some events doesn't correspond to a trade.

Trades :

  id |   time    |  price  | volume |   foo
-----+-----------+---------+--------+-------
 201 | 32400.524 |      53 |   2085 |   xxx
 202 | 32400.530 |      53 |   1162 |   xxx
 203 | 32400.531 |   52.99 |     50 |   xxx
 204 | 32401.532 |   52.91 |   3119 |   xxx
 205 | 32402.437 |   52.91 |   3119 |   xxx
 206 | 32402.832 |   52.91 |   3119 |   xxx
 207 | 32403.255 |   52.91 |   3119 |   xxx
 208 | 32404.242 |   52.92 |   3220 |   xxx
 209 | 32405.823 |   52.92 |   3220 |   xxx
 210 | 32406.839 |   52.92 |   3220 |   xxx

Events :

   id |   time    |  price  | volume |  bar 
-----+-----------+---------+--------+------
 328 | 32399.345 |   52.91 |   3119 |  yyy
 329 | 32400.964 |   52.91 |   3119 |  yyy
 330 | 32401.194 |   52.91 |   3119 |  yyy
 331 | 32401.746 |   52.91 |   3119 |  yyy
 332 | 32401.823 |   52.91 |   3119 |  yyy
 333 | 32402.534 |   52.91 |   3119 |  yyy
 334 | 32402.876 |   52.92 |   3220 |  yyy
 335 | 32403.839 |   52.92 |   3220 |  yyy
 336 | 32404.634 |   52.92 |   3220 |  yyy
 337 | 32405.234 |   52.91 |   2501 |  yyy

What I want is making the correspondance between the two tables by minimizing the difference of time between the trade and the event. It makes sense : if there are several events which correspond to a trade in volume and price, we must take the event which is the least "far in time" from the trade.

I have tried to do the following thing :

SELECT 
    t.*,
   (SELECT e.id
        FROM events o
        WHERE e.price = t.price
        AND e.volume = t.volume
        ORDER BY ABS(o.time - t.time)
        LIMIT 1
    ) as most_probable_corresponding_event_id
FROM trades t
ORDER BY t.time;

But the problem is this query doesn't give unique correspondance : a same event e can be choose for different trades t1 and t2, if this event is the nearest from both trades t1 and t2. What I want is making an exclusive correspondance.

Thank you for your help.

EDIT :

The output I expect for the example datas would be :

   trade_id | order_id |  price  | volume |  bar |   foo 
 -----------+----------+---------+--------+------+-------
      204   |   331    |   52.91 |   3119 |  xxx |   yyy
      205   |   333    |   52.91 |   3119 |  xxx |   yyy
      206   |   334    |   52.91 |   3119 |  xxx |   yyy
      207   |   335    |   52.92 |   3220 |  xxx |   yyy
      208   |   336    |   52.92 |   3220 |  xxx |   yyy
      209   |   337    |   52.92 |   3220 |  xxx |   yyy
Edouard Berthe
  • 1,393
  • 2
  • 18
  • 38
  • could you add your expected output? – davejal Dec 15 '15 at 12:28
  • I have added the expected output for the example datas. The goal is simply to make the correspondance in order to have the xxx and yyy datas together in only one table. – Edouard Berthe Dec 15 '15 at 13:42
  • What would be the right correspondence if there are two correspondence to up and down? like, your time is: `32399.345` and there are `32400.345` and `32398.345` ? – Jorge Campos Dec 15 '15 at 13:46
  • Pick one out of the two, let's say that we pick up the first one. It was the goal of my previous question, to partition over time, price and volume then make correspondance between both first ones, then both second one etc. – Edouard Berthe Dec 15 '15 at 13:50
  • 1
    Your sample is wrong. There is no correspondence between trade_id 206 and event_id 304 since there is no match for price and volume – Jorge Campos Dec 15 '15 at 15:57
  • Aside your sample data being wrong, best I could do was to get groups ranked. When I subquery it assigning the rank order to 1 it will bring duplicate results. If you take a look into the results I have you will see that to get what you want you need to query the trade_id and event_id that wasn't previously picked. It is even hard to explain. Take a look, I think you will understand better: http://sqlfiddle.com/#!15/14ac3/1 – Jorge Campos Dec 15 '15 at 17:21

2 Answers2

0

I tried to do a lot but can't get your result. I got something that might help though.

Using the following query you get all the records with the same price and volume and the time difference between the event and the trade.

select *  from
(SELECT t.id as trade_id, e.id as event_id, e.price as price, e.volume as volume,e.bar as bar, t.foo as foo, abs(e.time-t.time) as diff 
FROM events e
inner JOIN trades t on t.price = e.price AND t.volume = e.volume order by trade_id,diff asc ) a

Using your data it is impossible to get some of the results you expect. i.e. 206 cannot have order_id 334 using a match of price and volume.

I think to overcome most problems you need to change your db and add a foreign key (to make it possible to do easier joins)

the following query will give you a result, but the event_id don't match, either because of the earlier mentioned problem (price and volume match) or because the event record with the closest time match isn't the one you actually want.

select *  from
(SELECT t.id as trade_id, e.id as event_id, e.price as price, e.volume as volume,e.bar as bar, t.foo as foo, abs(e.time-t.time) as diff FROM events e 
inner JOIN trades t on t.price = e.price AND t.volume = e.volume order by trade_id,diff asc ) a group by trade_id
davejal
  • 6,009
  • 10
  • 39
  • 82
0

Based on what you wrote on your previous question, and the example data I would expect the time to always be later in the event table than in the trade table. Therefore if you add a condition that event time > order time, that should give you a unique match unless the event table lags significantly behind the trades.

SELECT 
t.*,
(SELECT e.id
    FROM events o
    WHERE e.price = t.price
    AND e.volume = t.volume
    AND o.time > t.time
    ORDER BY ABS(o.time - t.time)
    LIMIT 1
) as most_probable_corresponding_event_id
FROM trades t
ORDER BY t.time;