I have a specific request to do on my database (PostgreSQL v9.4.5), and I don't see any elegant solution in pure SQL to solve it (I know I can do it using Python or other, but I have several billions lines of data, and the calculation time would be greatly increased).
I have two tables : trades and events. These tables both represent the trades occurring in an orderbook during a day (this is why I have several billions lines, my data is over several years) but there are many more events than trades.
Both tables have columns time, volume and quantity, however each one has other columns (let's say respectively foo and bar) with specific information. I want to make a correspondence between the two tables on the columns time, volume and price, as I know this correspondence exists as an injection from trades to events (if there are n rows in trades with the same time t, the same price p and the same volume v, I know there are also n rows in events with the time t, the price p and the volume v).
Trades :
id | time | price | volume | foo
-----+-----------+---------+--------+-------
201 | 32400.524 | 53 | 2085 | xxx
202 | 32400.530 | 53 | 1162 | xxx
203 | 32400.531 | 52.99 | 50 | xxx
204 | 32400.532 | 52.91 | 3119 | xxx
205 | 32400.837 | 52.91 | 3119 | xxx <--
206 | 32400.837 | 52.91 | 3119 | xxx <--
207 | 32400.837 | 52.91 | 3119 | xxx <--
208 | 32400.839 | 52.92 | 3220 | xxx <--
209 | 32400.839 | 52.92 | 3220 | xxx <--
210 | 32400.839 | 52.92 | 3220 | xxx <--
Events :
id | time | price | volume | bar
-----+-----------+---------+--------+------
328 | 32400.835 | 52.91 | 3119 | yyy
329 | 32400.837 | 52.91 | 3119 | yyy <--
330 | 32400.837 | 52.91 | 3119 | yyy <--
331 | 32400.837 | 52.91 | 3119 | yyy <--
332 | 32400.838 | 52.91 | 3119 | yyy
333 | 32400.838 | 52.91 | 3119 | yyy
334 | 32400.839 | 52.92 | 3220 | yyy <--
335 | 32400.839 | 52.92 | 3220 | yyy <--
336 | 32400.839 | 52.92 | 3220 | yyy <--
337 | 32400.840 | 52.91 | 2501 | yyy
What I want is :
time | price | volume | bar | foo
-----------+---------+--------+------+-------
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
I cannot do a classic INNER JOIN, or else I will have all the possible crossing between the two tables (in this case I would have 6x6 then 36 rows).
The though thing is to have only one row versus one row, although several rows could fit.
Thank you for your help.
EDIT :
As I said, if I use a classic INNER JOIN, for example
SELECT * FROM events e,
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
I will have something like :
trade_id | event_id | time | price | volume | bar | foo
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 329 | 32400.839 | 52.91 | 3119 | xxx | yyy
207 | 330 | 32400.839 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.839 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
208 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
208 | 336 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 336 | 32400.837 | 52.92 | 3220 | xxx | yyy
210 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
But what I want is :
trade_id | event_id | time | price | volume | bar | foo
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.839 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.837 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.837 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy