The schema
The sqlite database described below is currently 52 GB. It was generated as part of an Entity Framework Core 6.0 project.
The Trades
table
CREATE TABLE IF NOT EXISTS "Trades" (
"Id" INTEGER NOT NULL CONSTRAINT "PK_Trades" PRIMARY KEY AUTOINCREMENT,
"SymbolId" INTEGER NOT NULL,
"Price" TEXT NOT NULL,
"Quantity" TEXT NOT NULL,
"TimeStamp" TEXT NOT NULL,
"Side" INTEGER NOT NULL,
"Type" INTEGER NOT NULL,
CONSTRAINT "FK_Trades_Symbols_SymbolId" FOREIGN KEY ("SymbolId") REFERENCES "Symbols" ("Id") ON DELETE CASCADE
);
This holds trades from the Kraken cryptocurrency exchange.
The data was populated from CSVs downloaded here:
The most recent trades were retrieved via API.
Data excerpt:
sqlite> SELECT * FROM Trades LIMIT 5;
Id SymbolId Price Quantity TimeStamp Side Type
-------- -------- ----- --------- ------------------- ---- ----
17272735 29 2.364 29.18557 2021-08-10 15:31:26 0 0
17272736 29 2.364 12.50281 2021-08-10 15:31:34 0 0
17272737 29 2.363 416.76043 2021-08-10 15:31:36 0 0
17272738 29 2.365 38.94156 2021-08-10 15:32:00 0 0
17272739 29 2.363 0.41676 2021-08-10 15:32:05 0 0
The current size of the table:
sqlite> SELECT COUNT(*) FROM Trades;
COUNT(*)
---------
535560814
Run Time: real 7.746 user 0.984375 sys 6.750000
The Symbols
table
CREATE TABLE IF NOT EXISTS "Symbols" (
"Id" INTEGER NOT NULL CONSTRAINT "PK_Symbols" PRIMARY KEY AUTOINCREMENT,
"Name" TEXT NOT NULL
);
Each trading pair has a symbol. E.g. XBTUSD
for Bitcoin.
Data excerpt:
sqlite> SELECT * FROM Symbols LIMIT 5;
Id Name
-- --------
29 1INCHEUR
30 1INCHUSD
31 AAVEAUD
32 AAVEETH
33 AAVEEUR
Some key indices
Indices on some critical columns:
CREATE UNIQUE INDEX "IX_Symbols_Name" ON "Symbols" ("Name");
CREATE INDEX "IX_Trades_SymbolId" ON "Trades" ("SymbolId");
CREATE INDEX "IX_Trades_TimeStamp" ON "Trades" ("TimeStamp");
Query performance
Most of the basic queries I've run so far seem to be quite slow. Here are some examples.
10 most recent XBTUSD trades (31 seconds)
SELECT *
FROM Trades
WHERE SymbolId = (SELECT Id FROM Symbols WHERE Name = 'XBTUSD')
ORDER BY TimeStamp DESC
LIMIT 10;
Run Time: real 31.641 user 28.984375 sys 2.640625
Id SymbolId Price Quantity TimeStamp Side Type
--------- -------- ------- ---------- ----------------------- ---- ----
552833548 377 48683.1 0.157 2021-12-06 06:16:17.804 1 0
552833547 377 48681.5 0.00049872 2021-12-06 06:16:04.855 0 0
552833546 377 48681.4 0.32 2021-12-06 06:16:02.729 1 0
552833545 377 48680.4 0.1 2021-12-06 06:15:53.826 0 0
552833544 377 48697.4 0.05 2021-12-06 06:15:39.56 1 1
552833543 377 48694.5 0.04923492 2021-12-06 06:15:22.787 0 0
552833542 377 48690.2 0.00210422 2021-12-06 06:15:22.786 0 0
552833541 377 48689.6 0.02201001 2021-12-06 06:15:22.487 0 0
552833540 377 48682.5 0.00052279 2021-12-06 06:15:22.486 0 0
552833539 377 48652.3 0.00011573 2021-12-06 06:15:22.485 0 0
Show high and low by month for XBTUSD (64 seconds)
SELECT
strftime('%Y-%m', TimeStamp) AS date,
COUNT(*) AS trade_count,
min( CAST(Price AS DECIMAL) ) AS high,
max( CAST(Price AS DECIMAL) ) AS low
FROM Trades
WHERE
SymbolId = (SELECT Id FROM Symbols WHERE Name = 'XBTUSD')
GROUP BY strftime('%Y-%m', TimeStamp)
ORDER BY date;
Run Time: real 64.743 user 55.421875 sys 8.625000
date trade_count high low
------- ----------- --------- ---------
2013-10 422 122 207.30246
2013-11 7147 201.04 1198.8009
2013-12 14964 380 1130
2014-01 7747 737.15115 1019
2014-02 8990 450 835
2014-03 4314 436.07375 720
2014-04 3063 353.75804 583.75327
2014-05 2046 422.19 629.0355
2014-06 2533 525 677.90898
2014-07 838 566.75121 658.87046
2014-08 1756 455 602.36146
2014-09 930 295 527.99987
2014-10 1188 273.32564 418.98998
...
Number of trades by symbol (53 seconds)
SELECT Symbols.Name, SymbolId, COUNT(*) AS trade_count
FROM Trades JOIN Symbols ON Trades.SymbolId = Symbols.Id
GROUP BY SymbolId
ORDER BY trade_count;
Run Time: real 53.176 user 46.031250 sys 7.031250
Name SymbolId trade_count
--------- -------- -----------
WBTCXBT 369 561
RAYUSD 291 674
WBTCUSD 368 1003
YFIAUD 407 1460
GHSTGBP 167 1488
WBTCEUR 367 1860
FILAUD 154 2036
RAYEUR 290 2154
BNTGBP 88 2437
GRTAUD 174 2514
ZRXGBP 417 2732
BNTXBT 90 3196
SUSHIGBP 333 3387
...
The question
Do you have any suggestions for how to improve the performance of the queries shown above?
I.e. would you recommend additional indices? Or perhaps there are some sqlite specific parameters I should take a look at?
Or would you say that, based on the size of the database and the fact that it's sqlite, queries like this are just going to be slow. If that's the case, I guess I can consider using SQL Server or Postgresql. I like the idea of sqlite since it has the lowest overhead as far as setup goes. It's also free (unlike SQL Server).
Notes
Although I'm showing sqlite queries above, the database and schema were created via an Entity Framework Core 6.0 project.
The project which includes code for the EF Core database as well as programs which import the trade data from CSV and API is available here:
https://github.com/dharmatech/kraken-trades-database
The entity models and database context:
Code that imports the CSV (if you're curious):
https://github.com/dharmatech/kraken-trades-database/blob/008-import-api/KrakenImportCsv/Program.cs
Code that imports via API (if you're curious):
https://github.com/dharmatech/kraken-trades-database/blob/008-import-api/KrakenImportApi/Program.cs
TEXT columns
Lasse asks a good question in the comments below:
is there a reason why your price and amount was typed to text and not to REAL
He's referring to these columns in the Trades
table:
"Price" TEXT NOT NULL,
"Quantity" TEXT NOT NULL,
That's what EF Core generated from the following C# model class:
[Index(nameof(TimeStamp))]
public class Trade
{
public int Id { get; set; }
public int SymbolId { get; set; }
public Symbol Symbol { get; set; } // navigation property
public decimal Price { get; set; }
public decimal Quantity { get; set; }
public DateTime TimeStamp { get; set; }
public OrderSide Side { get; set; }
public OrderTypeMinimal Type { get; set; }
}