2

I have maybe about 1,000,000 rows to load into C++ objects (through about 10,000 SELECTS). I've profiled the load, and note that the sqlite3_step statement here is the bottleneck.

sqlite3_stmt *stmt;
std::string symbol = stock->getSymbol();
boost::format sql("SELECT date,open,high,low,close,volume FROM Prices WHERE symbol=\"%s\" ORDER BY date DESC");
sql % symbol;

if (sqlite3_prepare_v2(databaseHandle_, sql.str().c_str(), -1, &stmt, NULL) == SQLITE_OK) {
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        int date = sqlite3_column_int(stmt, 0);
        float open = sqlite3_column_double(stmt, 1);
        float high = sqlite3_column_double(stmt, 2);
        float low = sqlite3_column_double(stmt, 3);
        float close = sqlite3_column_double(stmt, 4);
        int volume = sqlite3_column_int(stmt, 5);
        Price *price = new Price(new Date(date), open, close, low, high, volume);
        stock->add(price);
    }
} else {
    std::cout << "Error loading stock" << std::endl;
}

I am using the amagalmated sqlite.h/c version 3.15.0. Any ideas how I can speed up performance?

More info:

CREATE TABLE Prices (symbol VARCHAR(10) NOT NULL, date INT(11) NOT NULL, open DECIMAL(6,2) NOT NULL, high DECIMAL(6,2) NOT NULL,low DECIMAL(6,2) NOT NULL, close DECIMAL(6,2) NOT NULL, volume INT(10) NOT NULL, PRIMARY KEY (symbol, date))

CREATE INDEX `PricesIndex` ON `Prices` (`symbol` ,`date` DESC)

EXPLAIN QUERY PLAN SELECT * FROM Prices WHERE symbol="TSLA" ORDER BY date DESC;

returns

SEARCH TABLE PRICES USING INDEX PricesIndex (symbol=?)

Further Note: Such SELECTs as shown above take 2ms in SQLite Browser for Mac Execute SQL.

statguy
  • 1,627
  • 3
  • 13
  • 22
  • If you're concerned about performance why are you dynamically-allocating `Price` and `Date`? – Vittorio Romeo Nov 24 '16 at 18:23
  • 1
    I heard somewhere that it helps if you index your rows, but that's just a shot in the dark :) It looks like that you would need to speed up your database in general, as your code does hardly nothing (assuming your constructors are not expensive) – Rakete1111 Nov 24 '16 at 18:23
  • @VittorioRomeo Would that really slow down performance? – Rakete1111 Nov 24 '16 at 18:24
  • @Rakete1111: it's certainly slower than stack allocation *(or emplacement)*. Also, it kills data locality. – Vittorio Romeo Nov 24 '16 at 18:25
  • I can see some extra cache misses, but should be insignificant compared to the power of the SQL. – user4581301 Nov 24 '16 at 18:25
  • Maybe binding can help? http://stackoverflow.com/questions/1703203/in-sqlite-do-prepared-statements-really-improve-performance – Simon Kraemer Nov 24 '16 at 18:30
  • Do you use transaction? This can improve performances a lots sometimes. Please show a little bit more of your code. – mgagnon Nov 24 '16 at 18:31
  • @mgagnon Is transaction important for the sqlite3_stmt? I am using transactions for bulk inserts, but not here. – statguy Nov 24 '16 at 18:35
  • You're right sorry. Since you do only 1 select, the BEGIN TRANSACTION will have no effect. – mgagnon Nov 24 '16 at 18:47
  • SQLite computes result rows on demand, so `sqlite_step()` is where all the work is done. Show the database schema, and the output of [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html) for your query. – CL. Nov 24 '16 at 19:30

1 Answers1

2

Your index already speeds up searching for matching rows, and returns them in the correct order so that no separate sorting step is required.

However, the database still has to look up the corresponding table row for each index entry. You can speed up this particular query by creating a covering index on all the used columns:

CREATE INDEX p ON Prices(symbol, date, open, high, low, close, volume);

But instead of duplicating all data in the index, it would be a better idea to make this table a clustered index:

CREATE TABLE Prices (
    symbol VARCHAR(10) NOT NULL,
    date INT(11) NOT NULL,
    open DECIMAL(6,2) NOT NULL,
    high DECIMAL(6,2) NOT NULL,
    low DECIMAL(6,2) NOT NULL,
    close DECIMAL(6,2) NOT NULL,
    volume INT(10) NOT NULL,
    PRIMARY KEY (symbol, date)
) WITHOUT ROWID;
CL.
  • 173,858
  • 17
  • 217
  • 259