We've got a wide table that we're currently trying to optimize. The table has 50 columns (stats) that we eventually want to rank in descending order. Currently there's well over 5 million rows.
We're looking for ways to optimize this table both in terms of reducing complexity and improving read speed. Write speed is also important to us, but read is more critical. The ranks of these statistics should be as close to real time as possible with an optimal solution being one that ranks quickly on a per request basis (new rows are being added all the time and we want to show ranks for these rows as soon as possible.)
We're currently evaluating whether or not a vertical table layout would be a.) more performant, and b.) easier to work with.
Because the stats that are being inserted are not necessarily well defined, it's easier for us if they aren't hard coded into the table (hence the preference for a vertical table structure.)
Here's a look at our current table structure and query:
CREATE TABLE Stats
(
Id BIGINT PRIMARY KEY NOT NULL,
UserId INT,
Name VARCHAR(32) NOT NULL,
Value DECIMAL(10,4) DEFAULT ((0)) NOT NULL,
UpdatedAt DATETIME
);
CREATE INDEX Leaderboard__index ON Stats (Name, Value DESC);
SELECT
Id,
Name,
Value,
RANK() OVER (PARTITION BY Name ORDER BY Value DESC) AS Rank
FROM
Stats
ORDER BY
Value DESC
Typically we'd either be searching for top N rows for any given stat (like a leaderboard), or we'd be selecting a single UserId and getting the rank of all stats associated with that UserId.
The data is of considerable size (as I mentioned above, because there's a lot of rows and a lot of columns, a vertical table structure might be in the range of 250 million rows and will continue to grow.)
We're looking to fetch this data as fast as possible on whatever hardware is required, seconds is our target, as we're currently in the minutes range.
In a test of the vertical table structure we've inserted over 400,000 rows of data and the query above takes a little less than 3 minutes (though it also only took about 18 seconds less to rank 10,000 rows.)
I'd love to hear any suggestions. Thanks for your time!