So I've hosted a video sharing website and right now I have approximately 2 million rows stored in an AWS RDS db.t2.large
. The pricing for this instance is $0.136/hr
. However it is very slow because it only has 2 CPU's. The cost to get an instance with 8 CPUs is $0.544/hr
which is well over my budget. I am running the statement "SELECT * FROM videos ORDER BY RAND() LIMIT 100"
on almost every page and it is too slow with only 2 CPU's. With 8 CPU's it is bearable. How do I make this faster? I don't want users waiting a long time every on refresh or new link. The number of rows is expected to grow 8 million rows when i complete the data dump.

- 13
- 2
1 Answers
Is it perhaps enough to just select randomly from within a date range, or from videos that have a certain amount of views? What database are you on? See e.g. MYSQL: Query order by rand() very slow
Another approach is to cache the randomized values. You can do it in a way that looks completely random to each user:
user A requests random sample => store sample in cache, and mark it as having been retrieved by A
user B requests random sample => get random sample from cache, checking whether we have retrieved it before (we haven't, it was A, so return it)
user A requests random sample => get random sample from cache, checking whether we have retrieved it before (we have, it was us, so we do another request and add it to the cache and then return it)
This makes the request of B "free", so instead of 3 requests we only have 2.

- 124
- 1
- 4
-
I like this idea of cache random values. Everyday I can set a process to create the table that contains the videos that should be displayed to the users loading the web pages. Then I retrieve these rows whenever someone reloads or hits a new link. – ExAmazonIntern Nov 08 '20 at 17:35
-
I also thought of ... generating the random ID's in the code as opposed to making the SQL engine do so. Then run 100 `SELECT * FROM videos WHERE id = %%%`. What do you think? – ExAmazonIntern Nov 08 '20 at 17:45
-
@ExAmazonIntern I'd try that as well, but I haven't benchmarked it. You'll have to account for missing IDs if you delete records, though. In the worst case, you'll miss all of the records with the query. See the link I posted, and also https://stackoverflow.com/questions/1244555/how-can-i-optimize-mysqls-order-by-rand-function – dssjoblom Nov 08 '20 at 17:51