6

I'd like to paginate through a randomly sorted list of ActiveRecord models (rows from MySQL database).

However, this randomization needs to persist on a per-session basis, so that other people that visit the website also receive a random, paginate-able list of records.

Let's say there are enough entities (tens of thousands) that storing the randomly sorted ID values in either the session or a cookie is too large, so I must temporarily persist it in some other way (MySQL, file, etc.).

Initially I thought I could create a function based on the session ID and the page ID (returning the object IDs for that page) however since the object ID values in MySQL are not sequential (there are gaps), that seemed to fall apart as I was poking at it. The nice thing is that it would require no/minimal storage but the downsides are that it is likely pretty complex to implement and probably CPU intensive.

My feeling is I should create an intersection table, something like:

random_sorts( sort_id, created_at, user_id NULL if guest)

random_sort_items( sort_id, item_id, position )

And then simply store the 'sort_id' in the session. Then, I can paginate the random_sorts WHERE sort_id = n ORDER BY position LIMIT... as usual.

Of course, I'd have to put some sort of a reaper in there to remove them after some period of inactivity (based on random_sorts.created_at).

Unfortunately, I'd have to invalidate the sort as new objects were created (and/or old objects being removed, although deletion is very rare). And, as load increases the size/performance of this table (even properly indexed) drops.

It seems like this ought to be a solved problem but I can't find any rails plugins that do this... Any ideas? Thanks!!

Matt Rogish
  • 24,435
  • 11
  • 76
  • 92

3 Answers3

6

MySQL has a RAND function you can use in your ORDER clause, passing a seed tied to the user session.

ORDER BY RAND(?)

Where ? is a seed value from the session. This will give you repeatable ordering across requests.

Toby Hede
  • 36,755
  • 28
  • 133
  • 162
  • Yeah, that works as long as the rows in the table never change (if a new one is added, it's my understanding that the whole set may change). Also, it causes a table scan each time, which can be a big performance hit... – Matt Rogish Mar 09 '10 at 15:25
  • your comment seems valid (though sometimes you *want* everything to be resorted if the underlying table adds something new). Unfortunately I can't think of a way for it to avoid scanning the entire table each time, though maybe an 'order by id limit 10' might be smart enough to exit out early...). If you don't do an "order by" however, then adding a new entry could still randomize your output, AFAIK, so this may be a hard question to get just right... – rogerdpack Feb 17 '13 at 03:23
  • looks like order by rand then limit "does" at least a full table scan (in mysql) though there may be tricks you could use to avoid it to speed this up for large tables: http://stackoverflow.com/questions/211329/quick-selection-of-a-random-row-from-a-large-table-in-mysql/211388 – rogerdpack Feb 26 '13 at 16:49
2

I'm probably missing something, but wouldn't something like this

select ... order by sha1(concat($session_id,item_id)) limit m,n;

work to give you a random-ordered, repeatable per-session paginated list ? Not very nice on index usage but you avoid any pre-filling / tmp tables / invalidation.

ggiroux
  • 6,544
  • 1
  • 22
  • 23
0

Personally, to save storage space and sanity, I'd just use a random seed using your user_id.

srand user_id
items.sort_by{ rand }
ghoppe
  • 21,452
  • 3
  • 30
  • 21
  • That would use a lot of ruby memory to sort the entire array, right? – Matt Rogish Mar 08 '10 at 22:24
  • I'm a little beyond my depth when it comes to the internals, but since it's just sorting pointers, what would be a more efficient way? Perhaps a custom .shuffle! method with a passed random seed? http://stackoverflow.com/questions/2039902/how-does-rubys-sort-by-rand-work – ghoppe Mar 08 '10 at 22:35
  • Another possible difficulty here is that srand is shared across your whole ruby process (AFAICT), so if you're multi-threaded, and not synchronized, there could be the possibility for race conditions. That being said, in 1.9.2 I believe there's a new Random class you might could use instead (but yes, you'd be loading the entire set into RAM, which you may be able to avoid, see Toby Hede's for a mysql way). – rogerdpack Feb 17 '13 at 03:26