5

I'm developing a simple app to return a random selection of exercises, one for each bodypart.

bodypart is an indexed enum column on an Exercise model. DB is PostgreSQL.

The below achieves the result I want, but feels horribly inefficient (hitting the db once for every bodypart):

BODYPARTS = %w(legs core chest back shoulders).freeze

@exercises = BODYPARTS.map do |bp|
  Exercise.public_send(bp).sample
end.shuffle

So, this gives a random exercise for each bodypart, and mixes up the order at the end.

I could also store all exercises in memory and select from them; however, I imagine this would scale horribly (there are only a dozen or so seed records at present).

@exercises = Exercise.all

BODYPARTS.map do |bp|
  @exercises.select { |e| e[:bodypart] == bp }.sample
end.shuffle

Benchmarking these shows the select approach as the more effective on a small scale:

Queries:            0.072902   0.020728   0.093630 (  0.088008)
Select:             0.000962   0.000225   0.001187 (  0.001113)
MrYoshiji's answer: 0.000072   0.000008   0.000080 (  0.000072)

My question is whether there's an efficient way to achieve this output, and, if so, what that approach might look like. Ideally, I'd like to keep this to a single db query.

Happy to compose this using ActiveRecord or directly in SQL. Any thoughts greatly appreciated.

SRack
  • 11,495
  • 5
  • 47
  • 60
  • 1
    Try the following: `Exercise.group(:bodypart).select('distinct on (bodypart) *').order('bodypart, random())`; – MrYoshiji Feb 26 '18 at 17:01
  • Star @MrYoshiji - I had to add `:id` to the group clause, i.e. `Exercise.group(:bodypart, :id).select('distinct on (bodypart) *').order('bodypart, random()')` to overcome the following error: `ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "exercises.id" must appear in the GROUP BY clause or be used in an aggregate function`. I'll plug it into the bench mark code and see how it compares, but that looks just the ticket - happy to accept your answer if you pop one in! – SRack Feb 26 '18 at 17:08

2 Answers2

3

From my comment, you should be able to do (thanks PostgreSQL's DISTINCT ON):

Exercise.select('distinct on (bodypart) *')
        .order('bodypart, random()')
MrYoshiji
  • 54,334
  • 13
  • 124
  • 117
  • Thanks again - I've added your approach to the benchmark in the question, and it's also incredibly fast. – SRack Feb 26 '18 at 17:13
  • @SRack Databases are always better at dealing with that kind of things ;) – MrYoshiji Feb 26 '18 at 17:14
  • Yep, that was my hope :) Just had a play and -- correct me if I'm wrong -- you can actually shave a little more off by omitting the grouping: `Exercise.select('distinct on (bodypart) *').order('bodypart, RANDOM()')`. – SRack Feb 26 '18 at 17:23
2

Postgres' DISTINCT ON is very handy and performance is typically great, too - for many distinct bodyparts with few rows each. But for only few distinct values of bodypart with many rows each (big table - and your use case) there are far superior query techniques.

This will be massively faster in such a case:

SELECT e.*
FROM   unnest(enum_range(null::bodypart)) b(bodypart)
CROSS  JOIN LATERAL (
   SELECT *
   FROM   exercises
   WHERE  bodypart = b.bodypart
   -- ORDER BY ??? -- for a deterministic pick
   LIMIT  1        -- arbitrary pick!
   ) e;

Assuming that bodypart is the name of the enum as well as the table column.

enum_range is an enum support function that (quoting the manual):

Returns all values of the input enum type in an ordered array

I unnest it and run a LATERAL subquery for each value, which is very fast when supported with the right index. Detailed explanation for the query technique and the needed index (focus on chapter "2a. LATERAL join"):

For just an arbitrary row for each bodypart, a simple index on exercises(bodypart) does the job. But you can have a deterministic pick like "the latest entry" with the right multicolumn index and a matching ORDER BY clause and almost the same performance. Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    Can't compete with that, +1 – MrYoshiji Feb 26 '18 at 17:30
  • This looks fantastic and I'll definitely be doing the recommended reading; _however_, I can't get it working. Is it due to ambiguity in my question, in that the `bodypart` column is a Rails enum column, down as an integer in the db? Thanks very much and +1 for the answer regardless @ErwinBrandstetter. – SRack Feb 26 '18 at 17:40
  • @SRack: I have been assuming an actual [Postgres `enum`](https://www.postgresql.org/docs/current/static/datatype-enum.html) data type. But that's not the core of the answer. *Any* cheap way to get the list of distinct bodyparts to work with is good ... – Erwin Brandstetter Feb 26 '18 at 17:44
  • Great - will have a good look at this tomorrow. Really appreciate the answer. – SRack Feb 26 '18 at 17:49
  • 1
    If you do not have the definitive list of bodyparts somewhere in the DB (you probably do), you can even do without it. Refer to the [chapter "1. No separate table with unique users" in the related answer.](https://stackoverflow.com/questions/25536422/optimize-group-by-query-to-retrieve-latest-record-per-user/25536748#25536748) – Erwin Brandstetter Feb 26 '18 at 17:52
  • I didn't @ErwinBrandstetter - but please don't take that as a lack of appreciation for your answer. I've learned a _lot_ reading through it and the links provided. To be honest, as someone regularly reliant on Active Record's magical helpers, this is far advanced of my knowledge on the subject. Thanks again though. – SRack Mar 12 '18 at 18:58