
I am building an app that will search over my main table (recipes) and return recipes sorted by various parameters (average rating, date posted, etc).

I am using Prisma for my database ORM, but it seems to be lacking when it comes to advanced relations like this. I still want to use it, though, because it makes most operations far simpler, and it provides some useful tools for more advanced searching later on.

I have three tables:

recipes
  recipe_id     int
  recipe_title  text
  (many other fields)

resources
  resource_id   int
  recipe_id     int  (foreign key = recipes.recipe_id)
  (many other fields)

ratings
  rating_id     int
  recipe_id     int  (foreign key = recipes.recipe_id)
  user_rating   int  (1-5 rating recorded here)

This is an already existing database, so I am trying to modify it as little as possible.

My current SQL query (postgres) is something like:

SELECT recipe.recipe_id, recipe.recipe_title, AVG(rating.user_rating) AS recipe_rating, COUNT(rating.rating_id) AS rating_count, resource.resource_id 
FROM recipes recipe 
LEFT OUTER JOIN ratings rating ON recipe.recipe_id = rating.recipe_id 
LEFT OUTER JOIN resources resource ON resource.recipe_id = recipe.recipe_id 
WHERE recipe.recipe_title LIKE ${`%${query}%`} ORDER BY recipe_rating DESC LIMIT 30

But I have a few issues here:

  1. It doesn't seem like the ORDER BY is right: it gives me recipes in an illogical order. A 4-star recipe is followed by a 1-star recipe, followed by a 5-star recipe, etc.

  2. It doesn't feel like it will scale well. I will be implementing cursor-based pagination and will need to keep grabbing the next batch of 30 recipes ordered by rating. I figure if I had a 'statistics' table or something, it would be easy to do this quickly.

Essentially the scenario I am aiming for here is:

  • User can search / query database (originally just based on LIKE operator on recipe title, but eventually expanded to various columns)
  • Database will respond with 30 recipes, ordered by rating
  • User can request the next 30 recipes, again ordered by rating

Am I best to store the average rating in the recipes table? Or potentially create a new table that stores simply the exact data I need for the search results (eg. avg rating, recipe title, resource_id)?

This data already exists in a production database with over 100k rows inserted; so while I'm okay to make some modifications, I can't massively change the database structure.

Is this possible with Prisma? Better changing my database schema to use raw SQL?

Erwin Brandstetter
Pat

1 Answer


For starters, your query is wrong. The two LEFT JOINs combine into a de-facto CROSS JOIN between ratings and resources, multiplying rows and skewing your aggregates. See:
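To see the multiplication, consider one recipe with three ratings and two resources (hypothetical values below): the two joins produce 3 × 2 = 6 rows. The count is inflated, and every rating is fed into the aggregate twice. A minimal sketch:

```sql
SELECT count(rat.rating_id) AS rating_count   -- 6, not 3
     , avg(rat.user_rating) AS recipe_rating  -- each rating aggregated twice
FROM  (VALUES (1)) rec(recipe_id)
LEFT  JOIN (VALUES (1,1,5), (2,1,4), (3,1,3))
           rat(rating_id, recipe_id, user_rating)
       ON rat.recipe_id = rec.recipe_id
LEFT  JOIN (VALUES (10,1), (11,1)) res(resource_id, recipe_id)
       ON res.recipe_id = rec.recipe_id;
```

With a single recipe the duplication happens to be uniform, so the average survives; as soon as recipes have varying numbers of resources, even AVG() can no longer be trusted.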

This would seem correct (and a lot faster):

SELECT rec.recipe_id, rec.recipe_title
     , rat.recipe_rating, rat.rating_count
     , res.resource_id 
FROM   recipes rec
CROSS  JOIN LATERAL (
   SELECT AVG(rat.user_rating) AS recipe_rating
        , COUNT(*) AS rating_count
   FROM   ratings rat
   WHERE  rat.recipe_id = rec.recipe_id 
   ) rat
LEFT   JOIN resources res USING (recipe_id)
WHERE  rec.recipe_title LIKE ${`%${query}%`}  -- ?
ORDER  BY rat.recipe_rating DESC NULLS LAST  -- unrated last!
        , rec.recipe_id                  -- to break ties
LIMIT  30;

If there are no ratings, the average recipe_rating is NULL. I would advise NULLS LAST in ORDER BY. See:

Database will respond with 30 recipes,

If there can be multiple resources per recipe (as it seems), then no: this returns 30 rows for the top-rated recipes, with each recipe repeated once per resource. If you want one row per recipe, you'll have to define how to deal with multiple resources. Aggregate them, too? And then you can optimize performance some more ...
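One way to get exactly one row per recipe, assuming you are happy to collect all resource IDs into an array (an assumption, not something the question specifies), is a second lateral subquery:

```sql
SELECT rec.recipe_id, rec.recipe_title
     , rat.recipe_rating, rat.rating_count
     , res.resource_ids
FROM   recipes rec
CROSS  JOIN LATERAL (
   SELECT avg(rat.user_rating) AS recipe_rating
        , count(*)             AS rating_count
   FROM   ratings rat
   WHERE  rat.recipe_id = rec.recipe_id
   ) rat
CROSS  JOIN LATERAL (
   SELECT array_agg(res.resource_id) AS resource_ids
   FROM   resources res
   WHERE  res.recipe_id = rec.recipe_id
   ) res
WHERE  rec.recipe_title LIKE ${`%${query}%`}
ORDER  BY rat.recipe_rating DESC NULLS LAST
        , rec.recipe_id
LIMIT  30;
```

Since an aggregate subquery without GROUP BY always returns exactly one row, CROSS JOIN LATERAL is safe here; recipes without resources get a NULL array.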

If you plan to use pagination, make sure to work with a deterministic sort order. I added rec.recipe_id for this purpose. Also, using LIMIT / OFFSET is a poor approach for pagination. See:
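Keyset (cursor-based) pagination with this sort order might look like the following sketch, passing the last row's rating and id back as the cursor. The `${last_rating}` / `${last_recipe_id}` parameters are placeholders, and the mixed sort directions (rating DESC, id ASC) rule out a simple row-value comparison, hence the explicit OR:

```sql
SELECT rec.recipe_id, rec.recipe_title, rat.recipe_rating
FROM   recipes rec
CROSS  JOIN LATERAL (
   SELECT avg(user_rating) AS recipe_rating
   FROM   ratings
   WHERE  recipe_id = rec.recipe_id
   ) rat
WHERE  rec.recipe_title LIKE ${`%${query}%`}
AND   (rat.recipe_rating < ${last_rating}
   OR (rat.recipe_rating = ${last_rating}
       AND rec.recipe_id > ${last_recipe_id}))
ORDER  BY rat.recipe_rating DESC NULLS LAST
        , rec.recipe_id
LIMIT  30;
```

Rows with a NULL rating fall out of this predicate and need separate treatment (COALESCE to a sentinel value, or a second query for unrated recipes), which is one more argument for a precomputed statistics column.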

Am I best to store the average rating in the recipes table?

Yes, if that's possible. It depends on the write load for ratings, your accuracy requirements, and the frequency of requests. If entries change a lot, computing on the fly is more accurate, and keeping a snapshot of count and average up to date is costly. A MATERIALIZED VIEW, or some other way of caching aggregates that is refreshed according to your requirements, will typically be the best compromise.
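A minimal sketch of such a materialized view (the names are placeholders; refresh it from a cron job or a trigger-driven queue, whatever matches your freshness requirements):

```sql
CREATE MATERIALIZED VIEW recipe_stats AS
SELECT r.recipe_id
     , avg(rt.user_rating)  AS recipe_rating
     , count(rt.rating_id)  AS rating_count   -- 0 for unrated recipes
FROM   recipes r
LEFT   JOIN ratings rt USING (recipe_id)
GROUP  BY r.recipe_id;

-- Unique index required for REFRESH ... CONCURRENTLY:
CREATE UNIQUE INDEX ON recipe_stats (recipe_id);

-- An index matching the sort order makes the top-N query cheap:
CREATE INDEX ON recipe_stats (recipe_rating DESC NULLS LAST, recipe_id);

-- Periodically, without blocking readers:
REFRESH MATERIALIZED VIEW CONCURRENTLY recipe_stats;
```

The search query then joins recipes to recipe_stats instead of aggregating ratings on the fly.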

Another misconception:

Better changing my database schema to use raw SQL?

You can use raw SQL in any case, with or without changing the schema. SQL is always the optimal way to query a Postgres database. A good ORM may come close, but many ORMs are crutches that butcher versatility and performance to offer convenience. That may work nicely, until it doesn't, as seems to be the case here.

Erwin Brandstetter