-1

I ask this on StackOverflow after carefully reading this answer about StackOverflow vs dba.se—I’m a non-expert database novice, and in my possibly-misguided estimation, a fellow non-DBA coder can help me just as well as a database expert. SQLite is also a “lite” database.

My SQLite table is for, say, a recipes scenario. It has two columns: each row has a field, meal and one ingredient required by the meal. Since most meals take more than one ingredient, there are many rows with the same meal but different ingredients.

I need to know how many meals the exact set of ingredients can make—actually I need a sorted list of all the ingredients and how many meals can be made with exactly those ingredients. I hope the code will explain this completely:

CREATE TABLE recipes (
      meal TEXT,
      ingredient TEXT);

INSERT INTO recipes VALUES
  ("tandoori chicken","chicken"), ("tandoori chicken","spices"),
  ("mom's chicken","chicken"), ("mom's chicken","spices"),
  ("spicy chicken","chicken"), ("spicy chicken","spices"),

  ("parmesan chicken","chicken"), ("parmesan chicken","cheese"), ("parmesan chicken","bread"),
  ("breaded chicken","chicken"), ("breaded chicken","cheese"), ("breaded chicken","bread"),

  ("plain chicken","chicken");

Here, we have

  • one set of three meals that use exactly the same ingredients (tandoori chicken, mom’s chicken, and spicy chicken),
  • another set of two meals using a different set of ingredients, and
  • one meal other meal that needs exactly its ingredient.

I want something like the following:

chicken,,,spices|3
chicken,,,cheese,,,bread|2
chicken|1

That is, a string containing the exact set of ingredients and how many meals can be made using exactly these ingredients. (Don’t worry about collating/sorting the ingredients, I can ensure that for each meal, rows will be inserted in the same order all the time. Also, don’t worry about pathological cases where the same meal-ingredient row is repeated—I can prevent that from happening.)

I can get the above output like this:

  WITH t
       AS (SELECT group_concat(recipes.ingredient, ",,,") AS ingredients
           FROM   recipes
           GROUP  BY recipes.meal)
  SELECT t.ingredients,
         count(t.ingredients) AS cnt
  FROM   t
  GROUP  BY t.ingredients
  ORDER  BY cnt DESC;

There’s a couple of reasons I’m not happy with this: first, it creates a sub-view and I’m really curious if there’s a way to achieve this without a sub-view—that would likely be faster and clearer. And second, inside the sub-view, I create a string via group_concat to represent the vector of ingredients—I feel like there ought to be a row-based, or data structure-like, way to get the same information out of SQL.

My question: can I get the above output, or some equivalent, without using sub-views and/or without string concatenation?

Community
  • 1
  • 1
Ahmed Fasih
  • 6,458
  • 7
  • 54
  • 95
  • 1
    If such a query is even possible without a sub-select (maybe with lots of ugly self joins assuming a maximum number of possible ingredients? hmm) I think it would be way harder to understand.Sub-queries are quite normal in SQL and I can't really see why anyone would want to avoid them. My opinion is to code for clarity first and if that's not fast enough only then rewrite. Also I see nothing wrong with group_concat as long as one uses delimiters that have no chance of occurrence in the data. Which is the case with your three commas I suppose. – Kjetil S. Jan 21 '17 at 22:12

1 Answers1

1

This simplification seems to work:

SELECT distinct  group_concat(recipes.ingredient, ",,,")
     , count(*) AS cnt
FROM      recipes recipes
GROUP  BY recipes.meal 
ORDER  BY cnt DESC;

It's really just a re-formulation of what you have already though, without the nested query or common table expression.

Since a recipe can have an arbitrary number of ingredients doing repeated joins isn't feasible (without recursion) so I think this is a great example of how handy the GROUP_CONCAT() function is.

Edit: Woops, you are right, sorry about that. Looking at the problem again, I think that a separate result set is required. There are 2 levels of aggregation, one to 'pivot' the data so it is the recipe grain with a list of ingredients for each, and then another to count the number of recipies with the same ingredients list. Below is a simple way to look at it, with the use of 'order by' in the GROUP_CONCAT to control the ordering, so the same list of ingredients is grouped together. –

select ingredients_list, count(*) from (     SELECT meal, group_concat(recipes.ingredient, ",,," order by recipes.ingredient) as ingredients_list FROM recipes recipes GROUP BY recipes.meal ) meal_ingredients group by ingredients_list ; 
Vince Hill
  • 343
  • 3
  • 7
  • Doh, I need a sub-view after all, since `group_concat`’s order is arbitrary and so the ingredients might be shuffled in the resultant string per http://stackoverflow.com/q/1897352/500207 … – Ahmed Fasih Jan 20 '17 at 14:22
  • Doh—Vince, this actually doesn’t work. `cnt` is the number of elements being `group_concat`ted, **not** how often the result appears. Compare my original output `chicken,,,spices|3` and `chicken,,,cheese,,,bread|2` to the output of your code: `chicken,,,cheese,,,bread|3` and `chicken,,,spices|2`—the numbers mean different things. Any suggestions? – Ahmed Fasih Jan 21 '17 at 01:03
  • Woops, you are right, sorry about that. Looking at the problem again, I think that a separate result set is required. There are 2 levels of aggregation, one to 'pivot' the data so it is the recipe grain with a list of ingredients for each, and then another to count the number of recipies with the same ingredients list. Below is a simple way to look at it, with the use of 'order by' in the GROUP_CONCAT to control the ordering, so the same list of ingredients is grouped together. – Vince Hill Jan 23 '17 at 03:35
  • select ingredients_list, count(*) from ( SELECT meal, group_concat(recipes.ingredient, ",,," order by recipes.ingredient) as ingredients_list FROM recipes recipes GROUP BY recipes.meal ) meal_ingredients group by ingredients_list ; – Vince Hill Jan 23 '17 at 03:40
  • Would it be possible for you to edit your answer with this? I will mark it as accepted then :)! – Ahmed Fasih Feb 01 '17 at 03:06