8

I am trying to select three random rows from a table, where their combined item_price column is less than a desired amount.

Imagine you have an <input> for a dollar amount. When you enter the dollar amount, the database returns three random items, where their combined price is less than or equal to the dollar amount you enter.

If I enter $300, you could buy these three items, $150, $100, and $50. I'm having difficulty creating a query that will return three items that meet this criteria.

SELECT t1.item_id, t1.item_price
FROM items t1
INNER JOIN items t2 ON ( t1.item_id = t2.item_id )
GROUP BY t1.item_id, t1.item_name, t1.item_price
HAVING SUM( t2.item_price ) <=300
ORDER BY RAND( )
LIMIT 3 

I thought this would work, but I think it was just a coincidence when it did. It seems to just return any three items whose prices are less than $300, not total less than $300.

I also tried this query:

SELECT t1.item_id, t1.item_price
FROM   items t1
JOIN   items t2 ON t2.item_id <= t1.item_id
WHERE  t2.item_price <= 500
GROUP  BY t1.item_id
HAVING SUM(t2.item_price) <= 500
ORDER  BY RAND()
LIMIT 3

Again, seemed to work at first, but then it started returning items for $2000.

If there's a better (even sacrificing performance) way to do this in PHP, I don't mind. I just didn't think the query would be so difficult.

As always, thanks anyone for the help.

dcclassics
  • 896
  • 1
  • 12
  • 38
  • How close do you want the total of the three items to be to the entered amount? For example, if you enter $300, and $150 worth of items returned, is that helpful? – Bitwise Creative Apr 08 '15 at 16:31
  • @BitwiseCreative: The way I imagined it, if I had a $300 limit, it could theoretically return $30 of items, if each is $10. I wouldn't be opposed to this, especially if it's easier. I guess practically, getting closer to the limit would be helpful, but it was not my initial intention. Secondly, I'd not be opposed to returning 1 item for the full $300. Thank you for asking. – dcclassics Apr 08 '15 at 16:36
  • @BitwiseCreative Also, I'll add that there probably won't be many $10 items. Would it be easier to not set an item limit, and instead give me an undefined number of random rows, so long as it's below the limit? – dcclassics Apr 08 '15 at 16:38
  • Thanks for the update. It sounds like this can be a bit more flexible. It also sounds like randomness is important. It may be a bit easier to not pull exactly three items. – Bitwise Creative Apr 08 '15 at 16:46
  • @BitwiseCreative Randomness initially was intended as an option. I.e., you could sort by DESC, and get the most expensive items possible (under $300), or you could choose the option to return three random items. – dcclassics Apr 08 '15 at 16:54
  • @dcclassics can items be repeated in the result set? Eg. can you end up buying the same item 3 times if it's below the threshold – Andre Apr 08 '15 at 17:01
  • @Andre I would prefer no duplicates. I'm afraid it would not work with the purpose of the application. – dcclassics Apr 08 '15 at 17:15
  • The safest route would be to use logic on the application side -- one big upside is testability. I added both SQL and PHP solutions in my answer below – Andre Apr 08 '15 at 18:03

6 Answers6

4

here is another solution:

SELECT t1.item_id as id1, t2.item_id as id2, t3.item_id as i3
FROM items t1, items t2, items t3
WHERE
t1.item_id <> t2.item_id and
t1.item_id <> t3.item_id and
t2.item_id <> t3.item_id and
(t1.item_price + t2.item_price + t3.item_price) <= 300
order by rand()
limit 1

optionally you can filter by minimal sum

Iłya Bursov
  • 23,342
  • 4
  • 33
  • 57
  • this seems to be closest to my original intention. Would you be able to point out any gotchas? – dcclassics Apr 08 '15 at 17:24
  • @dcclassics this query is very slow on big tables, triplets like 1,2,3 and 1,3,2 are different – Iłya Bursov Apr 08 '15 at 17:27
  • 1
    @Lashane you can try with `t1.item_id < t2.item_id AND t2.item_id < t3.item_id`, it will still be very slow but I think it will perform better. Triplets will always be ordered but it should be fine – fthiella Apr 09 '15 at 21:06
  • Would there be an easy way to randomly determine the amount of items that came back? Would that dramatically change my query? – dcclassics Apr 16 '15 at 17:51
  • @dcclassics query returns 1 row with 3 columns. number of columns is usually hard to change, as soon as number of items = number of columns - it will require significant change of the query – Iłya Bursov Apr 16 '15 at 18:06
  • I completely understand. I've realized as I've added more items to the database, three random $50 items may not always be a desirable result, especially when inputting $1000 of spending money. Just something for me to think about I suppose. – dcclassics Apr 16 '15 at 18:07
  • @dcclassics add minimal sum filter, for example half of input number – Iłya Bursov Apr 16 '15 at 18:09
3

You could do it step by step. Say we have $500 ask limit. First get the min price in your DB.

select MIN(item_price) from items

Lets say this is 25.00 so for our first item we want a max from 500 plus 2 times the least value (2 * 25 = 50) so i check for the first item matching less or equal to 450 dollars

select item_id, item_price from items where item_price <= 450 order by rand() limit 1

This item now maybe is 240 dollars, so next query is:

select item_id, item_price from items where item_price <= 140 order by rand() limit 1

The next one could be 50 dollars, so the next query is:

select item_id, item_price from items where item_price <= 90 order by rand() limit 1

And there you go.

I am aware, that this is a quite simple solution and there surely could be better solutions, but using triple joins and random sorting on large tables will swallow lots of performance, and the result of the queries are not better than running these three simple queries, that will run like burst if table is indexed properly.

Doing it this way also would give you fine control on combinations returned (i.e. you could extend items with categories and reduce queries to distinct categories, so for example you could combine technical+kitchen+fun categories).

Since we are all here to learn, and we never stop learning, i believe this solution is a good basis for a flexible extension of the functionality. If you want to use a single query, then i would advise to have the query dump a large set of possible combinations into a table, so you can run your massive query maybe once a day and when you want to pick a combination, you just query your pre-rendered random table.

hexerei software
  • 3,100
  • 2
  • 15
  • 19
  • Is this a good final solution? It seems like it would limit the randomness of the results returned. – dcclassics Apr 08 '15 at 14:00
  • this might fail if there is only one item with price between minimum and 2*minimum, and the first selection has price 450. – 1010 Apr 08 '15 at 16:55
  • @dcclassics not really, after all, it is "just" rand() and rand() will not get better or worse if called once or three times... and you can also fine tune the range to not just have rand() selections but also well weighted – hexerei software Apr 08 '15 at 17:08
  • @1010 sure, the ranges i chose where rather tight here and no matter what solution you use, random means you could end up with one $450 article and two $25 articles - so actually the advantage of this technique, is that you can twist this behavior and actually avoid this happening (which is hard to explain to a single query) – hexerei software Apr 08 '15 at 17:10
  • @hexereisoftware I like this approach, and really appreciate the time you took to explain your answer. Especially with the different categories. I can determine the calculations for each item_price using php to perform the next query. My concern was with the first MIN() query. I don't understand what its purpose is. – dcclassics Apr 08 '15 at 17:14
  • @dcclassics oh, ok.. i just selected the `MIN()` price there, so i could figure what the best max range for the first query. so in this example i queried $25 to be the min price so i knew for the first product, my value would not be allowed to be more than $450, so i could fulfill at least two more products, same for second query which should be $500 - price of first item selected $240 - min price selected $25 so next max range could be $235, but i actually chose less, so i would not necessarily end up with $25 items at the end. So the `item_price <= n` is your control on weighting items – hexerei software Apr 08 '15 at 18:03
  • as I see it, your method doesn't guarantee a result. – 1010 Apr 08 '15 at 19:41
2

you can get all triplets of items having sum of price <= 300 with

SELECT a.item_id, a.item_price, b.item_id, b.item_price, c.item_id, c.item_price
  FROM items a 
       JOIN items b ON a.item_id < b.item_id
       JOIN items c ON b.item_id < c.item_id
 WHERE a.item_price + b.item_price + c.item_price <= 300

then you could sort by rand() and pick one.

there are discussions about performance of selecting random rows in mysql that you should check. the triple join will be costly if items table is big.

EDIT

as suggested in other answers, this query can be improved filtering each item by price <= 300, and using an index on items.price.

1010
  • 1,779
  • 17
  • 27
1

I was able to get the result with both these queries and a PHP version below

SET @MaxAmount = 5;
SELECT FirstItem.id, SecondItem.id, ThirdItem.id, FirstItem.amount +  SecondItem.amount +  ThirdItem.amount as Total
FROM Items as FirstItem
CROSS JOIN Items as SecondItem  ON SecondItem.id <> FirstItem.id and FirstItem.amount + SecondItem.amount < @MaxAmount 
CROSS JOIN Items as ThirdItem ON ThirdItem.id <> FirstItem.id  and ThirdItem.id <> SecondItem.id and FirstItem.amount + SecondItem.amount + ThirdItem.amount < @MaxAmount
ORDER BY RAND()
LIMIT 3;

And

SET @MaxAmount = 5;
SELECT FirstItem.id as id1, SecondItem.id as id2, ThirdItem.id as i3,  FirstItem.amount +  SecondItem.amount +  ThirdItem.amount as Total 
FROM Items FirstItem, Items SecondItem, Items ThirdItem
WHERE FirstItem.amount + SecondItem.amount < @MaxAmount
AND FirstItem.amount + SecondItem.amount  + ThirdItem.amount < @MaxAmount
AND SecondItem.id != FirstItem.id -- Prevent Same Id from showing up
AND ThirdItem.id != FirstItem.id  and ThirdItem.id != SecondItem.id
ORDER BY RAND()
LIMIT 3;

http://sqlfiddle.com/#!9/0e1c8/3

I would only do this if the Items table is relatively small. You can do this in PHP by selecting all the items with the price less than 300 and generating the k combinations(also named nCr) of 3 and then using a filter function that returns the ones that summed together are less than 300.

$rows = $db->query("Select FirstItem.amount as amount1, SecondItem.amount as amount2, ThirdItem.amount as amount3 (.. and also the ids) from Items where amount < 300");
$ncr = getCombinations($rows, 3);
$filtered = array_filter($ncr, function($row) { return $row['amount1'] + $row['amount2'] + $row['amount3'] < 300; })
Community
  • 1
  • 1
Andre
  • 790
  • 2
  • 9
  • 23
  • When you say you would only do this if the Items table is relatively small, do you mean for all three options or just for the PHP one? – dcclassics Apr 09 '15 at 01:28
  • Depends on the current load of the MySQL server. If your specific application doesn't get many writes, chances are you would be able to get away with it. If you run explain on both the queries you'll see they are practically identical. If it were me I'd run it on PHP and store the results, then query the result set (perhaps another table) directly. If your changes are low, you could cron the operation run it at your lowest utilization time (midnight local time zone), thus amortizing your operation to just a read. – Andre Apr 09 '15 at 08:25
1

Here's a SQL only (MySQL flavour) solution:

SELECT i.*
FROM items i
CROSS JOIN
    (SELECT CONCAT('^(', t1.item_id, '|', t2.item_id, '|', t3.item_id, ')$') AS regex
     FROM items t1
     CROSS JOIN items t2
     CROSS JOIN items t3
     WHERE t1.item_id < t2.item_id
       AND t2.item_id < t3.item_id 
       AND t1.item_price + t2.item_price + t3.item_price <= 300
     ORDER BY RAND()
     LIMIT 1) s
WHERE i.item_id REGEXP s.regex

Not very efficient for large result sets as it creates a subquery of the different permutations of 3 items that fulfill the total criteria and then picks one of these at random. The subquery returns its result as a regular expression to allow the rows to be picked in the outer query.

See SQL Fiddle demo.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Steve Chambers
  • 37,270
  • 24
  • 156
  • 208
0

Another solution i get by looking Lashane's answer because each of the item's price must not be greater than the total. There should be some improvement by adding this (try EXPLAIN to the query).

SELECT t1.item_id as id1, t2.item_id as id2, t3.item_id as i3
FROM items t1, items t2, items t3
WHERE
t1.item_price <= 300 AND
t2.item_price <= 300 AND
t3.item_price <= 300 AND
t1.item_id <> t2.item_id AND
t1.item_id <> t3.item_id AND
t2.item_id <> t3.item_id AND
(t1.item_price + t2.item_price + t3.item_price) <= 300
ORDER BY RAND()
LIMIT 1
David
  • 522
  • 3
  • 7