2

I have a parts list with 1000 rows. The two fields are "Part Number" and "Cost". The costs range from $1 to $2000 (integers all).

  • A034, 1
  • A012, 5
  • A084, 10
  • B309, 13
  • A094, 25
  • A370, 50
  • A233, 75
  • A343, 75
  • C124, 78
  • ...
  • D239, 500
  • ...
  • X998, 1980
  • Z901, 2000

I want to create a list of all combinations of parts whose combined cost falls within a tight range (the gap of the range will never be more than $50). For example, given a range of $70-75, the returned list would be:

  • A343 (total $75)
  • A233 (total $75)
  • A370, A094 (total $75)
  • A370, B309, A084 (total $73)
  • A370, B309, A084, A034 (total $74)

My first thoughts were to iterate through all possible combinations of parts that could meet the criteria (i.e. <= the upper number of the range) and report those combinations whose sums fell within the range. It quickly became obvious that would fail because the number of combinations becomes astronomical pretty quickly. But given that most combinations would not meet the criteria, is this problem reasonably solvable?

Given that the data is in a table in a MySQL database, my preferred solution would be some SQL or Stored Proc, next preferred PHP, finally Javascript.

(ETA: the missed hit discovered by @piotrm)

biscuit314
  • 2,384
  • 2
  • 21
  • 29

3 Answers3

3

You have to limit maximum of total cost, or the number of combinations will go up to the sky no matter how you try to find them. In the following example it is limited to 75, but you may try other values to see it you can still find results in a reasonable time.

You can also adapt this solution to update combinations table on inserts or updates for your main table, letting you get results extremely quick for any range that doesnt exceed your set limit (but slowing inserts obviously as it's where all the work is done).

Create tables and trigger:

CREATE TABLE `total_max75` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `parts` varchar(255) NOT NULL,
 `num` int(11) NOT NULL,
 `total` int(11) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `total` (`total`,`num`)
); 

CREATE TABLE `newparts` (
 `name` char(4) NOT NULL,
 `price` int(11) NOT NULL,
 PRIMARY KEY (`name`)
);

DELIMITER //
CREATE TRIGGER addtotal AFTER INSERT ON newparts
FOR EACH ROW
BEGIN
IF NEW.price <= 75 THEN
   INSERT INTO total_max75 ( parts, num, total )
     SELECT CONCAT( t.parts, ', ', NEW.name), 
       t.num+1, t.total+NEW.price 
   FROM total_max75 t
   WHERE t.total <= 75 - NEW.price AND num < 40;

   INSERT INTO total_max75( parts, num, total )
     VALUES( NEW.name, 1, NEW.price );
END IF;
END//
DELIMITER ;

Then populate using:

INSERT INTO newparts( name, price )
SELECT part_number, cost FROM yourtable
WHERE cost <= 75;

or (as test data)

INSERT INTO newparts( name, price ) VALUES
('A012', 5),('A034', 1),('A084', 10),('A094', 25),('A233', 75),
('A343', 75),('A370', 50),('B309', 13),('C124', 78);

And finally get your result using:

SELECT * FROM total_max75 WHERE total BETWEEN 70 AND 75;

You may put any range here with maximum less than 75 (or whatever you set as limit in the table creation part and trigger).

Results:

A084, A370, B309        73 (you missed this one in your question)
A034, A084, A370, B309  74
A233                    75
A343                    75
A094, A370              75
piotrm
  • 12,038
  • 4
  • 31
  • 28
  • I accepted this answer because it works with the test data in the question. However, it does not solve my problem except to demonstrate that it likely cannot be solved. This is basically the first approach I considered (iterating through all combinations) adding the idea of first storing all combinations in a table. As you say, works for small sets but grows crazy as the list grows. Thanks for confirming there's no way around the factorial wall. – biscuit314 Feb 18 '12 at 16:46
2

Well, my first thought is to only self-join the rows from the table where the cost is less than the high-end of the range:

select 
  l.part as `part1`, r.part as `part2`, l.cost + r.cost as `total cost`
from (select * from parts where cost < MAX_COST) l
inner join
(select * from parts where cost < MAX_COST) r
on l.cost + r.cost between MIN_COST and MAX_COST

How efficient is this going to be?

  1. it's O(n^2) in the number of rows less than MAX_COST. This might be slow if that number isn't relatively small
  2. it's O(n) in the number of rows in parts. If parts is very large, this may be the determining factor. But if parts is not so large, or if (1) is not small, this will get swamped by (1)
Matt Fenwick
  • 48,199
  • 22
  • 128
  • 192
1

I've got recursive one (on array):

$data = array( 
  'A034' => 1,
  'A012' => 5,
  'A084' => 10,
  ...
)

We need to sort data with arsort() with the highest value first:

arsort( $data, SORT_NUMERIC);

This will make sure that the large portion of array will be dealt with in early stage. About the main function:

/**
 * Builds resulting array
 *
 * @param $array $key => $price pairs
 * @param $price lower price of interval (for <70,130>, it's 70)
 * @param $reserve difference between lower and higher price (for <70,130>, it's 130-70 = 59)
 * @param &$result output array
 * @param $cummulatedKeys so far collected keys, leave empty
 *
 * @return void
 */
function buildResults( $array, $price, $reserve, &$result, $cumulatedKeys = array()){
    // Get key of first element
    reset( $array);
    $key = key( $array);

    // Just decrease number of elements as fast as possible
    while( $one = array_shift( $array)){
       $tmp = $price - $one;

       // Ok reached one price
       if( $tmp >= 0){
           // In interval
           if( (-$tmp) <= $reserve){
               $result[] = array_merge( $cumulatedKeys, array( $key));
           } else {
                 // We are too low
                 continue;
           }
       }

       // Skip very small values which can't accumulate price
       if( (count( $array) * $one) < $tmp){
           break;
       }

       // We may go on, deeper
       buildResults( $array, $tmp, $reserve, $result, array_merge( $cumulatedKeys, array( $key)));

       // Actualize key
       if( !count( $array)){
           break;
       }
       reset( $array);
       $key = key( $array);
   }
}

Usage should be obvious but just for the case, let's say you want process $array for values interval <70,90>;

$result = array();
buildResults( $array, 70, 90-70, $result);

I haven't tested it and I'm curious about it's performance... Leave notes in comments please

Vyktor
  • 20,559
  • 6
  • 64
  • 96
  • This didn't work - the results returned were different than expected (many combinations less than the lower threshold for example). I do appreciate the effort. – biscuit314 Feb 18 '12 at 17:03