
I am writing an algorithm to generate combinations of items from a database. They need to be unique combinations, i.e. order does not matter (145, 156 == 156, 145). The problem I am running into is how to keep track of previous combinations so that I do not end up with both 145, 156 and 156, 145.

Currently I am adding them to an array with an index of id1_id2... (sorted so the IDs always run lowest to highest) and setting the value to 1 when a combo is generated, so that I can check whether $combos[$index] exists. If it does not exist, create it. (There are other criteria to weed out every permutation, but they are irrelevant here.) Once these combinations are generated, they are stored in a MySQL table.

The problem is that with the test items I'm using (about 85) I cannot generate combinations with more than 3 items (id1_id2_id3) without running out of memory: the number of combinations is massive, and the $combos array takes up more than the 64 MB I am allotted in PHP.

Is there a way that I can do this a) without keeping track of previous combos, or b) skipping the $combos array entirely and only adding unique rows to MySQL, letting MySQL handle the duplicate checking?

Here is some pseudo code for reference:

$items = array(/* 85 items */);
foreach ($items as $item1) {
    generate(array($item1));
    foreach ($items as $item2) {
        generate(array($item1, $item2));
    }
}

function generate($items_array) {
    global $combos;

    $temp_array = array();
    foreach ($items_array as $item) {
        $temp_array[] = $item['id'];
    }

    sort($temp_array);
    $index = implode("_", $temp_array);

    if (!isset($combos[$index])) {
        $combos[$index] = 1;
        /* some code to generate query to store to db */
    }
}

The query ends up looking like this (the database is truncated at the beginning of the script):

INSERT INTO `combos` (combo_id, more_info) VALUES ('id1_id2', 'Item Name');

In the process of writing this question, I thought of a possible solution: Making sure id3 > id2 > id1. Would this be a viable solution to remove the need for $combos?
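That idea does work: if every loop only considers IDs strictly greater than the one chosen by the loop outside it, each set is generated exactly once and no lookup table is needed. A minimal sketch of the logic (in Python rather than PHP, with a made-up list of IDs standing in for the database rows):

```python
# Enumerate 3-item combinations by requiring i < j < k over the
# sorted list of IDs; each set of three is produced exactly once,
# so no $combos-style lookup table is needed.
ids = [14, 5, 15, 6, 1]   # stand-in for the 85 database IDs
ids.sort()

combos = []
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        for k in range(j + 1, len(ids)):
            combos.append((ids[i], ids[j], ids[k]))

# Every combination is unique and already sorted low-to-high,
# ready to be joined into an "id1_id2_id3" key.
print(len(combos))  # C(5, 3) = 10
```

The same constraint carries over directly to the PHP loops above by starting each inner foreach at the element after the outer one.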

helloandre

6 Answers


The reason I asked earlier about your data structure is that you could do something like this:

$sql = "SELECT id FROM test_a";
$result = mysql_query($sql);
while ($row = mysql_fetch_array($result)) {
  $item1 = $row['id'];

  $sql2 = "SELECT id FROM test_a";
  $result2 = mysql_query($sql2);
  while ($row2 = mysql_fetch_array($result2)) {
    $item2 = $row2['id'];

    $combo1 = $item1 . "_" . $item2;
    $combo2 = $item2 . "_" . $item1;

    $sql3 = "SELECT * FROM combos WHERE combo_id = '$combo1' OR combo_id = '$combo2'";
    $result3 = mysql_query($sql3);
    if (mysql_num_rows($result3) == 0) {
      $sql4 = "INSERT INTO combos (combo_id, more_info) VALUES ('$combo1','Item Name')";
      $result4 = mysql_query($sql4);
    }
  }
}

When table test_a has the values 1, 2, 3, and 4, this script inserts: 1_1, 1_2, 1_3, 1_4, 2_2, 2_3, 2_4, 3_3, 3_4, 4_4

This shouldn't have any memory problems, although if you have a huge database you may run into an issue with PHP's time limit.
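The enumeration pattern above (checking both orderings but only ever inserting the low-high one) is equivalent to simply starting the inner loop at the outer item. A Python sketch of that simplification, using the same four values:

```python
# Starting the inner iteration at the outer item's position yields
# each unordered pair (with repetition) exactly once, matching the
# 1_1, 1_2, ... output listed above -- no duplicate check required.
ids = [1, 2, 3, 4]
pairs = ["%d_%d" % (a, b)
         for i, a in enumerate(ids)
         for b in ids[i:]]
print(pairs)
# ['1_1', '1_2', '1_3', '1_4', '2_2', '2_3', '2_4', '3_3', '3_4', '4_4']
```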

Justin Giboney

Here is the same concept as my other answer, but in an all-SQL format.

INSERT INTO combos (combo_id, more_info) 
  SELECT CONCAT_WS("_",t1.id,t2.id), "item_name" 
  FROM test_a t1, test_a t2 
  WHERE NOT EXISTS (SELECT * FROM combos WHERE combo_id = CONCAT_WS("_",t1.id,t2.id))
    AND NOT EXISTS (SELECT * FROM combos WHERE combo_id = CONCAT_WS("_",t2.id,t1.id))

Assuming you can get item_name from the db somewhere, this will probably be your fastest and least memory-intensive solution. I am running a test on around 1000 IDs at the moment. I'll update this when it finishes.

Justin Giboney
  • You can eliminate the sub-selects by imposing the clause "WHERE t2.id >= t1.id" - this will only produce unique pairs. – Nick Johnson Jul 31 '09 at 08:21

Yes. You can store and use the lexicographical index of the combination to reconstruct/iterate them, or Gray codes if you need to iterate all of them.

Take a look at: "Algorithm 515: Generation of a Vector from the Lexicographical Index"; Buckles, B. P., and Lybanon, M. ACM Transactions on Mathematical Software, Vol. 3, No. 2, June 1977.

I've translated into C here, and describe more here.
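The unranking idea can be sketched in Python. This is a generic combinatorial-number-system unranking, not a line-for-line transcription of Algorithm 515: given a 0-based lexicographic rank, it reconstructs the k-combination directly, so you never need to store previously generated combinations at all.

```python
from math import comb

def unrank_combination(n, k, rank):
    """Return the rank-th (0-based) k-combination of range(n)
    in lexicographic order, without enumerating earlier ones."""
    combo = []
    x = 0
    for remaining in range(k, 0, -1):
        # Skip whole blocks of combinations that begin with a
        # smaller first element than the one we are looking for.
        while comb(n - x - 1, remaining - 1) <= rank:
            rank -= comb(n - x - 1, remaining - 1)
            x += 1
        combo.append(x)
        x += 1
    return combo

print(unrank_combination(5, 3, 0))  # [0, 1, 2]
print(unrank_combination(5, 3, 9))  # [2, 3, 4]
```

With 85 items and 3-item combos there are C(85, 3) = 98,770 ranks; iterating rank 0..98,769 and unranking each one visits every unique combination exactly once in O(1) memory.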

nlucaroni

If you don't need to enforce referential integrity automatically (which you aren't if you use string concatenation), use one table for the 85 items, giving each an index (0-84), and a second table to represent a given set of items, using a numeric datatype in which each bit position represents one item (e.g. 000001101 represents items 0, 2, and 3).

For more than 64 items you may have to split the mask across more than one field, or use a BLOB or a string (gack!).

If you use this as a primary key field, you can enforce non-duplicates.
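A sketch of how the bitmask key would be computed (Python for illustration; the table layout and column names are up to you):

```python
# Encode a set of item indices (0-84) as an integer bitmask.
# Equal sets always produce equal keys regardless of the order the
# items were picked in, so a PRIMARY KEY / UNIQUE constraint on
# this value rejects duplicate combinations automatically.
def combo_key(indices):
    key = 0
    for i in indices:
        key |= 1 << i
    return key

print(combo_key([0, 2, 3]))          # 13, i.e. binary 000001101
print(combo_key([3, 0, 2]))          # 13 -- same set, same key
```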

Jason S

In T-SQL you can use a recursive CTE. Can't remember where I got it, but it's pretty sweet. Note that MySQL doesn't support the WITH clause, so this won't work in MySQL.

WITH Numbers(N) AS (
    SELECT N
    FROM (VALUES (1), (2), (3), (4), (5), (6)) Numbers(N)
),
Recur(N, Combination) AS (
    SELECT N, CAST(N AS VARCHAR(20))
    FROM Numbers

    UNION ALL

    SELECT n.N, CAST(r.Combination + ',' + CAST(n.N AS VARCHAR(10)) AS VARCHAR(20))
    FROM Recur r
    INNER JOIN Numbers n ON n.N > r.N
)
SELECT Combination
FROM Recur
ORDER BY LEN(Combination), Combination;
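What the CTE computes can be sketched procedurally; this Python version (an illustrative sketch, not a translation of any particular production code) builds every non-empty increasing combination of 1..6 and orders the results the way the final ORDER BY does:

```python
# Recursively extend each combination only with numbers larger than
# its last element -- the same n.N > r.N join condition as the CTE --
# so every subset appears exactly once, already in increasing order.
def all_combinations(numbers):
    results = []
    def recur(prefix, last):
        for n in numbers:
            if n > last:
                combo = prefix + [n]
                results.append(",".join(map(str, combo)))
                recur(combo, n)
    recur([], 0)
    return results

combos = all_combinations([1, 2, 3, 4, 5, 6])
combos.sort(key=lambda s: (len(s), s))  # mirrors ORDER BY LEN(...), Combination
print(len(combos))  # 2^6 - 1 = 63 non-empty subsets
```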
Deno

To increase the memory limit, change:

memory_limit = 512M in your php.ini
or
ini_set('memory_limit', '512M'); in your PHP script
or
php_value memory_limit 512M in your .htaccess

Alex L
  • I know I can increase the memory size. It will be running on a hosted server where I'm fairly positive I cannot change this value. Unless 64M is ridiculously small. – helloandre Jul 30 '09 at 16:57
  • 64M is definitely not small; 32M is already considered "quite big" by some hosting companies -- and I've never, in 3 years, worked on a "professional" application that required more than 32M (except for some batches which run for hours during the night, typically). – Pascal MARTIN Jul 30 '09 at 17:29