Here we are comparing entries in table
CREATE TABLE a
(id INT PRIMARY KEY,
p1 INT, p2 INT, p3 INT, .. , p15 INT)
p(n) takes the value from 0 to 2
I have to get all entries with unique combinations of parameteres. It's not difficult task, so I created a table like this
CREATE TEMPORARY TABLE b AS
(SELECT
t1.id,
t2.p1, t2.p2, t2.p3, t2.p4, t2.p5, t2.p6, t2.p7, t2.p8,
t2.p9, t2.p10, t2.p11, t2.p12, t2.p13, t2.p14, t2.p15
FROM
(
SELECT
p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15
FROM
a
GROUP BY
p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11, p12, p13, p14, p15
HAVING COUNT(*) = 1
)t2
LEFT JOIN a t1 on
t2.p1 = t1.p1
AND t2.p2 = t1.p2
AND t2.p3 = t1.p3
AND t2.p4 = t1.p4
AND t2.p5 = t1.p5
AND t2.p6 = t1.p6
AND t2.p7 = t1.p7
AND t2.p8 = t1.p8
AND t2.p9 = t1.p9
AND t2.p10 = t1.p10
AND t2.p11 = t1.p11
AND t2.p12 = t1.p12
AND t2.p13 = t1.p13
AND t2.p14 = t1.p14
AND t2.p15 = t1.p15)
Here we are get enries with unique combination of parameters.
Next step is for each record in table A to find all records from the table B that differ by one, two and three parameters. Records differ by a single parameter should be no more than one, records for two different parametres no more than two etc.
For example:
id | p(n)
-----+----------------
1 |000000000000000
2 |000000000000001
I created a temporary table of the form
CREATE TEMPORARY TABLE c AS
(
SELECT
cnt, id1, id2
FROM
(
SELECT
(t1.p1 = t2.p1)+(t1.p2 = t2.p2)
+(t1.p3 = t2.p3) +(t1.p4 = t2.p4) +(t1.p5 = t2.p5)
+(t1.p6 = t2.p6) +(t1.p7 = t2.p7) +(t1.p8 = t2.p8)
+(t1.p9 = t2.p9) +(t1.p10 = t2.p10) +(t1.p11 = t2.p11)
+(t1.p12 = t2.p12) +(t1.p13 = t2.p13) +(t1.p14 = t2.p14)
+(t1.p15 = t2.p15) AS cnt,
t1.id id1,
t2.id id2
FROM
b AS t1,
a AS t2
)
WHERE
(cnt BETWEEN 12 AND 14)
AND (id1 < id2)
)
Here I get a table with pairs that differ by 1, 2 and 3 parameters
But I ran into a problem entries in the table quite a lot about 100,000 entries. This table is too large (processed data on a home PC) and the creation of the table gives a very long one.
Perhaps this is the only way to get everything, but can anyone have any idea of the analytical method of solving this problem than brute force couples (maybe not SQL). Of course, that will be solved much faster...
Any hint will be appreciated! Thank you!