0

I am looking for a way to create a relative relevance function when looking for values in two (or more) different tables. So I have tables like this

table1:

id weight
1  0.1
2  0.15
3  0.12
6  0.21

table2:

id weight
3  0.09
6  0.2
8  0.1
11 0.13

I need to get a relevance function from these two tables by merging them. Same row ids will get 10x relevance and rows with ids from only one table will get "weight" relevance.

Here is an intermediate table which I need to get (and my question is HOW to make such a table):

id1  weight1  id2  weight2
1    0.1      null null
2    0.15     null null
3    0.12     3    0.09
6    0.21     6    0.2
null null     8    0.1
null null     11   0.13

Using this table I can calculate the relevance whatever I need, but the problem is to create such table from these two. Could you help me?

I tried with LEFT JOIN, STRAIGHT_JOIN, LEFT OUTER JOIN, but they make very different results.

Edit: If it matters, I presently envisage the final table to look something like this:

id relevance
 1 0.1
 2 0.15
 3 2.1
 6 4.1
 8 0.1
11 0.13
Dharman
  • 30,962
  • 25
  • 85
  • 135
Epsiloncool
  • 1,435
  • 16
  • 39
  • Seeing as LEFT JOIN and LEFT OUTER JOIN are exactly the same thing, that assertion seems highly implausible – Strawberry Feb 22 '19 at 09:43
  • 1
    You need a FULL OUTER JOIN, which MySQL does not support, but which can be emulated - although one has to question the wisdom of having two tables which apparently store the same kind of thing. – Strawberry Feb 22 '19 at 09:44
  • @Strawberry These tables are subqueries actually, yes, they are taken from the same table, but with very different "where" part. How can I emulate FULL OUTER JOIN ? – Epsiloncool Feb 22 '19 at 09:54
  • Out of curiousity, what would the final result look like? – Strawberry Feb 22 '19 at 09:54
  • I think like this https://e-wm.org/i/E20190222-150905-001.png however final relevance function didn't created yet. – Epsiloncool Feb 22 '19 at 10:08
  • In that case, I suspect that this can be massively simplified – Strawberry Feb 22 '19 at 10:41
  • @Strawberry Could you give me your idea how to simplify this? – Epsiloncool Feb 22 '19 at 10:43
  • You really need to rethink your schema. Multiple tables storing the same kind of thing is a bad idea. – Strawberry Feb 22 '19 at 10:54
  • @Strawberry I have different words stored in one table and another table stores "vectors" - relativeness between words and documents. table1 (and table2) is actually a set of documents where one specific word exists. So table1 and table2 are different pieces of the same (documents) table, but related to different words. For example word "dog" can be in documents 1,2,3,6 and word "good" in docs 3,6,8,11. I need a set of docs where I have both words "good dog" and only one word appears (only "good" or only "dog"). – Epsiloncool Feb 22 '19 at 12:37
  • I can use longer requests "The fast fox jumps over the lazy dog" and in this case I will have much more document table chunks. – Epsiloncool Feb 22 '19 at 12:37
  • This sounds like a terrible approach - but hey, it's your funeral. – Strawberry Feb 22 '19 at 12:38
  • If the problem is literally as you describe, then you should probably be conducting FULLTEXT searches on the data. Otherwise, you should, at the very least, store all the words and document references in one table. – Strawberry Feb 22 '19 at 12:42
  • @Strawberry yes, I have a table of words and another table of vectors - two tables are the same for all documents. – Epsiloncool Feb 22 '19 at 13:19
  • Does this answer your question? [How to do a FULL OUTER JOIN in MySQL?](https://stackoverflow.com/questions/4796872/how-to-do-a-full-outer-join-in-mysql) – philipxy Jul 30 '20 at 22:52

4 Answers4

2

You can use FULL OUTER JOIN for this, e.g.:

SELECT t1.id AS id1, t1.weight AS weight1, t2.id AS id2, t2.weight AS weight2
FROM table1 t1 LEFT JOIN table2 t2 ON t1.id = t2.id
UNION
SELECT t1.id AS id1, t1.weight AS weight1, t2.id AS id2, t2.weight AS weight2
FROM table1 t1 RIGHT JOIN table2 t2 ON t1.id = t2.id;
Darshan Mehta
  • 30,102
  • 11
  • 68
  • 102
  • 1
    FWIW, I'd prefer to write that this way... `SELECT t1.id id1 , t1.weight weight1 , t2.id id2 , t2.weight weight2 FROM table1 t1 LEFT JOIN table2 t2 ON t2.id = t1.id UNION SELECT t1.id id1 , t1.weight weight1 , t2.id id2 , t2.weight weight2 FROM table2 t2 LEFT JOIN table1 t1 ON t1.id = t2.id;` – Strawberry Feb 22 '19 at 09:56
  • @Strawberry ah okay. Is it just about the readability or performance as well? – Darshan Mehta Feb 22 '19 at 10:24
  • Just readability - conceptually, I find a LEFT JOIN more intuitive – Strawberry Feb 22 '19 at 10:37
  • Ok I tried this, it works, however I think I will need lots of such subqueries to merge by union, since I can have 3 or even more tables (up to 8) and then I have to combine them all each with other one, right? It's 2^8=64 different subqueries... wow. – Epsiloncool Feb 22 '19 at 10:51
1
SELECT id
     , SUM(weight) * CASE WHEN COUNT(*)=1 THEN 1 ELSE 10 END relevance 
  FROM
     ( SELECT id
            , weight 
         FROM table1 
        UNION 
          ALL 
       SELECT id
            , weight 
         FROM table2
     ) x
 GROUP 
    BY id;
+----+-----------+
| id | relevance |
+----+-----------+
|  1 |      0.10 |
|  2 |      0.15 |
|  3 |      2.10 |
|  6 |      4.10 |
|  8 |      0.10 |
| 11 |      0.13 |
+----+-----------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
1

Here's a few examples :

create table Table1 (
 id int primary key not null,
 weight decimal(10,2) not null default 0
);
create table Table2 (
 id int primary key not null,
 weight decimal(10,2) not null default 0
);
insert into Table1 (id, weight) values
 (1, 0.10)
,(2, 0.15)
,(3, 0.12)
,(6, 0.21)
;
insert into Table2 (id, weight) values
 (3,  0.09)
,(6,  0.20)
,(8,  0.10)
,(11, 0.13)
;
select 
id12.id as id,
t1.id as id1,
t1.weight as weight1,
t2.id as id2,
t2.weight as weight2
from (select id from Table1 union select id from Table2) id12
left join Table1 t1 on t1.id = id12.id
left join Table2 t2 on t2.id = id12.id
;
id |  id1 | weight1 |  id2 | weight2
-: | ---: | ------: | ---: | ------:
 1 |    1 |    0.10 | null |    null
 2 |    2 |    0.15 | null |    null
 3 |    3 |    0.12 |    3 |    0.09
 6 |    6 |    0.21 |    6 |    0.20
 8 | null |    null |    8 |    0.10
11 | null |    null |   11 |    0.13
select 
id12.id as id,
coalesce(t1.weight,0) + coalesce(t2.weight,0) as relevance
from (select id from Table1 union select id from Table2) id12
left join Table1 t1 on t1.id = id12.id
left join Table2 t2 on t2.id = id12.id
order by id12.id;
id | relevance
-: | --------:
 1 |      0.10
 2 |      0.15
 3 |      0.21
 6 |      0.41
 8 |      0.10
11 |      0.13
select id, sum(weight) as relevance
from
(
 select id, weight from Table1
 union all 
 select id, weight from Table2
) q
group by id
order by id;
id | relevance
-: | --------:
 1 |      0.10
 2 |      0.15
 3 |      0.21
 6 |      0.41
 8 |      0.10
11 |      0.13

db<>fiddle here

The 2nd & 3th queries return the same result.
Which is better?
That would depend on how many extra fields and/or extra calculations are required.

LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • Thx. Btw, notice that one time UNION is used and another time UNION ALL. It's important to understand the difference. `UNION` will only return unique records and remove duplicates, while `UNION ALL` merely sticks the results of the queries together. – LukStorms Feb 22 '19 at 12:48
  • I don't need for UNION at all. The initial table for table1 and table2 is the same table contains ALL documents, meanwhile table1 and table2 are just subqueries contains a selected rows from the same table. So my solutions here is replace a subquery with UNION with (select id from docs) subquery. It works good. – Epsiloncool Feb 22 '19 at 12:51
  • Oh like that. Ok, looks like you know what you're doing. :) – LukStorms Feb 22 '19 at 12:52
0

We can use stored procedures and temporary tables to get solution

CREATE PROCEDURE GetReleavance()
BEGIN
Create TEMPORARY TABLE tmpList ENGINE=MEMORY
SELECT id, weight from t1
union all 
SELECT id, weight from t2
union all 
SELECT id, weight from t3;

select id, sum(weight)* POW(10,COUNT(1)-1) as relevance 
from tmpList 
group by id;

DROP TEMPORARY TABLE IF EXISTS tmpList;

END

In the procedure creating a temporary table with all id's and weight from different table and get sum(weight) based on id.

Call the stored procedure using

CALL GetReleavance()

You can make Union all for number of table you want also it will not make major impact on performance.

balakrishnan
  • 383
  • 4
  • 12