MySQL SELECT row with maximal similarity

Question

Im have table like:

id | val1 | val2 | val3 | val4 | val5

How to find row(s) with maximum count equal vals (not all vals have to be equal).
Example: says I have 4 rows in table:

1 | Mercedes | E | black | 250hp | 270kmh | 
2 | Mercedes | C | white | 250hp | 240kmh | 
3 | BMW      | C | white | 250hp | 250kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh |

I select:

val1=PORCHE val2=E val3=red val4=250 val5=270

and get:

1 | Mercedes | E | black | 250hp | 270kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh |

because both have 3 equal fields.
Also question not about cars and I hope make this with one table.
That table about checking hardware of user and compare if it absolutely equal or how many equal it is

Welcome to Stackoverflow! Your question is thankfully compact - but perhaps too compact to be understood easily. Could you elaborate a little your case, for instance by providing a brief example what you mean with "with maximum count equal vals"? What is the metric to judge your "similarity" (e.g. method of least squares?)? — EagleRainbow, Jan 26 '20 at 10:26
Pro tip: Edit your own question to make it better instead of writing an explaining comment — EagleRainbow, Jan 26 '20 at 10:47

EagleRainbow · Answer 1 · 2020-01-26T11:18:42.450

I have re-created your case here locally with the following sample data model:

CREATE TABLE `cars` (
  `id` int(11) NOT NULL,
  `val1` varchar(45) DEFAULT NULL,
  `val2` varchar(45) DEFAULT NULL,
  `val3` varchar(45) DEFAULT NULL,
  `val4` varchar(45) DEFAULT NULL,
  `val5` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('1', 'Mercedes', 'E', 'black', '250hp', '270kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('2', 'Mercedes', 'C', 'white', '250hp', '240kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('3', 'BMW', 'C', 'white', '250hp', '250kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('4', 'PORSCHE', 'E', 'red', '300hp', '270kmh');

To get your "similarity votes" you could something like this:

select id, count(*) as votes from
(
    select id from cars where val1 = 'PORSCHE'
    union all
    select id from cars where val2 = 'E'
    union all 
    select id from cars where val3 = 'red'
    union all
    select id from cars where val4 = '250hp'
    union all
    select id from cars where val5 = '270kmh'
) as votes
group by id

With your test data this generates something like this:

id  votes
1   3
2   1
3   1
4   4

Now comes the tricky part: We only want the "maximal votes" (best fit). The challenge here is that we need to have this votes query above twice: Once to extract the maximum, and the second time for determining the ids associated to the maximal votes. If you only wanted the "first best match" you could use an order by votes desc limit 1. If you want to get "all ids which have the highest votes", then you could do something like:

select * from (
    select id, count(*) as votes from
    (
        select id from cars where val1 = 'PORSCHE'
        union all
        select id from cars where val2 = 'E'
        union all 
        select id from cars where val3 = 'red'
        union all
        select id from cars where val4 = '250hp'
        union all
        select id from cars where val5 = '270kmh'
    ) as votes
    group by id
) hits where votes = (
    select max(votes) from (
        select id, count(*) as votes from
        (
            select id from cars where val1 = 'PORSCHE'
            union all
            select id from cars where val2 = 'E'
            union all 
            select id from cars where val3 = 'red'
            union all
            select id from cars where val4 = '250hp'
            union all
            select id from cars where val5 = '270kmh'
        ) as votes
        group by id
    ) as hits
)

Unfortunately, this duplicates the selection query (and also needs to be computed twice). There is large discussion on how to best solve such a problem at SQL select only rows with max value on a column.

In your case I would also consider writing "similarity votes" to a temporary table (if you expect many rows to be compared). Whether this is appropriate depends on what kind of database access your application has.

After test I assume ur first solution is best I just add to it "order by votes desc limit 1" and get that exactly what I want ! Thanks again. I get id of row what with maximum similarity and after just select this row by it id. — user368470, Jan 26 '20 at 12:09
@user368470 . . . This is not the best solution. Unpivoting the data and re-aggregating it is a lot of unnecessary work. Both forpas and I have alternative solutions which are much more efficient. — Gordon Linoff, Jan 26 '20 at 12:33
@GordonLinoff I will try use, but im beginner in sql so this solution just easy to use compare to says forpas solution — user368470, Jan 26 '20 at 13:09
@user368470 . . . If you are learning SQL, you should learn how to do things correctly. Doing unnecessary aggregations and unpivots is not a good habit to learn. — Gordon Linoff, Jan 26 '20 at 13:20

forpas · Accepted Answer · 2020-01-26T13:40:33.883

0

This is a solution for MySql 8.0+ using rank() window function which will return the most matched rows with ties:

with cte as (
  select *,
    rank() over (order by
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270km') desc 
    ) rn
  from tablename  
) 
select * from cte
where rn = 1

See the demo.
And this will work for previous versions but will not return ties, just the 1st best match:

select *
from tablename 
order by 
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270km') desc 
limit 1

See the demo.

edited Jan 26 '20 at 13:40

answered Jan 26 '20 at 11:45

forpas

160,666
10
38
76

there is what I exactly need based on selected answer https://www.db-fiddle.com/f/sRuJsReoEW95arirXLiQJo/1 . I need select better similarity table row comprate to query. How to do same with ur solution ? Im don't understand where I input all that data in ur solution. – user368470 Jan 26 '20 at 13:28
For now yes, that's works, I absolutely don't understand how but its work. tnx for now ur solution looks a lot better than previous) – user368470 Jan 26 '20 at 13:56

score 0 · Answer 3 · answered Jan 26 '20 at 12:30

0

Based on the description, I think this would be the simplest solution:

select t.*
from (select t.*,
             rank() over (order by (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5) desc) as seqnum
      from t
     ) t
where seqnum = 1;

In versions of MySQL before version 8, this is a little more complicated, but not that bad:

select t.*
from t
where (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5)
     ) = (select max( (t2.val1 = @val1) + (t2.val2 = @val2) + (t2.val3 = @val3) + (t2.val4 = @val4) + (t2.val5 = @val5) )
          from t t2
         );

answered Jan 26 '20 at 12:30

Gordon Linoff

1,242,037
58
646
786

same question mate. there is what I exactly need based on selected answer db-fiddle.com/f/sRuJsReoEW95arirXLiQJo/1 . I need select better similarity table row comprate to query. How to do same with ur solution ? Im don't understand where I input all that data in ur solution. – user368470 Jan 26 '20 at 13:33
Same question as what? The values `@val1` and so on are where you put in the constants that you want to compare to. – Gordon Linoff Jan 26 '20 at 13:43
Same as question for forpas). For now I understand thanx a lot! – user368470 Jan 26 '20 at 13:59

MySQL SELECT row with maximal similarity

3 Answers3