I have re-created your case here locally with the following sample data model:
CREATE TABLE `cars` (
`id` int(11) NOT NULL,
`val1` varchar(45) DEFAULT NULL,
`val2` varchar(45) DEFAULT NULL,
`val3` varchar(45) DEFAULT NULL,
`val4` varchar(45) DEFAULT NULL,
`val5` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('1', 'Mercedes', 'E', 'black', '250hp', '270kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('2', 'Mercedes', 'C', 'white', '250hp', '240kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('3', 'BMW', 'C', 'white', '250hp', '250kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('4', 'PORSCHE', 'E', 'red', '300hp', '270kmh');
To get your "similarity votes" you could something like this:
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
With your test data this generates something like this:
id votes
1 3
2 1
3 1
4 4
Now comes the tricky part: We only want the "maximal votes" (best fit). The challenge here is that we need to have this votes query above twice: Once to extract the maximum, and the second time for determining the id
s associated to the maximal votes. If you only wanted the "first best match" you could use an order by votes desc limit 1
. If you want to get "all ids which have the highest votes", then you could do something like:
select * from (
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
) hits where votes = (
select max(votes) from (
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
) as hits
)
Unfortunately, this duplicates the selection query (and also needs to be computed twice). There is large discussion on how to best solve such a problem at SQL select only rows with max value on a column.
In your case I would also consider writing "similarity votes" to a temporary table (if you expect many rows to be compared). Whether this is appropriate depends on what kind of database access your application has.