In the example "match", the characters that match are in the same positions in both strings. It's not clear if this is the actual specification, or if that's just an anomaly in the example. Also, we note that in the example data, the list of characters is distinct, there aren't two of the same character in any string. Again, not sure if that's part of the specification, or an anomaly in the example.
Also, are the code values always six characters in length? Any special handling for shorter strings, or space characters? Etc.
In the simplest case, where we're comparing the strings position by position, and the only requirement is that a character be equal to another character (no special handling for spaces, or non-alphabetic, etc.) then something like this would return the specified result:
SELECT c.id
, c.code
, d.id
, d.code
FROM mytable c
JOIN mytable d
ON d.id <> c.id
AND ( IFNULL( NULLIF(SUBSTR(c.code,1,1),'') = NULLIF(SUBSTR(d.code,1,1),'') ,0)
+ IFNULL( NULLIF(SUBSTR(c.code,2,1),'') = NULLIF(SUBSTR(d.code,2,1),'') ,0)
+ IFNULL( NULLIF(SUBSTR(c.code,3,1),'') = NULLIF(SUBSTR(d.code,3,1),'') ,0)
+ IFNULL( NULLIF(SUBSTR(c.code,4,1),'') = NULLIF(SUBSTR(d.code,4,1),'') ,0)
+ IFNULL( NULLIF(SUBSTR(c.code,5,1),'') = NULLIF(SUBSTR(d.code,5,1),'') ,0)
+ IFNULL( NULLIF(SUBSTR(c.code,6,1),'') = NULLIF(SUBSTR(d.code,6,1),'') ,0)
) >= 4
WHERE c.id = 1
ORDER BY c.id, d.id
If we need to compare each character in code
to each of the characters in the other code
, we'd have something similar, we'd just need to perform a total of 36 comparisons. (compare pos 1 to pos
1,2,3,4,5,6, compare pos 2 to pos 1,2,3,4,5,6)
That could be done exactly the same as the query above, except that the AND ( ) >= 4
predicate would contain a total of 36 comparison tests, rather than just six.
That brings up (again) the issue of multiples of the same character in the string, and how those should be "counted" towards a match. For example, consider:
code1: QbQdef
code2: QxyQQz
The Q in position 1 of code1 matches three Q's in code2, and the Q in position 2 of code1 also matches three Q's in code2... for a total match count of 6. Do we want to consider these two codes as matching? If not, we could modify the test block in the query a bit, so that a match of the character in position 1 to any character in code2 would result in only 1 being added to the match count.
The specifications need to be fleshed out more, in order to make a determination of the actual SQL statement that achieves the desired result.