How to find rows in SQL that start with the same string (similar rows)?

Question

I have a table with primary keys that look like this:

FIRSTKEY~ABC

SECONDKEY~DEF

FIRSTKEY~DEF

I want to write a SELECT statement that strips off the segment following the tilde and returns all rows that are duplicates after the post-tilde segment is gone. That is,

SELECT ...

Gives me:

FIRSTKEY~ABC

FIRSTKEY~DEF

As "duplicates".

I already have the bit to strip off the end segment using SUBSTRING:

SELECT SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN)) FROM TABLE

This is on SQL Server.

So use that bit in your WHERE clause to check if both bits are equal, then return the values? — Konerak, Jul 20 '10 at 12:38

score 3 · Answer 1 · answered Jul 20 '10 at 12:38

3

Give this a shot

SELECT SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN)), COUNT(*) FROM TABLE 
GROUP BY SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN))
HAVING COUNT(*) > 1

answered Jul 20 '10 at 12:38

Fosco

38,138
7
87
101

This worked, but I like the answer that gives me the original rows. – Ben Vitale Jul 20 '10 at 13:42

score 0 · Accepted Answer · answered Jul 20 '10 at 12:55

The first solution given will identify the key prefixes; extend it just a bit to get the table rows beginning with those keys:

SELECT * 
FROM TABLE
WHERE SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN)) IN
(
    SELECT SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN)) FROM TABLE 
    GROUP BY SUBSTRING(COLUMN, 0, CHARINDEX('~', COLUMN))
    HAVING COUNT(*) > 1
)

Alternately, you could use a join between a temp table containing the prefixes and the original table - if the number of prefixes becomes very large, using a "where in" can become very expensive.

How to find rows in SQL that start with the same string (similar rows)?

2 Answers2

Linked