I am trying to write a query that will return similar rows regarding the "Name" column. My issue is that within my SQL database , there are the following examples:
NAME DOB
Doe, John 1990-01-01
Doe, John A 1990-01-01
I would like a query that returns similar, but not exact, duplicates of the "Name" column. Since I do not know exactly which patients this occurs for, I cannot just query for "Doe, John%".
I have written this query using MySQL Workbench:
SELECT
Name, DOB, id, COUNT(*)
FROM
Table
GROUP BY
DOB
HAVING
COUNT(*) > 1 ;
However, this results in an undesirable amount of results which Name is not similar at all. Is there any way I can narrow down my results to include only similar (but not exact duplicate!) Name? It seems impossible, since I do not know exactly which rows have similar Name, but I figured I'd ask some experts.
To be clear, this is not a duplicate of the other question posted, since I do not know the content of the two(or more) strings whereas that poster seemed to have known some content. Ideally, I would like to have the query limit results to rows with the first 3 or 4 characters being the same in the "Name" column. But again, I do not know the content of the strings in question. Hope this helps clarify my issue.
What I intend on doing with these results is manually auditing the rest of the information in each of the duplicate rows (over 90 other columns per row may or may not have abstract information in them that must be accurate) and then deleting the unneeded row. I would just like to get the most concise and accurate list I can to go through, so I don't have to scroll through over 10,000 rows looking for similar names.
For the record, I do know for a fact that the two rows will have exactly similar names up until the middle initial. In the past, someone used a tool that exported names from one database to my SQL database, which included middle initials. Since then, I have imported another list that does not include middle initials. I am looking for the ones that have middle initials from that subset.