I have a table like:
| id | lastname | firstname |
|----|----------|-----------|
| 1 | doe | john |
| 2 | oman | donald |
| 3 | o'neill | james |
| 4 | onackers | sharon |
Essentially, users are going to be searching by the first letters of the last name.
I want the search to match last names whether or not they contain punctuation. For instance, when a user searches for: on
I want to return both: o'neill, onackers
A user should be able to search for "o", "on", "oneill", "o neill", etc. and get o'neill.
The best way to do this seems to be to take the lastname column value and search two variants of it in the WHERE clause, joined with an OR: one where every special character is replaced with _ (the single-character wildcard in LIKE), and one where all non-alpha characters (including spaces) are removed.
I figure that replacing with the underscore keeps that one character position available, so a search term containing a space can still match.
I'm having a little trouble with the WHERE clause. I'd prefer to do this with a simple REPLACE rather than creating a regex function if possible. If that's a no-go though, I understand:
-- @last_name is the nvarchar input
SELECT id, lastname, firstname
FROM people
WHERE (REPLACE(people.lastname, '[^A-Za-z]', '_') LIKE @last_name + '%')
   OR (REPLACE(people.lastname, '[^A-Za-z ]', '') LIKE @last_name + '%')
ORDER BY lastname
I'm pretty sure the replace part has to be on the other side of the LIKE. I know I'm messing up the structure and could use some help.
I am using MS SQL Server 2005.
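For illustration, here is a minimal sketch of the two-variant idea using only plain REPLACE. REPLACE matches its second argument as a literal string, not as a pattern, so each character has to be listed explicitly. The characters handled here (apostrophe, hyphen, space), the nvarchar(50) lengths, and the @people table variable are assumptions made for the example, not part of the real schema. The first variant substitutes a space rather than an underscore, because an underscore only acts as a wildcard on the pattern side of LIKE.

```sql
-- Sketch for SQL Server 2005; nvarchar(50) is an assumed length.
DECLARE @last_name nvarchar(50);
SET @last_name = N'on';

-- Hypothetical stand-in for the people table, filled with the sample rows.
DECLARE @people TABLE (id int, lastname nvarchar(50), firstname nvarchar(50));
INSERT INTO @people (id, lastname, firstname)
SELECT 1, N'doe', N'john' UNION ALL
SELECT 2, N'oman', N'donald' UNION ALL
SELECT 3, N'o''neill', N'james' UNION ALL
SELECT 4, N'onackers', N'sharon';

SELECT p.id, p.lastname, p.firstname
FROM @people AS p
WHERE
    -- Variant 1: punctuation becomes a space, so 'o neill' matches o'neill.
    REPLACE(REPLACE(p.lastname, N'''', N' '), N'-', N' ')
        LIKE @last_name + N'%'
    -- Variant 2: punctuation and spaces removed, so 'on' and 'oneill' match o'neill.
    OR REPLACE(REPLACE(REPLACE(p.lastname, N'''', N''), N'-', N''), N' ', N'')
        LIKE @last_name + N'%'
ORDER BY p.lastname;
```

With @last_name set to 'on', this should return the o'neill and onackers rows. Because the functions wrap the column, an index on lastname cannot be used for a seek, and any punctuation character not listed is left untouched.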
Thank you so much in advance.
UPDATE
It seems like I have two options:
- Create a regular expression function using CLR (excuse me if I'm saying this wrong, I'm new to it)
- Create extra columns on the table or create a new "fuzzyTable" with the cleaned up last names.
The database gets updated once a night. I have already begun the new-table approach, since that was my original plan. However, I'm beginning to think it's smarter to add the "fuzzy" columns to the main table and have the nightly update fill in the adjusted last names for the new and updated rows.
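A rough sketch of what the extra-column option could look like, under some assumptions: the column names lastname_spaced and lastname_stripped are hypothetical, nvarchar(50) is a guessed length, and only apostrophes, hyphens and spaces are treated as special.

```sql
-- One-time schema change; column names and lengths are assumptions.
ALTER TABLE people ADD
    lastname_spaced   nvarchar(50) NULL,  -- punctuation replaced by a space
    lastname_stripped nvarchar(50) NULL;  -- punctuation and spaces removed
GO  -- batch separator (SSMS/sqlcmd) so the new columns exist before use

-- Run after the nightly load. This recomputes every row for simplicity;
-- it could be limited to new or changed rows if the load can identify them.
UPDATE people
SET lastname_spaced   = REPLACE(REPLACE(lastname, N'''', N' '), N'-', N' '),
    lastname_stripped = REPLACE(REPLACE(REPLACE(lastname, N'''', N''), N'-', N''), N' ', N'');
GO

-- The search then compares against plain columns.
DECLARE @last_name nvarchar(50);
SET @last_name = N'on';

SELECT id, lastname, firstname
FROM people
WHERE lastname_spaced LIKE @last_name + N'%'
   OR lastname_stripped LIKE @last_name + N'%'
ORDER BY lastname;
```

Since the LIKE patterns have no leading wildcard and the columns are not wrapped in functions, indexes on the two new columns could be used for the prefix search.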
Stack Overflow: which approach is better? A user-defined regex function that I can call in the SQL, thus avoiding extra columns? Adding an extra column or two to the table? Or a new table?