2

I have a basic search function on my site. Everything works well except for when users search for just one character of these three 'åäö' then the query returns all the items in the table. How can this be? Is the caracters translates to something else in t-sql?

Dim sanitizedstring As String = helpclass.RemoveInvalidFileNameChars(txtSearch.Text, True)

Dim getArtists = From s In dc.tbl_artists _
                 Where s.ArtistName.ToString.ToLower.Contains(sanitizedstring) _
                 Select s

EDIT : Collation of the db is Latin1_General_CI_AI.

Pac0
  • 21,465
  • 8
  • 65
  • 74
Jimmy
  • 275
  • 2
  • 6
  • 27
  • 1
    It could be related with your database _collation_ (so actually unrelated to C# code or even TSQL for that matter). Which is it ? (If you check the properties of the database / database server) – Pac0 Feb 06 '20 at 14:30
  • For starters, most databases use a case-*in*sensitive collation, which means `ToLower` isn't needed. `ToLower()` is translated to `LOWER(ArtistName)` which *prevents* the server from using any indexes that cover `ArtistName`. – Panagiotis Kanavos Feb 06 '20 at 14:34
  • 3
    What does `sanitizedstring` contain? Does it contain *anything*? If `RemoveInvalidFileNameChars` removes all non-English characters, it will be an empty string and the expression `ArtistName.Contains("")` will generate `ArtistName LIKE '%%'` condition which matches all non-null values – Panagiotis Kanavos Feb 06 '20 at 14:35
  • the only input that goes in the query is 'å', 'ä' or 'ö' and that is what sanitized string consists of. and that causes the whole table to be returned. @PanagiotisKanavos Also removed the toLower without results. – Jimmy Feb 06 '20 at 15:02
  • @Pac0 ArtistName is nvarchar – Jimmy Feb 06 '20 at 15:04
  • 1
    Post a reproducible example then - CREATE TABLE statement, sample data and output. The characters you posted aren't special in any way. – Panagiotis Kanavos Feb 06 '20 at 15:04
  • check the Server, Database and column collation by following this : https://database.guide/how-to-find-the-collation-in-sql-server-t-sql/ . The collation is something like : `SQL_Latin1_General_CP1_CI_AS`, it is not the same thing as the type of the column like `NVARCHAR(#)` – Pac0 Feb 06 '20 at 15:05
  • 1
    Latin1_General_CI_AI Is the Collation – Jimmy Feb 06 '20 at 15:10
  • 4
    In that collation `å`, `ä` and `ö` *are* equal to `a` and `o`. That's what the `AI` means, Accent Insensitive. Again, the query *won't* return all rows, it will return only those rows that contain a matching substring – Panagiotis Kanavos Feb 06 '20 at 15:13
  • Great, then theres the problem. Now i just gotta figure how to change collation to one that supports å ä ö then. Thanks – Jimmy Feb 06 '20 at 15:14
  • 1
    For your information, CI stands for "Case insensitive" and "AI" stands for "accent insensitive". You look for something with "AS" (accent sensitive) – Pac0 Feb 06 '20 at 15:23
  • 2
    `"Now i just gotta figure how to change collation to one that supports å ä ö then."` ← Think hard about this before you pull that trigger. If you had a column named `City` and you stored the value `Øslo` (Finland) that is how it would be returned to the user in either case. If a user decided to search on that name and type in `Oslo` *or* `Øslo` as the query filter value the AI collation version *would* return the result value `Øslo` **but** the non AI collation would *only* return a result if you queried on `Øslo` *with* the accent. This could lead to unexpected system behavior for your users. – Igor Feb 06 '20 at 15:31
  • 2
    ^-- (continued) - This could also affect any uniqueness constraints / indexes you have on the columns. Forcing a unique constraint on a column *without* AI collation would allow values `Øslo` *and* `Oslo` to exist in the same column as those are now 2 unique values. A column *with* AI would see that as a violation of the uniqueness constraint. – Igor Feb 06 '20 at 15:35
  • I see, thats helpful to know. – Jimmy Feb 06 '20 at 15:37

2 Answers2

2

This behavior of apparently ignoring some characters is probably due to collation.

since your collation is accent insensitive (AI), åäö would be considered by the DB the same as aao for a text query.

This probably is the source of the issue,(note that without actually knowing what is your data, we can't be certain of the diagnostic )

Solution :

Change the collation of your database to be accent sensitive (AS).

For instance, Latin1_General_CI_AS could be a sensible first test.

More info

Pac0
  • 21,465
  • 8
  • 65
  • 74
1

Are any of those characters considered valid for filenames according to RemoveInvalidFileNameChars?

If not, RemoveInvalidFileNameChars is probably returning an empty string, and Contains is always true for an empty string.

StriplingWarrior
  • 151,543
  • 27
  • 246
  • 315