106

How do I construct a SQL query (MS SQL Server) where the "where" clause is case-insensitive?

SELECT * FROM myTable WHERE myField = 'sOmeVal'

I want the results to come back ignoring the case

ashleedawg
  • 20,365
  • 9
  • 72
  • 105
Raul Agrait
  • 5,938
  • 6
  • 49
  • 72

7 Answers7

157

In the default configuration of a SQL Server database, string comparisons are case-insensitive. If your database overrides this setting (through the use of an alternate collation), then you'll need to specify what sort of collation to use in your query.

SELECT * FROM myTable WHERE myField = 'sOmeVal' COLLATE SQL_Latin1_General_CP1_CI_AS

Note that the collation I provided is just an example (though it will more than likely function just fine for you). A more thorough outline of SQL Server collations can be found here.

Adam Robinson
  • 182,639
  • 35
  • 285
  • 343
  • Just to confirm, this only needs to be added once, at the end of the `WHERE` statement, and will affect all of the `WHERE` clauses, correct? – ashleedawg May 03 '18 at 11:43
  • Like to know does your answer has any performance issue by converting a column value to `UPPER` or `LOWER` case then using the `LIKE` to search ? – Shaiju T Aug 29 '18 at 09:27
  • 1
    @ashleedawg - good question.. it appears to be a per-line setting. – Leo Gurdian Oct 09 '18 at 18:43
32

Usually, string comparisons are case-insensitive. If your database is configured to case sensitive collation, you need to force to use a case insensitive one:

SELECT balance FROM people WHERE email = 'billg@microsoft.com'
  COLLATE SQL_Latin1_General_CP1_CI_AS 
Andrejs Cainikovs
  • 27,428
  • 2
  • 75
  • 95
  • @AskeB. and Andrejs: This is not technically a database configuration issue. Please see [my answer](https://stackoverflow.com/a/54431309/577765) for clarification on string comparisons. – Solomon Rutzky Jan 31 '19 at 22:51
27

I found another solution elsewhere; that is, to use

upper(@yourString)

but everyone here is saying that, in SQL Server, it doesn't matter because it's ignoring case anyway? I'm pretty sure our database is case-sensitive.

Danny
  • 3,670
  • 12
  • 36
  • 45
  • 7
    You're correct that a database can be made case sensitive, but this is pretty inefficient, even if it is needed. COLLATE is the keyword to use. – mjaggard Jun 28 '12 at 08:00
  • 1
    Thanks for bringing that up, @mjaggard. I hope you, or anyone that seems to downvote my answer, elaborate for the good of anyone like myself who searches for and finds answers like mine. – Danny Dec 07 '12 at 20:29
  • 1
    Upvoted this as it is a perfectly rational explanation. Collate smacks of too much overhead and what if your string has characters in it that the collation doesn't understand? Latin 1 is a lousy encoding scheme. Good luck getting meaningful results if your string has an apostrophe in it (Like: O'Brien). – eggmatters Feb 15 '13 at 22:33
  • 2
    Upvoted as well. I can think of plenty of cases where this would be useful. Additionally, there is often more than one good way to do something. – Inversus Apr 12 '13 at 20:52
  • We have the same thing. I use upper for example on some user input fields to compare against DB entries, where the user might not write in capitals. (if I cannot use `LIKE`) Otherwise I can recommend to also take a look at `myField LIKE 'someValue'`. However for LIKE the DB has to be set to case insensitive (which most of them are). – skofgar Feb 10 '14 at 16:21
  • I am using this solution in a SQL function to compare a user passed string against a string I've defined as all uppercase. Simple solution! – Mike Richards Nov 06 '14 at 15:45
  • 1
    Changing the case of string for comparison purposes is generally bad. In some languages case conversions do not round-trip. i.e. LOWER(x) != LOWER(UPPER(x)). – Ceisc Nov 10 '16 at 15:25
  • 1
    @Danny In my experience, your solution will certainly work; however, it will run many times slower. – Stephen G Tuggy Feb 28 '17 at 01:44
  • @eggmatters No, this is not perfectly rational since it is provably worse than using `COLLATE`. I have no idea why you are saying that adding `COLLATE {collation_name}` to a predicate is over-engineering. That is a baseless statement. The idea is to use the case-insensitive version of the collation that the column is already using, in which case it's not possible to have any data loss (as there's no code page conversion). Please see [my answer](https://stackoverflow.com/a/54431309/577765) for clarification on string comparisons, especially point #1. – Solomon Rutzky Jan 31 '19 at 23:01
  • @Ceisc Yes, for that and also because it is slower. Please see [my answer](https://stackoverflow.com/a/54431309/577765) for clarification on string comparisons in SQL Server, especially point #1. – Solomon Rutzky Jan 31 '19 at 23:09
25

The top 2 answers (from Adam Robinson and Andrejs Cainikovs) are kinda, sorta correct, in that they do technically work, but their explanations are wrong and so could be misleading in many cases. For example, while the SQL_Latin1_General_CP1_CI_AS collation will work in many cases, it should not be assumed to be the appropriate case-insensitive collation. In fact, given that the O.P. is working in a database with a case-sensitive (or possibly binary) collation, we know that the O.P. isn't using the collation that is the default for so many installations (especially any installed on an OS using US English as the language): SQL_Latin1_General_CP1_CI_AS. Sure, the O.P. could be using SQL_Latin1_General_CP1_CS_AS, but when working with VARCHAR data, it is important to not change the code page as it could lead to data loss, and that is controlled by the locale / culture of the collation (i.e. Latin1_General vs French vs Hebrew etc). Please see point # 9 below.

The other four answers are wrong to varying degrees.

I will clarify all of the misunderstandings here so that readers can hopefully make the most appropriate / efficient choices.

  1. Do not use UPPER(). That is completely unnecessary extra work. Use a COLLATE clause. A string comparison needs to be done in either case, but using UPPER() also has to check, character by character, to see if there is an upper-case mapping, and then change it. And you need to do this on both sides. Adding COLLATE simply directs the processing to generate the sort keys using a different set of rules than it was going to by default. Using COLLATE is definitely more efficient (or "performant", if you like that word :) than using UPPER(), as proven in this test script (on PasteBin).

    There is also the issue noted by @Ceisc on @Danny's answer:

    In some languages case conversions do not round-trip. i.e. LOWER(x) != LOWER(UPPER(x)).

    The Turkish upper-case "İ" is the common example.

  2. No, collation is not a database-wide setting, at least not in this context. There is a database-level default collation, and it is used as the default for altered and newly created columns that do not specify the COLLATE clause (which is likely where this common misconception comes from), but it does not impact queries directly unless you are comparing string literals and variables to other string literals and variables, or you are referencing database-level meta-data.

  3. No, collation is not per query.

  4. Collations are per predicate (i.e. something operand something) or expression, not per query. And this is true for the entire query, not just the WHERE clause. This covers JOINs, GROUP BY, ORDER BY, PARTITION BY, etc.

  5. No, do not convert to VARBINARY (e.g.convert(varbinary, myField) = convert(varbinary, 'sOmeVal')) for the following reasons:

    1. that is a binary comparison, which is not case-insensitive (which is what this question is asking for)
    2. if you do want a binary comparison, use a binary collation. Use one that ends with _BIN2 if you are using SQL Server 2008 or newer, else you have no choice but to use one that ends with _BIN. If the data is NVARCHAR then it doesn't matter which locale you use as they are all the same in that case, hence Latin1_General_100_BIN2 always works. If the data is VARCHAR, you must use the same locale that the data is currently in (e.g. Latin1_General, French, Japanese_XJIS, etc) because the locale determines the code page that is used, and changing code pages can alter the data (i.e. data loss).
    3. using a variable-length datatype without specifying the size will rely on the default size, and there are two different defaults depending on the context where the datatype is being used. It is either 1 or 30 for string types. When used with CONVERT() it will use the 30 default value. The danger is, if the string can be over 30 bytes, it will get silently truncated and you will likely get incorrect results from this predicate.
    4. Even if you want a case-sensitive comparison, binary collations are not case-sensitive (another very common misconception).
  6. No, LIKE is not always case-sensitive. It uses the collation of the column being referenced, or the collation of the database if a variable is compared to a string literal, or the collation specified via the optional COLLATE clause.

  7. LCASE is not a SQL Server function. It appears to be either Oracle or MySQL. Or possibly Visual Basic?

  8. Since the context of the question is comparing a column to a string literal, neither the collation of the instance (often referred to as "server") nor the collation of the database have any direct impact here. Collations are stored per each column, and each column can have a different collation, and those collations don't need to be the same as the database's default collation or the instance's collation. Sure, the instance collation is the default for what a newly created database will use as its default collation if the COLLATE clause wasn't specified when creating the database. And likewise, the database's default collation is what an altered or newly created column will use if the COLLATE clause wasn't specified.

  9. You should use the case-insensitive collation that is otherwise the same as the collation of the column. Use the following query to find the column's collation (change the table's name and schema name):

    SELECT col.*
    FROM   sys.columns col
    WHERE  col.[object_id] = OBJECT_ID(N'dbo.TableName')
    AND    col.[collation_name] IS NOT NULL;
    

    Then just change the _CS to be _CI. So, Latin1_General_100_CS_AS would become Latin1_General_100_CI_AS.

    If the column is using a binary collation (ending in _BIN or _BIN2), then find a similar collation using the following query:

    SELECT *
    FROM   sys.fn_helpcollations() col
    WHERE  col.[name] LIKE N'{CurrentCollationMinus"_BIN"}[_]CI[_]%';
    

    For example, assuming the column is using Japanese_XJIS_100_BIN2, do this:

    SELECT *
    FROM   sys.fn_helpcollations() col
    WHERE  col.[name] LIKE N'Japanese_XJIS_100[_]CI[_]%';
    

For more info on collations, encodings, etc, please visit: Collations Info

Solomon Rutzky
  • 46,688
  • 9
  • 128
  • 171
8

No, only using LIKE will not work. LIKE searches values matching exactly your given pattern. In this case LIKE would find only the text 'sOmeVal' and not 'someval'.

A pracitcable solution is using the LCASE() function. LCASE('sOmeVal') gets the lowercase string of your text: 'someval'. If you use this function for both sides of your comparison, it works:

SELECT * FROM myTable WHERE LCASE(myField) LIKE LCASE('sOmeVal')

The statement compares two lowercase strings, so that your 'sOmeVal' will match every other notation of 'someval' (e.g. 'Someval', 'sOMEVAl' etc.).

David Hermanns
  • 395
  • 3
  • 12
  • 7
    In 99.9% of the SQL Server installations which are collated _CI, LIKE is Case Insensitive. – RichardTheKiwi Oct 02 '12 at 09:30
  • 1
    Nowadays the function is called LOWER – David Brossard Jan 02 '19 at 20:44
  • @DavidBrossard and David Hermanns, I don't think it was ever `LCASE()` in SQL Server (at least not that I can see). I think this answer is for an entirely different RDBMS. Please see [my answer](https://stackoverflow.com/a/54431309/577765) for clarification on string comparisons. – Solomon Rutzky Jan 31 '19 at 22:50
4

You can force the case sensitive, casting to a varbinary like that:

SELECT * FROM myTable 
WHERE convert(varbinary, myField) = convert(varbinary, 'sOmeVal')
2

What database are you on? With MS SQL Server, it's a database-wide setting, or you can over-ride it per-query with the COLLATE keyword.

Chase Seibert
  • 15,703
  • 8
  • 51
  • 58
  • Hi there. For SQL Server, in terms of what this question is about, it is neither a database-wide setting nor per-query. Please see [my answer](https://stackoverflow.com/a/54431309/577765) for details. – Solomon Rutzky Jan 31 '19 at 22:45