When an NVARCHAR
attribute in a SQL Server database contains an emoji, string functions and operators behave in different ways (default database collation is SQL_Latin1_General_CP1_CI_AS).
Behavior of string functions
Functions like LEFT()
and LEN()
see the emoji as multiple, separate characters (LEFT will cut the emoji in half and return a partial value).
DECLARE @YourString NVARCHAR(12) = N'Thank you '
SELECT @YourString, LEFT(@YourString, 11), LEN(@YourString)
The return values are "Thank you ", "Thank you �", and 12.
Behavior of operators
Operators such as UNION
, INTERSECT
, and EXISTS
treat both the emoji and the half-emoji as a single, identical value, though they return different results depending on the order the values arrive in.
Behavior of the UNION operator
UNION
treats these as identical records (since it only returns one value), but will nevertheless yield a different result depending on the order, with the bottom record being returned.
SELECT @YourString UNION SELECT LEFT(@YourString, 11)
returns "Thank you �" (the value from the bottom half)
SELECT LEFT(@YourString, 11) UNION SELECT @YourString
returns "Thank you " (the value from the top half)
Behavior of the INTERSECT and EXISTS operators
INTERSECT
and EXISTS
also treat these as identical values, but will return the value from the top record (which makes sense given the purpose of those operators, but nonetheless feels weird after seeing UNION
do the opposite).
SELECT @YourString INTERSECT SELECT LEFT(@YourString, 11)
returns "Thank you ".
SELECT LEFT(@YourString, 11) INTERSECT SELECT @YourString
returns "Thank you �".
SELECT @YourString EXCEPT SELECT LEFT(@YourString, 11)
returns no result.
Summary
- String functions such as
LEFT()
andLEN()
treat these characters as separate values. - The
UNION
operator treats these as identical values, but will nevertheless return varying results (with preference given to the value from the bottom half of the operator) - The
EXISTS
andINTERSECT
operator treats these as identical values, but will return the value from the top half
Question
Why would two different values be treated as a single, identical character by some operators (UNION, INTERSECT, EXCEPT), but be interpreted as two different values by string functions (LEFT, LEN)?
Bonus Question
Why would the UNION
operator be returning the second value it sees?