8

I've been doing some TDD against some existing stored procedures. They return XML, so I've been using LINQ to XML.

At the moment, I'm working on a test that will prove that the data have been sorted properly. The test navigates through the XML, and creates an IEnumerable of an anonymous type containing the three columns that should be sorted. From that, it creates a second IEnumerable by sorting the first:

var sortedColumns = from g in columns
                    orderby g.ColumnA ascending,
                            g.ColumnB ascending,
                            g.ColumnC ascending
                    select g;

Finally, it asserts that the sorted colums are the same as the unsorted ones using SequenceEquals.

The problem comes when the database collation differs from the current collation. In particular, .NET is placing "W-" before "Wa" in ColumnB.

Is there a way to sort in the same order as an arbitrary SQL Server collation? If not, then how would I sort in the same order as SQL_Latin1_General_CP1_CI_AS?

John Saunders
  • 160,644
  • 26
  • 247
  • 397
  • Related questions: http://stackoverflow.com/questions/11562042/achieving-consistent-sorting-between-c-sharp-and-sql-using-collationinfo-compare, http://stackoverflow.com/questions/9384642/what-net-stringcomparer-is-equivalent-sqls-latin1-general-ci-as, http://stackoverflow.com/questions/5581268/custom-collation-ordering – Jason Kleban Aug 27 '13 at 11:50

4 Answers4

4

If is a Windows collation, then it boils down to setting the appropriate locale and sort order, as in Sort Order Identifiers, which in .Net world goes by the currently set UI culture.

If is a SQL collation, then is a bit more complicated. If you use VARCHAR you're out of lock. For NVARCHAR, there is some hope. See Comparing SQL collations to Windows collations:

A SQL collation's rules for sorting non-Unicode data are incompatible with any sort routine that is provided by the Microsoft Windows operating system; however, the sorting of Unicode data is compatible with a particular version of the Windows sorting rules. Because the comparison rules for non-Unicode and Unicode data are different, when you use a SQL collation you might see different results for comparisons of the same characters, depending on the underlying data type. For example, if you are using the SQL collation "SQL_Latin1_General_CP1_CI_AS", the non-Unicode string 'a-c' is less than the string 'ab' because the hyphen ("-") is sorted as a separate character that comes before "b". However, if you convert these strings to Unicode and you perform the same comparison, the Unicode string N'a-c' is considered to be greater than N'ab' because the Unicode sorting rules use a "word sort" that ignores the hyphen.

For your purposes (TDD) I would recommend simply staying away from suspicious characters, like hyphen -, or having two s one after another (German ss issues), or capital I (Turkish colliton issues), sh (Spanish collation issues) etc etc... Stick to a small subset of characters that sort properly, like a,A,b,B. I'm serious.

Remus Rusanu
  • 288,378
  • 40
  • 442
  • 569
  • 1
    Unfortunately, we don't get to use clean data - I have to use what's there. – John Saunders Jan 07 '11 at 18:00
  • How about using SQLite or SQL CE for the test store? Them have collations. – Remus Rusanu Jan 07 '11 at 19:06
  • Or, "radical concept", use a "unit test database" for the unit tests, not the same database I use to test the application with in the integration environment. Hmmm. – John Saunders Jan 07 '11 at 20:13
  • sorry it's taken so long to accept your answer. You set me on the right track. All of my database-related unit tests since January have inserted test data within a transaction (possibly truncating tables first), and then rolled back at the end of the test. I'm working towards "datadude" database unit tests, which start by deploying a clean copy of the database, then populating it via data generation plans. – John Saunders Feb 18 '11 at 15:01
  • 1
    So how can I use a .Net Comparer to mimic the rules of the Windows "SQL_Latin1_General_CP1_CI_AS" collation? I'm not finding any good info on this topic (besides this answer). I don't want to have to load the sqlparser library to use it's prebuilt comparers if it's possible to avoid it and still get 100% accurate results (relative to sql server's implementation). – Jason Kleban Aug 27 '13 at 03:56
1

Write a custom IComparer implementation that actually calls out to SQL server to compare each of the two values it's asked to compare.

It would be very slow, a little silly, but it would do exactly what you're asking.

Or if you only care about replicating one particular collation sequence exactly and not any random collation, then write a one-time-use routine to call out to SQL Server and build you a collection of all characters in the order SQL server will put them and use that to build a custom IComparer implementation which will then match SQL server's sequence without calling out to SQL server at run time.

Samuel Neff
  • 73,278
  • 17
  • 138
  • 182
  • thanks for the answer. I wish I could upvote it twice. I'd mark it as the accepted answer, except that Remus Rusanu's answer actually set me on the correct path - what I think is the right way to handle all of these issues - create your own data, either in a clean database, or by doing it in a transaction and rolling back at the end of the test. – John Saunders Feb 18 '11 at 14:59
0

LINQ-to-SQL doesn't support direct use of collations. If you want to achieve a special sorting, you will have to get the data locally, transform it to a list (or whatever) and sort it yourself.

For further reference see the following two links:

MicSim
  • 26,265
  • 16
  • 90
  • 133
0

I've lifted this from use-own-icomparert-with-linq-orderby.

        MyComparer comparer = new MyComparer();
        items = items.OrderByDescending(
                  x => property.GetValue(x), comparer).ToList();
        break;

You could create a custom comparer for your needs. (You might need a concrete type instead of an anonymous type for your results - I haven't actually tested this).

Community
  • 1
  • 1
Brett
  • 1,540
  • 9
  • 13
  • Thanks, but which comparer? Is there a generic way to create a comparer that matches any SQL Server collation, or can you show me code to compare like SQL_Latin1_General_CP_CI_AI? – John Saunders Jan 07 '11 at 13:35
  • I don't believe there's an out of the box IComparer for a SQL Server collation. You'd need to create a class that implements IComparer, and make the appropriate comparisons yourself. – Brett Jan 07 '11 at 16:29
  • (cont..) I'm not saying it's going to be pretty :) Obviously, if you specifically want SQL_Latin1_General_CP1_CI_AI, you could perform a ToLower() (CI). I guess a Search/Replace for the accents (AI)? That's a bit further than I've researched... – Brett Jan 07 '11 at 16:40