3

I'm curious as to why my linq group by query returns 417 results whereas my SQL interpretation returns 419? I'm looking for duplicated emails from my list. I've checked out the result set and the two email addresses that are missing from the linq set both have accents. Does linq not recognize accents? Is there a workaround? The email field type is a nvarchar(100).

Let me know if you have any questions, Thanks in advance!

var listOfContacts = (from contacts in something
                      where contacts.Team.Id.Equals(TeamGuid) && !contacts.Email.Equals(null)
                      select new {contacts.Id, EmailAddress = contacts.Email.ToLower()}).ToList();

//Full Contact List; exact amount matches

var dupeEmailsList = listOfContacts
    .GroupBy(x => x.EmailAddress)
    .Where(g => g.Count() > 1)
    .Select(y => y.Key)
    .ToList();

//Returns 417
SELECT Email, COUNT(*)
FROM something
WHERE Team = 'Actual Team Guid Inserted Here'
GROUP BY Email
HAVING (COUNT(LOWER(Email)) > 1 AND Email IS NOT NULL)
ORDER BY Email

//Returns 419
kpolewaczyk
  • 45
  • 2
  • 8

4 Answers4

2

This is a known issue and the workaround has already been answered -> here and here

You have to explicitly tell it to ignore them.

Bactos
  • 1,233
  • 14
  • 26
2

This is from the links provided by @Bactos.

You just need to strip out what's called Diacritics, using built in C# normalization and CharUnicodeInfo.

You'll just have to make the call for each email address like so:

var listOfContacts = (from contacts in something
                where contacts.Team.Id.Equals(TeamGuid) && !contacts.Email.Equals(null)
                select new { contacts.Id, EmailAddress = CleanUpText(contacts.Email) }).ToList();

and the method you would need would be as follows:

private static string CleanUpText(string text)
{
    var formD = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder();

    foreach (var ch in formD)
    {
        var uc = CharUnicodeInfo.GetUnicodeCategory(ch);

        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(ch);
        }
    }

    return sb.ToString().Normalize(NormalizationForm.FormC).ToLower();
}

Hope that helps!

1

Because of the .ToList() in your first LINQ expression, the GROUP BY is being performed within C# on the result of Email.ToLower()

This is not at all the same as the SQL query you give, where the GROUP BY is performed on the original EMAIL column, without the ToLower(). It is not surprising that the queries return different results.

antlersoft
  • 14,636
  • 4
  • 35
  • 55
  • The result difference is greater without the ToLower(). I believe it was 402 linq to 419 SQL. Doesn't SQL automatically ignore case? I tested anyways and added LOWER and it did not change the result set in SQL. – kpolewaczyk Jun 19 '18 at 17:54
0

I think you can try to ignore NULL values in the SELECT clause.In your LINQ query you are ingnoring NULLs.

ashish
  • 2,028
  • 1
  • 10
  • 6