0

Background: I am working on a system to clean up an internal customer list and figure out email addresses of contacts where we have the email address of someone else already at that company. In order to do this I have (simplified) 3 tables:

Contacts:

ID
CompanyId
Email
Domain

EmailDomains:

ID
Domain
EmailFormat
EmailFormatConfirmed

I have a manual routine which is saying, given the company, find me the next contact where we have their domain name but not their email address:

int companyId = 53;

var emails = Contacts.Where(p => p.companyId == companyId
                                            && p.Email == null
                                            && !string.IsNullOrEmpty(p.Domain)).Select(p => p.Domain);
var domain =
    EmailDomains.FirstOrDefault(
        d => !d.EmailFormatConfirmed 
            && !string.IsNullOrEmpty(d.Domain)
            && emails.Contains(d.Domain));

This query runs very slowly and on checking out the Sql generated:

    -- Region Parameters
DECLARE @p__linq__0 Int = 53
-- EndRegion
SELECT TOP (1) 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Domain] AS [Domain], 
    [Extent1].[EmailFormat] AS [EmailFormat], 
    [Extent1].[EmailFormatConfirmed] AS [EmailFormatConfirmed], 
    FROM [dbo].[EmailDomain] AS [Extent1]
    WHERE ([Extent1].[EmailFormatConfirmed] <> 1) 
        AND ( NOT (([Extent1].[Domain] IS NULL) OR ((LEN([Extent1].[Domain])) = 0))) 
        AND ( EXISTS (
            SELECT 1 AS [C1]
            FROM [dbo].[Contacts] AS [Extent2]
            WHERE ([Extent2].[CompanyId] = @p__linq__0) 
                AND ([Extent2].[Email] IS NULL) 
                    AND ( NOT (([Extent2].[Domain] IS NULL) OR ((LEN([Extent2].[Domain])) = 0))) 
                    AND (([Extent2].[Domain] = [Extent1].[Domain]) OR (([Extent2].[Domain] IS NULL) AND ([Extent1].[Domain] IS NULL)))
    ))

I can see that the offending part is OR (([Extent2].[Domain] IS NULL) AND ([Extent1].[Domain] IS NULL)) at the end of the exists clause. Why would this even be there? I can't understand how it is valid, and if I were hand coding the Sql (which currently is what I am going to have to fallback on) It wouldn't be. Am I missing something obvious? Removing this makes the query run very fast (as would be expected - there are a lot of null domains which effectively are being cross joined here)

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Macros
  • 7,099
  • 2
  • 39
  • 61
  • It is necessary to handle `NULL` cases, since, in SQL, `NULL != NULL`. So this results in `value1 = value2 OR (value1 IS NULL AND value2 IS NULL)`. – Maarten Aug 23 '16 at 11:23
  • Which version of EF are you using? Maybe you can use [this](https://msdn.microsoft.com/en-us/library/system.data.objects.objectcontextoptions.usecsharpnullcomparisonbehavior.aspx), see also [this](http://stackoverflow.com/a/2541042/261050) – Maarten Aug 23 '16 at 11:25
  • @Maarten - using EF6. However I'm not sure you are getting the issue - I could write this in Sql as a CTE or Sub Query which would retrieve the contact domains for the specified company with no emails and then inner join it to the EmailDomains table. I understand null != null however I simply don't want to consider nulls at all – Macros Aug 23 '16 at 11:29

1 Answers1

0

In EF6 this behavior is controlled by the DbContextConfiguration UseDatabaseNullSemantics property (by default false):

Gets or sets a value indicating whether database null semantics are exhibited when comparing two operands, both of which are potentially nullable. The default value is false. For example (operand1 == operand2) will be translated as: (operand1 = operand2) if UseDatabaseNullSemantics is true, respectively (((operand1 = operand2) AND (NOT (operand1 IS NULL OR operand2 IS NULL))) OR ((operand1 IS NULL) AND (operand2 IS NULL))) if UseDatabaseNullSemantics is false.

So just turn it to true and the problem will be solved.

You can do that inside your DbContext constructor:

this.Configuration.UseDatabaseNullSemantics = true;

or just for a specific context instances before executing commands:

var db = new YourDbContext();
db.Configuration.UseDatabaseNullSemantics = true;
var emails = db.Contacts.Where(...)
...

But please note that turning the option on might cause some unexpected results if you compare nullable fields in the queries and don't include explicit null checks, so use it with care.

Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343