1

In a system where every customer has a unique e-mail address, there is the following code to retrieve the customer id by it's e-mail:

public int? GetCustomerIdByEMail(string? email) {
   if(email==null) return null;
   return DB.Query<Customer>().Where(c=>c.EMail==email).Select(c=>c.Id).SingleOrDefault();
}

Is it considered bad practice to have the email parameter of type string? (instead of string), given that null is not considered a valid e-mail?

Having code like this spares the user of the code to write

var id = email!=null ? GetCustomerIdByEMail(email) : null

instead he can just write

var id = GetCustomerIdByEMail(email)

But personally I think it is a code smell to handle the null case like this, since null is not considered an e-mail address.

What do you people think about this problem?

Urs Meili
  • 618
  • 7
  • 19
  • I don't think that in normal use a null should ever be passed. So an error better be raised. (Or just let the program crash :-) –  Sep 08 '22 at 09:56
  • 1
    Isn't string already nullable because it's a reference type? I don't see why defining it as nullable has any additional value. – Olaf Sep 08 '22 at 09:57
  • 4
    @Olaf: Not necessarily, see [nullable reference types](https://learn.microsoft.com/en-us/dotnet/csharp/nullable-references). – Christian.K Sep 08 '22 at 09:58
  • 2
    I guess the question is whether the caller will always have a valid email address in hand. If so, `GetCustomerIdByEmail` should probably throw if it's passed a null reference. If there are cases where you really don't have an email address (and it's reasonable to not detect it earlier) then it seems reasonable. – Jon Skeet Sep 08 '22 at 09:59
  • "Having code like this spares the user of the code to write" - but that's assuming that their own `email` variable is `string?` rather than `string` which it generally shouldn't be. – Damien_The_Unbeliever Sep 08 '22 at 09:59
  • Are you asking about `string` or are you asking about it being nullable? Because it's always code smell to use `string`. – Agent_L Sep 08 '22 at 10:04
  • 3
    @Agent_L: Um, what? That's an unusual take, to say the least... – Jon Skeet Sep 08 '22 at 10:23
  • 1
    @Olaf: that's because some datatypes are "reference" by some technical story, and the fact they are, doesn't really imply whever they SHOULD/COULD be null or not. It's easy to see how an `int? limit` parameter is optional, but `string kind` doesn't tell you anything at first sight. That's why `?` and #nullable were introduced to change this, so you can rely on seeing `string?` as optional, and `string` as mandatory input. Of course, re:string, there are also string.empty, whitespaces, etc, but that's the overall thing with `?`. Another: `IFoo`, DI, and injection vs `IFoo?`, obvious thing here. – quetzalcoatl Sep 08 '22 at 10:29
  • 3
    @Agent_L: I think you may be referring here to strong domain modelling that suggests that every kind of information should get its own type. While I agree that there is a certain gain in having an Email type and FirstName type, and to not being able to accidentaly write `person.Email = importedPerson.FirstName` etc - I think this is a wholly different level of discussion than here, and just by you saying "using string is always a code smell" could be easily very well expanded to all built-in types and taken ad absurdum. And also confuse the OP. There's nothing wrong in using built-in types. – quetzalcoatl Sep 08 '22 at 10:35
  • @JonSkeet When the language has a type system, it's a waste to not use it. `Email`, `Username`, `UserID`, etc classes or structs that wrap built-in types remove many potential errors. Method that takes `string` and returns `int` is 2 errors waiting to happen. Why bother a human when a compiler can look after it for free? – Agent_L Sep 08 '22 at 10:50
  • 1
    @Agent_L: There are pros and cons to that approach - but presumably `Email` would use `string` itself - so I can't see how that's a justification for your claim that it's "always a code smell to use string". I'll leave it there though, as Stack Overflow comment threads really aren't the right place for that discussion. – Jon Skeet Sep 08 '22 at 10:57
  • 1
    @JonSkeet My first comment was too cryptic. – Agent_L Sep 08 '22 at 11:07
  • 1
    With a bit better language, we could define `Email` AS `string`, and benefit both from these types being different, and from almost perfect behavior reuse. And maybe even define that `ContactEmail` and `WorkplaceEmail` pair, and `ContactEmail` and `PersonalEmail` can be treated as synonyms, while `WorkplaceEmail` and `PersonalEmail` not. However, we have C# as it is now, and for me, defining every single tiny bit of information as a separate type is often an overkill, since such errors come mosty from copy-paste coding, and are not happening that often, or are detected easily with tests. – quetzalcoatl Sep 08 '22 at 11:08
  • @quetzalcoatl consider writing new method `void BanCustomer(string offender, string reporter)` and inside you have the choice of `GetCustomerIdByEMail` and `GetCustomerIdByUsername`. It's not just copy paste coding, its about every aspect of your design. Thinking in `strings` means you need to juggle one more piece of information in your mind, for every variable. That's why it's an error waiting to happen. Not just simple aliasing, you can e.g. make `ToString()` return masked email, so even if somebody mixes up nick and email display, a leak is contained. Tests are opt-in, types are rock-solid – Agent_L Sep 08 '22 at 11:42
  • @Agent_L I'm pretty sure I don't want to work in a team/project which relies on ToString for business purposes. The case you presented is flawed, yes, but that can be corrected diffrenty, like changing names to offenderEmail and reporterId. While I do agree that having types there to aid would be beneficial, I don't see it as beneficial enough to intro a new type and write .Value/.Text/etc everywhere when needed, and rinse&repeat for everything like Age,Year,Count,etc, and I totally disagree in one point: A 'narrowing alias' string->Email would be completely enough to solve it. Just N/A in C#. – quetzalcoatl Sep 08 '22 at 14:21

2 Answers2

2

Nullable isn't a code smell per se.

But personally I think it is a code smell to handle the null case like this, since null is not considered an e-mail address.

null doesn't represent wrong value, null represents lack of one. If a lack of an email is a normal situation, then your solution is right.

However, the real question is "is lack of an email a normal situation" and most likely the answer is "no". Don't engineer your methods to handle situations they can't handle. Make them require the email. Handle null as an exception one level up. Handling exceptional situations is literally the purpose of exceptions. And the whole point of explicit nullablilty is to get this exception check for free.

The code smell is going against "fail-fast" rule: hiding an error and continuing a code path that has no hope of succeeding.

Agent_L
  • 4,960
  • 28
  • 30
1

The purpose of functions/methods is to hide code away, and make the calling code easier to analyze by us, humans. For the computer, there'd be little difference whever the if-null check was inside or outside this function. (*)

Having that said, which calling code do you prefer to see?
For me, that's the shorter one, it's clear, uncluttered, mostly-understandable at first glance. Make it 2 or 5 such parameters, and difference in amount of IFs visible on screen is just huge (considering the calling code may be doing a series of various such get-foo-by-bar).
The fact that the result is int? not int is a 'gotcha' moment at first, but but that's something that comes with experience. You just have to spend some time to get used to any new coding style, that's all.

But calling code isn't the only code you read.

Let's now see the function itself.
Which version of the function's code, with if-null inside, or without it, would you prefer?
For me, that doesn't really matter. That single if is so trivial, it doesn't complicate that function at all. Were there 2 or 5 of them, still the same, since they'd be at the beginning, and would be repetitive, and that'd form an obvious parameter checking prologue. OTOH, if they were scattered across screen-sized function body, there'd be another thing, but as it is now, nope, no difference for me.

This and that together make me say: leave that if inside, especially if that function is called from many places (which would lead to copying that if-null to multiple places).

However, since you have doubts about this function and that parameter, I'd focus instead on why you have them. Maybe that's just that the name of GetCustomerIdByEMail suggests you something that, conceptually, doesn't particularly go well with a pass-through null behavior for you? How about changing the name to TryGetCustomerIdByEMail or MaybeGetCustomerIdByEMail(**)? Simple things like that can sometimes really make a difference.

(*) for nit-pickers, yes, yes, it MAY do a difference to the computer as well, branching, inlining, cache, and other, but I believe it's not important in this topic, since OP didn't state anything about workload size or performance.

(**) in this case, I really suggest Maybe. See this contrived example:

var id0 = MaybeGetCustomerIdByEMail(...);
var id1 = MaybeGetCustomerIdByFirstLastName(...);
var id2 = MaybeGetCustomerIdByInsuranceNumber(...);
var id3 = MaybeGetCustomerIdByAgreementNumber(...);

var ids = new[id0,id1,id2,id3].Where(id => id!=null).Distinct();
if(ids.Empty()) // throw customer not found
if(ids.Skip(1).Any()) // throw customer ambigous

var customerId = ids.Single();

By no means this is a perfect piece of code. Just a sketch. And of course your case may be vastly different. But I hope that shows why I'm expressing this, probably unpopular, view.

quetzalcoatl
  • 32,194
  • 8
  • 68
  • 107