20

Context:

I'm learning C# and have been messing about on the Pex for fun site. The site challenges you to re-implement a secret algorithm, by typing code into the site and examining how the inputs and outputs differ between your implementation and the secret implementation.

Problem:

Anyway, I got stuck on a basic code duel called XAndY.

From the name it seemed obvious that the answer was just:

public static bool Puzzle(bool x, bool y) 
{
    return x && y;
}

However, this was incorrect and Pex informed me that the following inputs produced a different result than the secret implementation:

Input:

x:true y:true (0x02)

Output:

my implementation: true (0x02)

secret implementation: false

Mismatch Your puzzle method produced the wrong result.

Code: Puzzle(true, PexSafeHelpers.ByteToBoolean((byte)2));

After a lot of confusion trying to compare different types of true, I realised that the implementation that Pex was looking for was actually just using a bitwise AND:

return x & y;

Questions:

I thought that for both semantic and short-circuiting reasons you should use logical && for comparing boolean values, but regardless:

  1. Does this mean that x & y and x && y definitively do not have the same outputs for all possible bool arguments? (or could it be something buggy in Pex?)
  2. Does this mean you can differentiate between different values of bool true in C#? If so, how?
Michael Liu
  • 52,147
  • 13
  • 117
  • 150
mallardz
  • 1,070
  • 11
  • 21
  • 3
    According to what you've just printed, with an input of `true` and `true` it should output `false`. `&` doesn't do that, so it would seem that that is not the actual implementation. – Servy Jul 10 '14 at 17:49
  • In a language like C++ any non-zero value is "true", so doing a bitwise and between two different "true" values could easily return 0 (false). However, I don't think this is true in C#. Interesting question! – BradleyDotNET Jul 10 '14 at 17:50
  • 1
    @BradleyDotNET The same is true in c# except the language and compilers go to great lengths to ensure that only 0 is false and 1 is true. – jamesSampica Jul 10 '14 at 17:59
  • @Shoe - you don't know how true/false are encoded. The next C# compiler (or CLR) could do it differently. The only cross-over is in the `System.Convert` class. – H H Jul 10 '14 at 18:02
  • @HenkHolterman Well 0 will always be false no matter the compiler. I guess "true" could be whatever the compiler has it be. I am assuming CLI compliance. – jamesSampica Jul 10 '14 at 18:20
  • True=0 and False=-1 (0xff) used to be popular. – H H Jul 10 '14 at 18:33
  • 2
    When I tell people that indeed a boolean can be other than 0 or 1 it takes a back and forth of many comments to convince them. This is an unfathomable fact to many. Finally a question to point them to. – usr Jul 10 '14 at 22:38
  • @usr - booleans are still only `true` or `false`. – H H Jul 13 '14 at 13:02
  • @HenkHolterman I guess the C# compiler is not compliant with the C# language spec, then. (I'm glad this is the case because I have used the & operator on bools to reduce branches in tight loops.) – usr Jul 13 '14 at 13:12
  • @usr - the `&` is a perfectly legal boolean operator in C#. The CLR implementation is a little suspect but practical. – H H Jul 13 '14 at 13:15
  • 1
    @HenkHolterman the compiler does not provide what the language promises. The compiler needs to force each boolean to 0 or 1, or just implement & just like &&. Right now it deviates from the spec. – usr Jul 13 '14 at 13:26
  • Where do you find the representation of boolean as int in the C# specs? – H H Jul 13 '14 at 13:27
  • @HenkHolterman nowhere. But the language promises that & is a logical operator and it is easy to show that it is not. The compiler shouldn't let CLR implementation details leak through. The fact that a "corrupt" bool can come in from external code does not allow the spec to be violated. There is no "special" bool type that does not have to obey the laws.; Here's another such case where the compiler would have a really hard time ensuring compliance: http://stackoverflow.com/questions/11976969/in-c-net-why-is-sbyte-the-same-as-byte-except-that-its-not – usr Jul 13 '14 at 13:35
  • And of course the correct answer to the puzzle should be XOR `x ^ y` – Bob Vale Sep 07 '16 at 21:18
  • There is also a nice answer given [here @ codegulf](https://codegolf.stackexchange.com/q/74384/39246). What is done here with emit statements is done there with [unsafe code and pointers.](https://codegolf.stackexchange.com/a/74389/39246). – Matt Oct 15 '18 at 14:38

4 Answers4

23

The puzzle is exploiting what, in my opinion, is a bug in the C# compiler. (The bug affects VB.NET as well.)

In the C# 5.0 specification, §4.1.8 says that "The possible values of type bool are true and false", and §7.11.3 says that operator &(bool x, bool y) is a logical operator:

The result of x & y is true if both x and y are true. Otherwise, the result is false.

It's obviously a violation of the specification for true & true to yield false. What's going on?

At run time, a bool is represented by a 1-byte integer. The C# compiler uses 0 to represent false and 1 to represent true. To implement the & operator, the C# compiler emits a bitwise AND instruction in the generated IL. At first glance, this seems to be okay: bitwise AND operations involving 0 and 1 correspond exactly with logical AND operations involving false and true.

However, §III.1.1.2 of the CLI specification explicitly allows a bool to be represented by an integer other than 0 or 1:

A CLI Boolean type occupies 1 byte in memory. A bit pattern of all zeroes denotes a value of false. A bit pattern with any one or more bits set (analogous to a non-zero integer) denotes a value of true.

By going beyond the scope of C#, it is indeed possible—and perfectly legal—to create a bool whose value is, say, 2, thus causing & to behave unexpectedly. This is what the Pex site is doing.

Here's a demonstration:

using System;
using System.Reflection.Emit;

class Program
{
    static void Main()
    {
        DynamicMethod method =
            new DynamicMethod("ByteToBoolean", typeof(bool), new[] { typeof(byte) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // Load the byte argument...
        il.Emit(OpCodes.Ret);     // and "cast" it directly to bool.
        var byteToBoolean =
            (Func<byte, bool>)method.CreateDelegate(typeof(Func<byte, bool>));

        bool x = true;
        bool y = byteToBoolean(2);
        Console.WriteLine(x);               // True
        Console.WriteLine(y);               // True
        Console.WriteLine(x && y);          // True
        Console.WriteLine(x & y);           // False (!) because 1 & 2 == 0
        Console.WriteLine(y.Equals(false)); // False
        Console.WriteLine(y.Equals(true));  // False (!) because 2 != 1
    }
}

So the answers to your questions are:

  1. Currently, it's possible for x & y and x && y to have different values. However, this behavior violates the C# specification.
  2. Currently, you can use Boolean.Equals (as shown above) to differentiate between true values. However, this behavior violates the CLI specification of Boolean.Equals.
Michael Liu
  • 52,147
  • 13
  • 117
  • 150
  • This is correct. In IL it is even possible to write low-trust code to emit (bool)2. (The code shown here requires full trust due to the explicit layout struct.) This is not a "corrupt" bool. The CLR has not problem dealing with this. – usr Jul 10 '14 at 22:36
  • 1
    @usr: Thanks for the info. I guess it's "corrupt" from the standpoint of the C# spec, but not the CLR. – Michael Liu Jul 10 '14 at 22:37
  • In particular, PEX is passing 0xFF for true in this case. PEX is all about discovering possible failure cases, so it explores the limits of what is acceptable for input. It also has a tendency to make methods blow up by passing int.MaxValue, NaN and other fun inputs. – Dan Bryant Jul 10 '14 at 22:38
  • Excellent comprehensive answer, thank you. I guess the PEX for fun site shouldn't be using `&` instead of `&&` on boolean values in a basic C# teaching course, but on the other hand, I wouldn't have learnt this if they hadn't. – mallardz Jul 11 '14 at 09:37
  • Maybe you should note in this excellent answer that the C# compiler seems to be not compliant with the C# language spec. I hope this answer can become the canonical page to send people to for information about this issue. – usr Jul 13 '14 at 13:13
  • Very interesting answer. Using IL to force the compiler using boolean values that would otherwise not be possible (you usually can't cast from int to bool), can be regarded as an exploit - like a buffer overflow, something the designers of the language didn't think of. – Matt Aug 03 '18 at 08:22
  • 1
    When I run this code above I get `x=true` `y=true` `x&&y=false` `x&y=false` `y.Equals(false)=false` `y.Equals(true)=false`. Why? I realise this is quite an old question now so maybe something in .NET has been patched or changed? I am using .NET Framework 4.6.1 – JackPRead Jun 26 '19 at 15:41
  • 2
    @JackPRead: Interesting. At some point after I wrote this answer, the C# compiler was changed to use `&` instead of conditional jumps to implement `&&` for simple Boolean variables, so now both `x & y` and `x && y` incorrectly evaluate to false. :-( – Michael Liu Jun 26 '19 at 16:09
3

I noticed that this issue has been raised to the Roslyn compiler group. Further discussion.

It had the following resolution:

This is effectively by design and you can find the document detailing that here: https://github.com/dotnet/roslyn/blob/master/docs/compilers/Boolean%20Representation.md

You can see more details and past conversations about this here: #24652

The noted document states:

Representation of Boolean Values

The C# and VB compilers represent true (True) and false (False) bool (Boolean) values with the single byte values 1 and 0, respectively, and assume that any boolean values that they are working with are restricted to being represented by these two underlying values. The ECMA 335 CLI specification permits a "true" boolean value to be represented by any nonzero value. If you use boolean values that have an underlying representation other than 0 or 1, you can get unexpected results. This can occur in unsafe code in C#, or by interoperating with a language that permits other values. To avoid these unexpected results, it is the programmer's responsibility to normalize such incoming values.

(All italics are my own emphasis)

StayOnTarget
  • 11,743
  • 10
  • 52
  • 81
1

& in C# is not a bitwise operator, assuming that the input values are Boolean values. It is overloaded. There are two entirely separate implementations of the operator. A non-short circuiting logical boolean operator if the inputs are booleans, and a bitwise AND if the values are non-boolean values.

In the code that you have shown, then input is a boolean variable. It's not a numeric value, it's not an expression that resolves to a boolean value (which may have side effects), or anything else.

When the input is two boolean variables there is never going to be any different in the output between & and &&. The only way to have any observable difference between these two is to have a boolean expression that is more complex than just resolving a variable to its value, or some non-boolean input.

If the operands could be of some type other than bool then its pretty trivial to provide a type that has different results for either operator, such as a super mean type that overrides the true operator in a manor inconsistent with it's implicit conversion to bool:

Servy
  • 202,030
  • 26
  • 332
  • 449
  • & is emitted as a bitwise operation and an i1 at the IL level can very well contain values other than 0 and 1. You can easily write an IL function returning (bool)2. – usr Jul 10 '14 at 22:34
  • @MichaelLiu It's correct for C# code. It's not true of entirely different languages, such as IL. The question is about C#, and for C# the answer is 100% correct. – Servy Jul 11 '14 at 14:58
  • I'd call this a bug in the C# compiler. After all, C# doesn't exist in a vacuum, and because it's completely legitimate for any BCL method to return these strange Boolean values, the compiler ought to handle them properly. The current behavior is so unexpected I wouldn't be surprised if it's the cause of a security vulnerability. – Michael Liu Jul 12 '14 at 17:12
0

Since the implementation to satisfy the puzzle is entirely up to you, why dont you try:

public static bool Puzzle(bool x, bool y)
{
    return x & !y;
}
Adam White
  • 3,180
  • 1
  • 21
  • 18