
The title field isn't long enough to capture a detailed question, so for the record, my actual question defines "unreasonable" in a specific way:

Is it legal[1] for a C implementation to have an arithmetic right shift operator that returns different results, over time, for identical argument values? That is, must >> be a true function?

Imagine you wanted to write portable code using right shift >> on signed values in C. Unfortunately for you, some critical part of your algorithm is fastest when signed right shifts fill the sign bit (i.e., when they are arithmetic right shifts). Since this behavior is implementation-defined, you are kind of screwed if you want to write portable code that takes advantage of it.

Just reading the compiler's documentation (which the standard requires it to provide, but which may or may not actually be easy to access or even exist) is fine if you know all the compilers you run on now and will ever run on, but since that is often impossible, you might look for a way to do this portably.

One way I thought of is to simply test the compiler's behavior at runtime: if it appears to implement arithmetic[2] right shift, use the optimized algorithm, but if not, use a fallback that doesn't rely on it. Of course, just checking that, say, (short)0xFF00 >> 4 == (short)0xFFF0 isn't enough, since it doesn't exclude the possibility that char or int values work differently, or even the weird case that the shift fills the sign bit for some shift amounts or values but not for others[3].
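For illustration, a minimal runtime probe might look like the sketch below; the helper name is my own, and the `volatile` is there only to discourage the compiler from folding the shift at compile time, so the runtime behavior is what's being observed:

```c
#include <stdio.h>

/* Hypothetical probe: does >> appear to sign-fill for one sample value?
   volatile keeps the shift from being evaluated at compile time. */
static int shift_looks_arithmetic(void)
{
    volatile int x = -16;      /* bit pattern 0xFFFFFFF0 on two's complement */
    return (x >> 4) == -1;     /* an arithmetic shift replicates the sign bit */
}

int main(void)
{
    puts(shift_looks_arithmetic()
             ? "right shift looks arithmetic"
             : "right shift looks logical (or something stranger)");
    return 0;
}
```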

So a comprehensive approach would be to exhaustively check all shift values and amounts. For all 8-bit char inputs that's only 2^8 LHS values and 2^3 RHS values for a total of 2^11, while short (typically, but let's say int16_t if you want to be pedantic) totals only 2^20, which could be validated in a fraction of a second on modern hardware[4]. Doing all 2^37 32-bit combinations would take a few seconds on decent hardware, but is still possibly reasonable. 64-bit is out for the foreseeable future, however.
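A sketch of the exhaustive 16-bit check might look like this, assuming two's complement and at least 32-bit `int`; `ref_arith_shift` is my own reference helper, which expresses the expected result as a floor division so that it never applies `>>` to a negative value:

```c
#include <stdint.h>
#include <stdio.h>

/* Reference arithmetic shift built without shifting negative values:
   floor(v / 2^s), expressed in terms of C's truncating division. */
static int32_t ref_arith_shift(int32_t v, int s)
{
    int32_t d = (int32_t)1 << s;
    if (v >= 0)
        return v / d;
    return (v - (d - 1)) / d;   /* adjust so truncation becomes floor */
}

int main(void)
{
    /* Compare >> against the reference for every int16_t value and
       every shift amount 0..15. */
    for (int32_t v = INT16_MIN; v <= INT16_MAX; v++) {
        for (int s = 0; s < 16; s++) {
            int16_t lhs = (int16_t)v;   /* promoted to int by >> */
            if ((lhs >> s) != ref_arith_shift(lhs, s)) {
                printf("mismatch: %d >> %d == %d\n",
                       (int)lhs, s, (int)(lhs >> s));
                return 1;
            }
        }
    }
    puts("all 16-bit shifts look arithmetic");
    return 0;
}
```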

Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of its inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?

Now this question is mostly theoretical, but violating this property probably isn't as crazy as it might seem, given the presence of hybrid architectures such as big.LITTLE that dynamically move processes from one CPU to another with possibly small behavioral differences, or live VM migration between, say, chips manufactured by Intel and AMD with subtle (usually undocumented) differences in instruction behavior.


1 By "legal" I mean according to the C standard, not according to the laws of your country, of course.

[2] I.e., it replicates the sign bit for newly shifted-in bits.

[3] That would be kind of insane, but not that insane: x86 has different behavior (overflow-flag setting) for shifts by 1 than for other shift amounts, and ARM masks the shift amount in a surprising way that a simple test probably wouldn't detect.

[4] At least on anything better than a microcontroller. The inner validation loop is a few simple instructions, so a 1 GHz CPU could validate all ~1 million short values in ~1 ms at one instruction per cycle.

BeeOnRope
  • So you are basically asking: is the "implementation-defined" behavior consistent across the implementation? I would say it depends. In the case of *pure* mathematical functions such as shifts, they should not yield *any* side effects and their result should depend only on the operand values. In the case of some other functions without the *pureness* restriction it might be different. – Eugene Sh. Feb 10 '17 at 19:11
  • As a side note, your test expression with `short` is flawed. All operations will use a type at least as wide as `int` as their base type. A cast to `short` followed by a shift buys you nothing; it is still `int`. – Jens Gustedt Feb 10 '17 at 19:14
  • More or less that's what I'm asking - although my precise condition was even a bit narrower - whether the value could change within the lifetime of a single process. Note that side effects aren't needed for the behavior to change. The `>>` itself could be side-effect-free, but something else could change (not related to any `>>` invocation) that affects later calls (I suppose...). – BeeOnRope Feb 10 '17 at 19:14
  • If *something* could affect a pure function's result besides its parameters, this function is not pure anymore. – Eugene Sh. Feb 10 '17 at 19:18
  • @JensGustedt - false. The expression `(short)0xFF00` is converted to `short` and then converted back up to an `int`. So it buys me the sign extension to `int`. Without it, you'd get an `int` with value `0x0000FF00`, which makes no sense for my example. You could have always just [tried it](https://ideone.com/XZk5Gx). If there were an `s` suffix or similar in C for `short` literals, I would have used that, but there isn't, so I used a cast (which ended up slightly longer than just typing a few more Fs!). – BeeOnRope Feb 10 '17 at 19:19
  • @EugeneSh. - agreed, that's the other part of pure. My comment was more that the side-effect-free nature may not be relevant. – BeeOnRope Feb 10 '17 at 19:20
  • Though, of course, I don't think the C standard has a notion of pure functions... – Eugene Sh. Feb 10 '17 at 19:20
  • "Implementation defined does not mean "undefined". But you should take into account that you cannot shift any type smaller than 16 bits. A simple research in the standard (of which the final draft n1570 is publically available and in all relevant aspects identical to the final document) would show the definition for the bitshift operators and what "implementation defined behaviour" means and which behaviour is to be documented. Anything else is obviously not guaranteed. – too honest for this site Feb 10 '17 at 19:21
  • @Olaf The OP is asking: is it possible for an implementation to define that `(-5) >> 1` will return `10` if invoked once, and `-17` if invoked twice (just random numbers, if you are wondering). – Eugene Sh. Feb 10 '17 at 19:25
  • @EugeneSh.: I don't see the problem. An implementation can define anything it wants. Unless it violates the standard, it is still compliant. The actual question is whether it makes sense to define such things at all, because that restricts optimisations unnecessarily. A user should never rely on this anyway. The mention of shifting `short` and `char` signals imo a fundamental misconception about (integer) operations; as I wrote: you cannot shift `char` or `short`. – too honest for this site Feb 10 '17 at 19:30
  • Comments are not for extended discussion; this conversation has been [moved to chat](http://chat.stackoverflow.com/rooms/135463/discussion-on-question-by-beeonrope-can-a-c-compiler-implement-signed-right-shif). – Bhargav Rao Feb 11 '17 at 19:40

2 Answers


Let's say you did that and found that the >> behavior exactly matches the desired arithmetic shift behavior. Are you safe to rely on it according to the standard: does the standard constrain the implementation not to change its behavior at runtime? That is, does the behavior have to be expressible as a function of its inputs, or can (short)0xFF00 >> 4 be 0xFFF0 one moment and then 0x0FF0 (or any other value) the next?

The standard does not place any requirements on the form or nature of the (required) definition of the behavior of right shifting a negative integer. In particular, it does not forbid the definition to be conditional on compile-time or run-time properties beyond the operands. Indeed, this is the case for implementation-defined behavior in general. The standard defines the term simply as

unspecified behavior where each implementation documents how the choice is made.

So, for example, an implementation might provide a macro, a global variable, or a function by which to select between arithmetic and logical shifts. Implementations might also define right-shifting a negative number to do less plausible, or even wildly implausible things.
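To make that concrete with a purely invented example (every name below is hypothetical; no real compiler I know of provides such a knob), an implementation's documentation could conceivably offer something like:

```c
/* Entirely hypothetical: an implementation could document a feature-test
   macro reporting which right-shift behavior it chose, or a function by
   which the program selects one. */
#if defined(__HYPOTHETICAL_ARITH_RSHIFT)   /* invented macro name */
    /* safe to take the shift-based fast path */
#else
    /* fall back to a division- or mask-based formulation */
#endif
```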

Testing the behavior is a reasonable approach, but it gets you only a probabilistic answer. Nevertheless, in practice I think it's pretty safe to perform just a small number of tests -- maybe as few as one -- for each LHS data type of interest. It is extremely unlikely that you'll run into an implementation that implements anything other than standard arithmetic (always) or logical (always) right shift, or that you'll encounter one where the choice between arithmetic and logical shifting varies for different operands of the same (promoted) type.

John Bollinger
  • There are some aspects of implementation-defined behaviour which have to be documented and must not change for a specific implementation. See Annex J.3. For others, this documentation is not required, e.g. bit shifts. – too honest for this site Feb 10 '17 at 19:38
  • @Olaf, "each implementation documents how the choice is made" is part of the standard's *definition* of "implementation-defined". All implementation-defined behavior must be documented. Annex J is informative, not normative -- it does not (cannot) limit the implementation-defined behaviors that must be documented. – John Bollinger Feb 10 '17 at 19:42
  • @JohnBollinger - what about exhaustive testing? I agree with everything you said and testing some arguments gives you a probabilistic answer - but you seem to imply that testing every argument would be enough since then you have completely mapped out the implementation behavior. My question is actually: even if you do _that_, could the implementation simply _define_ that the value of `>>` can change over time, with identical arguments? – BeeOnRope Feb 10 '17 at 20:09
  • @BeeOnRope, exhaustive testing is impossible, because you cannot know *a priori* what all the variables that need to be tested may be. Although it is unlikely to do so, the implementation *could* define the behavior of right-shifting a negative number to depend on virtually anything -- the system time, the logged-in username, time since the last system boot, the value of the CPU's IC. It may be that all your tests pass today, but some fail tomorrow. – John Bollinger Feb 10 '17 at 20:20
  • As I mentioned in my question, you could feasibly test _all_ possible explicit inputs, at least for certain operand sizes. Such as all `int` inputs (e.g., all `int` values on the left hand side times all _legal_ shift values on the right hand side). If you did that, is it allowed for the implementation to use inputs _other than_ the operator arguments to determine the result? @JohnBollinger – BeeOnRope Feb 10 '17 at 20:38
  • @JohnBollinger: I wouldn't call it "probabilistic". If one is using an implementation which isn't designed to be deliberately obtuse, the test will be reliable. If one is using an implementation that is designed to be deliberately obtuse, the level of effort required to do anything useful will generally exceed the effort required to find a quality implementation. – supercat Feb 10 '17 at 22:15
  • @BeeOnRope, for the third time, YES, the implementation is permitted to define the effect of right shifting a negative number in a way that depends on more factors than just the operands. It can define the effect ***any way at all, without restriction***. As I have also already said, however, you're *unlikely* to see that. If you're going to suppose that you can accurately determine the behavior by testing, then testing all possible operand combinations is way overkill. You only have two plausible options that you could expect to distinguish by testing. – John Bollinger Feb 11 '17 at 00:18
  • Thanks @John - I missed "beyond the operands" the first time through. – BeeOnRope Feb 11 '17 at 00:46

The authors of the Standard made no effort to imagine and forbid every unreasonable way implementations might behave. In the days prior to the Standard, implementations almost always defined many behaviors based upon how the underlying platform worked, and the authors of the Standard almost certainly expected that most implementations would do likewise. Since the authors of the Standard didn't want to rule out the possibility that it might sometimes be useful for implementations to behave in other ways (e.g. to support code which targeted other platforms), however, they left many things pretty much wide open.

The Standard is somewhat vague as to the level of precision with which "Implementation Defined" behaviors need to be documented. I think the intention is that a person of reasonable intelligence reading the specification should be able to predict how a piece of code with Implementation-Defined behavior would behave, but I'm not sure that defining an operation as "yielding whatever happens to be in register 7" would be non-conforming.

From a practical perspective, what should matter is whether one is targeting quality implementations for platforms that have been developed within the last quarter century, or whether one is trying to force even deliberately-obtuse compilers to behave in a controlled fashion. If the former, it may be worth using a static assert to ensure that `-1 >> 1 == -1`, but quality compilers for modern hardware where that test passes are going to use arithmetic right shift consistently. While targeting code to obtuse compilers might possibly have some purpose, it's not generally possible to guard against all the ways that a pathological-but-conforming compiler could sabotage one's code, and efforts spent attempting to do so could often be more effectively spent on getting a quality compiler.
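For example, with C11 such a guard might look like the sketch below; note that `-1 >> 1` is an integer constant expression even though its value is implementation-defined, so it can appear in a static assertion:

```c
#include <assert.h>   /* provides static_assert in C11 */

/* Refuse to compile on implementations where signed right shift
   does not sign-fill for this sample value. */
static_assert((-1 >> 1) == -1,
              "this code assumes arithmetic right shift of signed values");
```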

supercat