50

I was asked this question in an interview, and I can't really understand what is going on here. The question was: "What would be displayed in the console?"

#include <iostream>

int main()
{
    unsigned long long n = 0;
    ((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;

    n >>= 7*8;

    std::cout << n;
}

What is happening here, step by step?

Peter Mortensen
Sergii P
  • 6
    Which part is not clear to you? I am sure you can identify at least some of these. – Eugene Sh. Jul 20 '19 at 01:48
  • 1
    You don't have enough information to answer the question. Maybe that was part of the test. – Mark Ransom Jul 20 '19 at 01:57
  • 10
    The primary question is whether you're running on a big-endian or little-endian machine. If little-endian, then you'll probably get 255 (i.e. 0xff). If big-endian, then you'll probably get 0. I'm not sure if C++ guarantees that you can modify `n` like this though. – Tom Karzes Jul 20 '19 at 02:01
  • 14
    By the way, don't do stuff like this in production code without a really, really good reason. – user4581301 Jul 20 '19 at 02:11
  • 4
    That's an interview question? I hope they don't expect you to write such code on a day to day basis. – JVApen Jul 20 '19 at 06:52
  • @user4581301 Under these circumstances, people could do the most idiotic things. One of the less idiotic things would be handing in your resignation. – gnasher729 Jul 20 '19 at 14:52
  • 45
    I'd say this is a pretty disqualifying interview question as far as the interviewer is concerned. It is, apart from including `iostream`, not C++ at all, nowhere near "mid-level", nowhere near "well defined", and all in all just... full of WTF. I really wouldn't like to work with, or for, someone who shows me code like this and tells me it's for judging my C++ skill. Just my opinion :) – Damon Jul 20 '19 at 15:39
  • 2
    The only answer a company should get is: I never would see such a code in production code as it is not portable and unmaintainable. They should ask for patterns, best practice, idioms but not about bit shifting and horror casts. Someone who wants to work there? – Klaus Jul 20 '19 at 19:55
  • 15
    @Damon: But maybe the real question is not the literal *"What would be displayed in the console?"* Rather, something like "Does this candidate know the difference between low-level pointer/byte fiddling in C, C++ as a better C, and idiomatic C++?". Or "Does this candidate know the C++ standard inside out and can point out what is implementation defined (not portable)/undefined behaviour and what is within the standard?" – Peter Mortensen Jul 20 '19 at 19:55
  • 1
    FWIW, Sergii -- I hope you talked through the parts you understood with your interviewer. If I asked something like this, partial credit would definitely count. It would be more about seeing your thought process and watching you work through it than whether or not you got the right answer. – A C Jul 21 '19 at 01:10
  • @Damon That really depends on what kind of code you're writing. Somebody writing low-level C++ code surely should know what little endian/big endian are and the bit fiddling here is nothing out of the ordinary. Given that the interviewer once uses sizeof and then hardcodes 7, it's also quite likely that they wanted the person to point out the mistakes/problems with the code. (Also ignoring weird non-LE/BE architectures (char size = 8 is more realistic these days really) which can be ignored by almost all people, the intentional hardcoding, what part of that code is UB?) – Voo Jul 21 '19 at 13:41
  • @Voo: I didn't claim UB, I claimed "not well defined" (given no endianness, and not knowing the size of `long long` which could e.g. be 16, it is indeterminate what the result will be). I also claimed it's full of WTFs because, well, pretty obvious I guess. But there may very well also be "true UB" in the admittedly quite unlikely case that `sizeof(long long)` is less than 8. Which is strictly allowable (it would mean is wrong, but the standard allows that, it only requires macros with some minimum values to be defined, and it _explicitly_ states they need not correspond). – Damon Jul 21 '19 at 18:48
  • @Damon As I said, given that the code does it correctly a line above, it's quite likely that that's an intentional mistake. This is a rather reasonable question for someone hiring for low-level programming imo (which yes is largely C, but there are enough people that use C++ for it, even if the merits are debatable). – Voo Jul 21 '19 at 20:12
  • See [What is the strict aliasing rule?](https://stackoverflow.com/a/51228315/1708801) – Shafik Yaghmour Jul 27 '19 at 23:00

2 Answers

85

Let's get this one step at a time:

((char*)&n)

This casts the address of the variable n from unsigned long long* to char*. This is legal: accessing an object of any type through a pointer to char is one of the very few "type punning" cases the language permits. In effect, it lets you access the memory of the object n as an array of bytes (a char is a byte in C++).
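
To make the "array of bytes" view concrete, here is a minimal sketch that reads any object byte by byte. The helper name `bytes_in_memory_order` is made up for this illustration, and the hex formatting assumes 8-bit bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the bytes of `value` in memory order
// (lowest address first) as a hex string such as "00 00 FF".
// Assumes 8-bit bytes; only the low 8 bits of each byte are shown.
template <typename T>
std::string bytes_in_memory_order(const T& value)
{
    static const char digits[] = "0123456789ABCDEF";
    // Reading an object through unsigned char* is permitted by the aliasing rules.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&value);
    std::string out;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        if (i != 0) out += ' ';
        out += digits[(p[i] >> 4) & 0xF];  // high nibble
        out += digits[p[i] & 0xF];         // low nibble
    }
    return out;
}
```

Calling `bytes_in_memory_order(n)` before and after the assignment in the question shows which byte changed; the order in which the bytes of a multi-byte value appear depends on the platform.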

((char*)&n)[sizeof(unsigned long long)-1]

This accesses the last byte of the object n, that is, the byte at the highest address. Remember that sizeof returns the size of a data type in bytes (in C++, char doubles as the byte type).

((char*)&n)[sizeof(unsigned long long)-1] = 0xFF;

You set the last byte of n to the value 0xFF.

Since n was initially 0, the memory layout of n is now:

00 ... 00 FF

Notice the ... in the middle. That is not because I am too lazy to write out every byte of n; it is because the standard does not fix the size of unsigned long long. There are some restrictions (it must be at least 64 bits wide), but the size can vary from implementation to implementation. So this is the first "unknown". On most modern architectures sizeof(unsigned long long) is 8, so we will go with that, but in a serious interview you are expected to mention this.

The other "unknown" is how these bytes are interpreted. Unsigned integers are simply encoded in binary, but the byte order can be little endian or big endian. x86 is little endian, so we will use that for this example. Again, in a serious interview you are expected to mention this.
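
If you want to check which byte order a given machine uses, one possible sketch is to look at where the value 1 ends up in memory. The names `ByteOrder` and `detect_byte_order` are invented for this example; C++20 also offers `std::endian` in `<bit>` for the same purpose.

```cpp
#include <cstring>

// Hypothetical names, used only for this illustration.
enum class ByteOrder { Little, Big, Other };

ByteOrder detect_byte_order()
{
    const unsigned long long probe = 1;
    unsigned char bytes[sizeof probe];
    // Copy the object representation into a byte array we can inspect.
    std::memcpy(bytes, &probe, sizeof probe);

    if (bytes[0] == 1) return ByteOrder::Little;              // least significant byte first
    if (bytes[sizeof probe - 1] == 1) return ByteOrder::Big;  // least significant byte last
    return ByteOrder::Other;                                  // something more exotic
}
```

On x86 this should return `ByteOrder::Little`. On a platform where sizeof(unsigned long long) is 1 the distinction is meaningless and the function reports `Little`.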

n >>= 7*8;

This right-shifts the value of n by 56 bits. Pay attention: now we are talking about the value of n, not the bytes in memory. With our assumptions (size 8, little endian), the value encoded in memory is 0xFF00000000000000, so shifting it right by 7*8 bits results in the value 0xFF, which is 255.

So, assuming sizeof(unsigned long long) is 8 and a little-endian encoding, the program prints 255 to the console.


On a big-endian system, the memory layout after setting the last byte to 0xFF is still the same: 00 ... 00 FF, but now the value encoded is 0xFF. So the result of n >>= 7*8; is 0, and the program prints 0 to the console.
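
Both outcomes can be reproduced on any machine by simulating the 8-byte object and decoding it under each byte order. This is only a sketch under the assumptions used above (8 bytes of 8 bits each); `decode` and `simulate` are hypothetical helper names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Interprets 8 bytes, given in memory order, as an unsigned 64-bit value.
std::uint64_t decode(const std::array<unsigned char, 8>& bytes, bool little_endian)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Little endian: byte i carries weight 256^i.
        // Big endian: the weights run in the opposite direction.
        const std::size_t weight = little_endian ? i : bytes.size() - 1 - i;
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * weight);
    }
    return value;
}

// Replays the interview snippet on the simulated object.
std::uint64_t simulate(bool little_endian)
{
    std::array<unsigned char, 8> bytes{};            // n = 0
    bytes[bytes.size() - 1] = 0xFF;                  // ((char*)&n)[sizeof(n)-1] = 0xFF
    return decode(bytes, little_endian) >> (7 * 8);  // n >>= 7*8
}
```

Here `simulate(true)` yields 255 and `simulate(false)` yields 0, matching the little-endian and big-endian cases described above.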


As pointed out in the comments, there are other assumptions:

  • char being 8 bits. Although sizeof(char) is guaranteed to be 1, a char does not have to be exactly 8 bits wide (it must be at least 8). All modern systems I know of group bits into 8-bit bytes.

  • integers don't have to be little or big endian. Other arrangements, such as middle endian, are possible, although anything other than little or big endian is considered esoteric nowadays.
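
The hardcoded 7*8 bakes in the first two assumptions (8 bytes of 8 bits). A hedged sketch of the same operation without them derives the shift from sizeof and CHAR_BIT; the function name `last_byte_then_shift` is made up here, and the sketch assumes unsigned long long has no padding bits. The result still depends on the byte order.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical rewrite of the interview snippet with no hardcoded sizes.
unsigned long long last_byte_then_shift()
{
    unsigned long long n = 0;
    // Writing through unsigned char* avoids depending on whether plain char is signed.
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&n);
    bytes[sizeof n - 1] = 0xFF;

    // Shift by every byte except one, whatever the actual sizes are.
    const std::size_t shift = (sizeof n - 1) * CHAR_BIT;
    if (shift != 0) n >>= shift;
    return n;
}
```

With 8-bit bytes this returns 255 on a little-endian machine and 0 on a big-endian one, so it removes the size assumptions but not the byte-order dependence.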

bolov
  • 4
    Technically an unsigned number could use value patterns that are neither little-endian nor big-endian, as long as individual bits have consistent positions. And apparently some old systems did use a "middle-endian" pattern in some situations, but these days yes, you'll generally find one of the two. – aschepler Jul 20 '19 at 02:48
  • 2
    Another assumption: char is 8 bit? How about 7 or 9 bit chars? – JVApen Jul 20 '19 at 07:14
  • 1
    char cannot be 7 bits (char must represent at least 256 different values), it could be 9 bits, or 16 bits. – gnasher729 Jul 20 '19 at 14:55
  • @aschepler: A system could also permute the bits of `unsigned long long` differently from the bits of `char`. So far as I can tell, one could design a conforming implementation that would--without exceeding translation limits--output any number that is the sum of at most eight different discrete non-negative powers of two. Of course, one could also contrive a conforming implementation where that program would exceed some translation limit and would thus be entitled to do anything whatsoever. – supercat Jul 20 '19 at 21:11
  • Good answer! :) I had a lab on a system with 9 bit char once (around 2014 probably). It is a microcore for FPGA use, where RAM traditionally has word sizes of multiples of 9 because of an extra bit which could be used for parity (but was not in this architecture). – Jonas Schäfer Jul 21 '19 at 10:18
  • 3
    Analog devices have a line of chips aimed at audio DSP applications where sizeof (char) == sizeof (short) == sizeof (int) == 1. The smallest addressable unit of memory is 32 bits, which is perfectly valid per the language. – Dan Mills Jul 21 '19 at 11:57
7

Cast the address of n to a pointer to char, set the char at index 7 (the last one, assuming sizeof(long long)==8) to 0xff, then right-shift the result (as a long long) by 56 bits.

GaryO
  • -1 Because the interviewer might have been asking whether the developer knows about byte orderings. Long long is at least 8 bytes, which means that in theory it can be larger than 8 bytes. – atomsymbol Jul 30 '19 at 13:21
  • Yup, that's why I deliberately avoided saying what the result will be. Because you can't know without knowing the byte ordering. :-) – GaryO Jul 30 '19 at 13:25