How to copy 4 bytes of char buffer into long

Question

Here is an interview question I was asked:

you have the following code:

long num; char buff[50];

If buff is aligned to address 0 what is the most efficient way for num to get first 4 bytes of buff; if we want to get the value of buff in a specidic place k (buff[k]) how will we do it?

Is it pertained with memory alignment?

Regards, Ron

I'm not sure if memory alignment includes endianness, but if not then endianness is another issue to consider. — hnefatl, Nov 19 '17 at 20:18
Is that a trick interview question? Because 'the way that is most likely to be most efficient by a naive metric, i.e. on a naive compiler' and 'the single way that is not totally undefined behaviour according to the Standard' are two different things (an aliasing typecast and `memcpy`, respectively). So, I'm not certain the outfit giving that interview have their priorities straight. — underscore_d, Nov 19 '17 at 20:23
@underscore_d agree, the best interview questions invite a discussion, rather than the blocking *undefined behaviour* answer. Compare this with a conversation gambit: "Have you been here before?" The answer "No" stalls the conversation. — Weather Vane, Nov 19 '17 at 20:27
@WeatherVane Of course, it's possible that the question was a trick, to make the interviewee raise precisely these objections, but I'm a bit too cynical to be certain of that. :) — underscore_d, Nov 19 '17 at 20:29
Possible duplicate of [Converting Char array to Long in C](https://stackoverflow.com/questions/6687467/converting-char-array-to-long-in-c) — underscore_d, Nov 19 '17 at 20:30

Eric Postpischil · Answer 1 · 2017-11-19T22:51:20.500

First, understand that the question is intended to ask about characteristics beyond those specified in the C standard. The C standard does not impose requirements about efficiency, so any question asking about efficiency is necessarily asking about C implementations, not about the C standard. The interviewer is not probing your knowledge of C per se; they are probing your knowledge of modern hardware, compilers, and so on.

As mentioned in xvan’s answer, you could use *num = * (long *) buff;. This works given some assumptions implicit in the question. In order for this to work reliably:

long must not have any trap representations, or we must know that the data being copied is not a trap representation.
long must be four bytes.
The compiler must tolerate aliasing. That is, it must not assume that, because the elements of buff are char, we will not access them through a pointer to long.
buff must be four-byte aligned as stated in the question, or the target hardware must support unaligned loads.

These characteristics are not uncommon in C implementations, particularly with corresponding options selected during compilation. The result of this code is likely to be a two-instruction sequence that loads four bytes from memory to a register and that stores four bytes from a register to memory. That is the knowledge I think the interviewer was testing you for.

However, this is not a great solution. As Ilja Everilä noted in a comment, you can simply write memcpy(&num, buff, sizeof num);. This is a proper C-standard way to copy bytes, and a good compiler will optimize it. For example, I just compiled this source code using Apple LLVM 8.1.0 on macOS 10.12.6 with “-O3 -std=c11 -S” (switches that request optimization, use of the 2011 C standard, and assembly code output):

#include <stdint.h>
#include <string.h>

void foo(uint32_t *L, char *A)
{
    memcpy(L, A, sizeof *L);
}

and the resulting routine contains these instructions between the usual routine entry and exit code:

movl    (%rsi), %eax
movl    %eax, (%rdi)

Thus, the compiler has optimized the memcpy call into a load instruction and a store instruction. This is even though the compiler does not know what the alignment of buff might be. It apparently “believes” that unaligned loads and stores perform reasonably well on the target architecture, so it chose to implement the memcpy directly with load and store instructions rather than explicitly calling a library routine and looping to copy four individual bytes.

If a compiler does not immediately optimize the memcpy call like this, it may need a little help. For example, if the compiler does not know that buff is four-byte aligned, and the target hardware does not perform unaligned four-byte loads well (or at all), then the compiler will not optimize this memcpy into a load-store pair. In that case, some compilers have language extensions that let you tell them a pointer has more than the normal alignment, such as GCC’s __builtin_assume_aligned() as M.M. mentions. For example, Apple LLVM, I could do this:

typedef char AlignedBuffer[50] __attribute__((__aligned__(4)));

void foo(uint32_t *L, AlignedBuffer *A)
{
    *L = * (long *) A;
}

That typedef tells the compiler that the AlignedBuffer type is always four-byte aligned, at least. This is, of course, an extension to the C language that is not available in all compilers. (Also, when doing this, I would have to ensure to use the compiler option that supports aliasing things through pointers to other types.)

Given that this compiler already knows how to optimize this case, trying to outsmart it with pointer casting is pointless. However, when working with other compilers in other situations, something like the pointer casting may be necessary to get the performance desired. But one needs to know that this is implementation dependent, and the code should be documented as such so that other people know it cannot be ported to other C implementations without addressing these issues.

Regarding the follow-up question, one can write *num = * (long *) (buff + k);. It is likely the point of this follow-up question is to probe your knowledge of hardware alignment requirements. On many systems, attempting to load four-byte data from an address that is not four-byte-aligned causes an exception. Therefore, this assignment statement is likely to fail on such hardware when k is not a multiple of four. (Also, we should note that k must be such that all bytes to be loaded are within buff, or are otherwise known to be accessible.) The interviewer likely wanted you to display that knowledge.

Typically with interview questions like this, there is not necessarily a single right answer that the interviewer wants. Mostly, they want to see that you are aware of the issues, have some understanding of them, and have some knowledge of potential ways to address them.

Some compilers have intrinsics to "assert" that a pointer is correctly aligned , e.g. gcc [__builtin_assume_aligned()](https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) — M.M, Nov 19 '17 at 22:26
@M.M: I actually tried that with my compiler and then found it was optimizing without that, so I removed it. But I will mention it in the answer. — Eric Postpischil, Nov 19 '17 at 22:43
Your bullet list misses the alignment thing. Unaligned data can crash on even on x86, given a sufficiently smart compiler (though OP does mention alignment) :D — Antti Haapala -- Слава Україні, Nov 19 '17 at 22:43
@AnttiHaapala assuming the compiler is not bugged; if it generates `movl` for `memcpy` in a situation that accepts a `char *` with no alignment information, then it must have knowledge that an unaligned access via `movl` won't crash on that platform. — M.M, Nov 19 '17 at 22:46
@AnttiHaapala: The fact that `buff` is four-byte aligned was given in the question, so it is not an additional assumption we have to make. But I will mention it. — Eric Postpischil, Nov 19 '17 at 22:52
Though it's implicit in your (great) answer, IMO you should emphasise that when you say "The compiler must tolerate aliasing", you really mean _The compiler must be using an option that makes it non-compliant._ Such tolerance is not an implementation detail, rather an extension at best, and one that IMO is not good to rely on and indicates code that needs fixed. — underscore_d, Nov 20 '17 at 11:06
@underscore_d: Supporting aliasing does not make a C implementation non-conforming to the C standard. The standard explicitly welcomes extensions: “A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any strictly conforming program.” — Eric Postpischil, Nov 20 '17 at 12:18
@EricPostpischil I guess it depends on the meaning of "alter the behaviour": code that does not conform to aliasing rules has undefined behaviour, so does it qualify as a change in behaviour if a compiler makes its behaviour well-defined for that code? :S Probably not a very useful philosophical question. :) — underscore_d, Nov 20 '17 at 12:30
@underscore_d: A program that does not conform to the aliasing rules is not a conforming program, so whether its behavior changes is irrelevant to the condition that the behavior of conforming programs not change. The C standard **explicitly** welcomes extensions, and extensions define behavior that was previously not defined. — Eric Postpischil, Nov 20 '17 at 12:35
@underscore_d: Please stop thinking of “undefined behavior” as a thing that must be avoided. The C standard does not intend it as a wall you cannot pass. Rather, it is an open field, where the standard does not support you, but you are free to roam with other support. Essentially no useful program strictly conforms to C. Every GUI program relies on libraries and/or hardware that are not defined by C. Every program designed for high performance relies on characteristics not defined by C. The standard is a base for building on, not a constraining prison. Only academic exercises stick to pure C. — Eric Postpischil, Nov 20 '17 at 12:40

How to copy 4 bytes of char buffer into long

1 Answers1