First, understand that the question is intended to ask about characteristics beyond those specified in the C standard. The C standard does not impose requirements about efficiency, so any question asking about efficiency is necessarily asking about C implementations, not about the C standard. The interviewer is not probing your knowledge of C per se; they are probing your knowledge of modern hardware, compilers, and so on.
As mentioned in xvan’s answer, you could write *num = * (long *) buff; and this works given some assumptions implicit in the question. For it to work reliably:
- long must not have any trap representations, or we must know that the data being copied is not a trap representation.
- long must be four bytes.
- The compiler must tolerate aliasing. That is, it must not assume that, because the elements of buff are char, we will not access them through a pointer to long.
- buff must be four-byte aligned as stated in the question, or the target hardware must support unaligned loads.
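As a rough illustration, some of these assumptions can be checked in code. The sketch below is only one way to do it: the helper names is_u32_aligned and load_u32 are made up for this example, and it uses uint32_t instead of long because uint32_t is exactly 32 bits wide with no padding bits. The aliasing assumption cannot be checked from inside the program; it depends on how the code is compiled (for example, GCC and Clang accept -fno-strict-aliasing).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* uint32_t has no padding bits, so it has no trap representations.
   This checks the "four bytes" assumption at compile time. */
_Static_assert(sizeof(uint32_t) == 4, "uint32_t must occupy four bytes");

/* Hypothetical helper: nonzero if p is suitably aligned for a uint32_t. */
static int is_u32_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(uint32_t) == 0;
}

/* Hypothetical helper: copies four bytes from buff into *num.
   The cast is used only when buff is aligned; otherwise memcpy is used.
   The cast path still assumes the compiler tolerates the aliasing. */
static void load_u32(uint32_t *num, const char *buff)
{
    assert(num != NULL && buff != NULL);
    if (is_u32_aligned(buff))
        *num = *(const uint32_t *)buff;   /* implementation dependent */
    else
        memcpy(num, buff, sizeof *num);   /* portable fallback */
}
```

In practice the memcpy branch alone would usually be enough; the cast branch is shown only to make the listed assumptions concrete.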
These characteristics are not uncommon in C implementations, particularly with corresponding options selected during compilation. The result of this code is likely to be a two-instruction sequence that loads four bytes from memory to a register and that stores four bytes from a register to memory. That is the knowledge I think the interviewer was testing you for.
However, this is not a great solution. As Ilja Everilä noted in a comment, you can simply write memcpy(&num, buff, sizeof num); instead. This is a proper C-standard way to copy bytes, and a good compiler will optimize it. For example, I just compiled this source code using Apple LLVM 8.1.0 on macOS 10.12.6 with “-O3 -std=c11 -S” (switches that request optimization, use of the 2011 C standard, and assembly code output):
#include <stdint.h>
#include <string.h>

void foo(uint32_t *L, char *A)
{
    memcpy(L, A, sizeof *L);
}
and the resulting routine contains these instructions between the usual routine entry and exit code:
movl (%rsi), %eax
movl %eax, (%rdi)
Thus, the compiler has optimized the memcpy call into a load instruction and a store instruction. It does this even though it does not know the alignment of the buffer (the parameter A here). It apparently “believes” that unaligned loads and stores perform reasonably well on the target architecture, so it chose to implement the memcpy directly with load and store instructions rather than explicitly calling a library routine or looping to copy four individual bytes.
If a compiler does not immediately optimize the memcpy call like this, it may need a little help. For example, if the compiler does not know that buff is four-byte aligned, and the target hardware does not perform unaligned four-byte loads well (or at all), then the compiler will not optimize this memcpy into a load-store pair. In that case, some compilers have language extensions that let you tell them a pointer has more than the normal alignment, such as GCC’s __builtin_assume_aligned() as M.M. mentions. For example, with Apple LLVM, I could do this:
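#include <stdint.h>

typedef char AlignedBuffer[50] __attribute__((__aligned__(4)));

void foo(uint32_t *L, AlignedBuffer *A)
{
    /* Read through uint32_t rather than long: long is eight bytes on
       64-bit macOS, so a long load would read more than four bytes. */
    *L = * (uint32_t *) A;
}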
That typedef tells the compiler that the AlignedBuffer type is always at least four-byte aligned. This is, of course, an extension to the C language that is not available in all compilers. (Also, when doing this, I would have to be sure to use the compiler option that permits aliasing through pointers to other types, such as -fno-strict-aliasing in GCC and Clang.)
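For comparison, here is a minimal sketch of the __builtin_assume_aligned route combined with memcpy, which avoids the aliasing problem entirely. It assumes GCC or Clang, since the builtin is a compiler extension, and the function name foo_aligned is made up for this example. I have not checked what code any particular compiler generates for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes GCC or Clang. The caller must guarantee that A really is
   four-byte aligned; if it is not, the behavior is undefined. */
void foo_aligned(uint32_t *L, const char *A)
{
    assert(L != NULL && A != NULL);

    /* Promise the compiler that A is at least four-byte aligned. */
    const char *p = __builtin_assume_aligned(A, 4);

    /* memcpy is still the standard way to copy the bytes; the alignment
       hint only gives the optimizer more freedom in how to implement it. */
    memcpy(L, p, sizeof *L);
}
```

The promise is only as good as the caller: passing a pointer that is not actually four-byte aligned makes the hint a lie, and the compiler is entitled to emit code that fails on it.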
Given that this compiler already knows how to optimize this case, trying to outsmart it with pointer casting is pointless. However, when working with other compilers in other situations, something like the pointer casting may be necessary to get the performance desired. But one needs to know that this is implementation dependent, and the code should be documented as such so that other people know it cannot be ported to other C implementations without addressing these issues.
Regarding the follow-up question, one can write *num = * (long *) (buff + k); for this. It is likely the point of this follow-up question is to probe your knowledge of hardware alignment requirements. On many systems, attempting to load four-byte data from an address that is not four-byte-aligned causes an exception. Therefore, this assignment statement is likely to fail on such hardware when k is not a multiple of four. (Also, we should note that k must be such that all bytes to be loaded are within buff, or are otherwise known to be accessible.) The interviewer likely wanted you to display that knowledge.
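One way to sidestep both concerns for the offset case is sketched below, using memcpy plus an explicit bounds check. The function name load_u32_at and the buff_len parameter are assumptions made for this example; the question itself does not specify how the buffer length is known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies four bytes starting at buff[k] into *num.
   Returns 0 on success, or -1 if those four bytes would not all lie
   within the buff_len bytes of buff. */
int load_u32_at(uint32_t *num, const char *buff, size_t buff_len, size_t k)
{
    assert(num != NULL && buff != NULL);

    /* Reject offsets that would read past the end of the buffer. */
    if (buff_len < sizeof *num || k > buff_len - sizeof *num)
        return -1;

    /* memcpy has no alignment requirement, so any k in range is fine. */
    memcpy(num, buff + k, sizeof *num);
    return 0;
}
```

Because memcpy works on bytes, this is valid for any k that passes the check, whether or not buff + k is four-byte aligned; whether it compiles to a single load depends on the compiler and target.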
Typically with interview questions like this, there is not necessarily a single right answer that the interviewer wants. Mostly, they want to see that you are aware of the issues, have some understanding of them, and have some knowledge of potential ways to address them.