I have been looking into memmem, memcmp and similar functions lately to educate myself. I took the glibc source and copied the file I needed to look into. To test the implementation, I wrote a small main function that calls the function I am studying. Since I mostly do C++, I compile my code with a C++ compiler (clang++, g++). It works and gives the correct results.

What I am wondering about is related to type aliasing. There are some places in the code where this happens:

#define op_t unsigned long int

inline int memcmp_common_alignment(
            long int srcp1, long int srcp2, std::size_t len )
{
   op_t a0, b0;
   // some code
   a0 = ( (op_t *) srcp1 )[0];
   b0 = ( (op_t *) srcp2 )[0];
   // rest of code
}

which, from my maybe flawed understanding, is the same as:

inline int memcmp_common_alignment(
            long int srcp1, long int srcp2, std::size_t len )
{
   op_t a0, b0;
   // some code
   a0 = reinterpret_cast< op_t* >( srcp1 )[0];
   b0 = reinterpret_cast< op_t* >( srcp2 )[0];
   // rest of code
}

That appears to me to be undefined behavior in C++ because it reads the memory at the addresses srcp1 and srcp2 through a type (op_t) that differs from the type of the objects actually stored there. This is discussed, amongst other places, here and here.

I think the usual "solution" to this is to use memcpy in C++ to do the type punning. If I understand the principle properly, the idea is to copy the bits into a region of memory that is declared as the right type and access that. From the little research I've done, optimizing compilers are good at recognizing this idiom and optimize the copy away fairly well. Mind you, some details escape me because I was unable to get that to work.
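For reference, here is a minimal sketch of the memcpy idiom as I understand it. It assumes op_t is unsigned long int as in the excerpt above; load_word is a hypothetical helper name of my own, not something from glibc.

```cpp
#include <cstring>

using op_t = unsigned long int;

// Hypothetical helper (not part of glibc): read one op_t-sized word from
// arbitrary memory without accessing it through an op_t lvalue.
inline op_t load_word(const void* p)
{
    op_t value;
    // Copy the object representation into a real op_t object.
    // The source does not need to be aligned for op_t.
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

With that, the load in the function above would presumably become a0 = load_word(reinterpret_cast<const void*>(srcp1)); and likewise for b0. As far as I can tell, GCC and Clang usually turn a fixed-size memcpy like this into a plain load when optimizing, but I have not verified that for every target.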

That said, my real question is more whether this means that a large part of the C standard library can't actually legally be compiled using a C++ compiler (i.e. compiling glibc with g++ instead of gcc)? My understanding is that the compiler could, for instance, eliminate those expressions as they invoke undefined behavior?

Probably related to this question: What is the cost of compiling a C program with a C++ compiler? But I am not sure I agree with the answer. In the case I show above, wouldn't it really be undefined behavior under C++?

ghlecl
  • Note: C and C++ are *very* different languages and although they share a common (shrinking) subset and some common syntax, they are *not* the same. And, some expressions are both valid C and valid C++ but have *different* semantics, so compiling C with a C++ compiler is *not* necessarily the same as compiling it with a C compiler, even if it is syntactically valid and the compile succeeds, it may very well behave differently. – Jesper Juhl Jan 08 '19 at 18:48
  • Also note: The folk writing the implementation can take advantage of all sorts of evil because they know *exactly* how that undefined behaviour will manifest on the target platform. You will often see them performing tricks that common wisdom says to avoid. – user4581301 Jan 08 '19 at 19:18

0 Answers