I would like to find a maximally efficient way to compute a char that contains the least significant bits of an int in C++11. The solution must work with any possible standards-compliant compiler. (I'm using the N3290 C++ draft spec, which is essentially C++11.)
The reason for this is that I'm writing something like a fuzz tester, and want to check libraries that require a std::string as input. So I need to generate random characters for the strings. The pseudo-random generator I'm using provides ints whose low bits are pretty uniformly random, but I'm not sure of the exact range. (Basically the exact range depends on a "size of test case" runtime parameter.)
If I didn't care about the code working on every compiler, this would be as simple as:
inline char int2char(int i) { return i; }
Before you dismiss this as a trivial question, consider that:

- You don't know whether char is a signed or unsigned type.
- If char is signed, then a conversion from an unrepresentable int to a char is "implementation-defined" (§4.7/3). This is far better than undefined, but for this solution I'd need to see some evidence that the standard prohibits things like converting all ints not between CHAR_MIN and CHAR_MAX to '\0'.
- reinterpret_cast is not permitted between a signed and unsigned char (§5.2.10).
- static_cast performs the same conversion as in the previous point.
- char c = i & 0xff; --though it silences some compiler warnings--is almost certainly not correct for all implementation-defined conversions. In particular, i & 0xff is always a positive number, so if c is signed, it could quite plausibly fail to convert negative values of i to negative values of c. (A small demonstration of this concern follows the list.)
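To make that last point concrete, here is a minimal sketch of why masking alone isn't obviously enough. It assumes CHAR_BIT == 8 and a signed char; the whole point is that the output is implementation-defined rather than guaranteed:

#include <cstdio>

int main()
{
    int i = -1;                  // low byte is 0xff
    int masked = i & 0xff;       // always non-negative: here 255
    char c = masked;             // if char is a signed 8-bit type, 255 is unrepresentable,
                                 // so this conversion is implementation-defined (§4.7/3)
    std::printf("%d\n", static_cast<int>(c));  // commonly prints -1, but the standard doesn't force it
    return 0;
}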
Here are some solutions that do work, but in most of these cases I'm worried they won't be as efficient as a simple conversion. These also seem too complicated for something so simple (rough sketches of the first two follow the list):

- Using reinterpret_cast on a pointer or reference, since you can convert from unsigned char * or unsigned char & to char * or char & (but at the possible cost of runtime overhead).
- Using a union of char and unsigned char, where you first assign the int to the unsigned char, then extract the char (which again could be slower).
- Shifting left and right to sign-extend the int. E.g., if i is the int, running c = (i << 8 * (sizeof(i) - sizeof(c))) >> (8 * (sizeof(i) - sizeof(c))) (but that's inelegant, and if the compiler doesn't optimize away the shifts, quite slow).
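For reference, here are minimal sketches of the first two approaches. The names int2char_ref and int2char_union are just placeholders; both rely only on the conversion from int to unsigned char, which is fully specified (modulo 2^CHAR_BIT):

// 1. Convert to unsigned char, then view the same byte as a char
//    through a reference cast (allowed, unlike a value reinterpret_cast).
inline char int2char_ref(int i)
{
    unsigned char u = (unsigned char)i;   // keeps the low CHAR_BIT bits
    return reinterpret_cast<char &>(u);   // same storage, read as char
}

// 2. Same idea, going through a union of char and unsigned char.
inline char int2char_union(int i)
{
    union { unsigned char u; char c; } x;
    x.u = (unsigned char)i;               // well-defined conversion
    return x.c;                           // read the byte back as a char
}

The open question for both is whether the extra step costs anything once the optimizer is done with it.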
Here's a minimal working example. The goal is to argue that the assertions can never fail on any compiler, or to define an alternate int2char in which the assertions can never fail.
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
using namespace std;
constexpr char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
    for (int n = 1; n < min(argc, 127); n++) {
        char c = -n;
        // Construct an int whose low eight bits match those of c == -n
        // but whose upper bits depend on the program's arguments.
        int i = (atoi(argv[n]) << 8) ^ -n;
        assert(c == int2char(i));
    }
    return 0;
}
I've phrased this question in terms of C++ because the standards are easier to find on the web, but I am equally interested in a solution in C. Here's the MWE in C:
#include <assert.h>
#include <stdlib.h>
static char int2char(int i) { return i; }
int
main(int argc, char **argv)
{
    for (int n = 1; n < argc && n < 127; n++) {
        char c = -n;
        // Same construction as above: low byte matches -n, upper bits vary.
        int i = (atoi(argv[n]) << 8) ^ -n;
        assert(c == int2char(i));
    }
    return 0;
}