
I am trying to convert a signed integer to a byte array.

I tried the code below. In this example, the expected result is `FD 53` for the input -685 (in two's-complement form).

#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

int main() {
    int exponent = -685;

    std::int64_t const exponent_range{std::labs(exponent * std::int64_t{2})};
    std::uint8_t const exponent_bytes{static_cast<std::uint8_t>((std::log2(static_cast<double>(exponent_range)) / 8.0) + 1.0)};

    std::vector<std::uint8_t> bytes;
    for (std::uint8_t i = exponent_bytes; i > 0; --i) {
        // Right-shifts the (negative) signed value: the operation in question.
        bytes.push_back(static_cast<std::uint8_t>((exponent >> ((i - 1U) * 8)) & 0xFFU));
    }
}
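The snippet sizes the output with a floating-point `log2`, which the comments below criticize as convoluted. As a hedged sketch, assuming the intent is the smallest number of 8-bit bytes that holds the value in two's complement, the count can be computed with integer comparisons only. `bytes_needed` is a hypothetical name, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: smallest n (1..8) such that `value` fits in an
// n-byte two's-complement field, i.e. in [-2^(8n-1), 2^(8n-1) - 1].
std::size_t bytes_needed(std::int64_t value) {
    std::size_t n = 1;
    while (n < 8) {
        std::int64_t const max = (std::int64_t{1} << (8 * n - 1)) - 1;
        std::int64_t const min = -max - 1;
        if (value >= min && value <= max) {
            break;
        }
        ++n;
    }
    // Every std::int64_t fits in 8 bytes, so the loop always ends with n in [1, 8].
    assert(n >= 1 && n <= 8);
    return n;
}
```

With this definition, `bytes_needed(-685)` is 2 and `bytes_needed(0)` is 1. It can differ from the `log2` formula at boundaries: -128 fits in one byte, while the formula yields two.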

In the loop body I am right-shifting a signed (and here negative) integer. Is this a standard-compliant operation, or is it undefined behavior?

Please suggest an alternative, standard-compliant (C99 or C++11) implementation. I need a solution that works with both byte orderings.
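For the C++11 side of the request, one common approach (a sketch, not the only option) is to convert the value to an unsigned type before shifting. Signed-to-unsigned conversion is defined modulo 2^N in both C99 and C++11, and right-shifting an unsigned value is fully defined, so nothing implementation-defined remains. The helper name `to_big_endian_bytes` and the most-significant-byte-first output order are assumptions based on the `FD 53` example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the low `count` bytes of `value`'s
// two's-complement encoding, most significant byte first.
std::vector<std::uint8_t> to_big_endian_bytes(std::int32_t value, std::size_t count) {
    assert(count >= 1 && count <= 4);
    // Conversion to an unsigned type is defined modulo 2^32, so `raw`
    // holds the two's-complement bit pattern of `value`.
    std::uint32_t const raw = static_cast<std::uint32_t>(value);
    std::vector<std::uint8_t> bytes;
    bytes.reserve(count);
    for (std::size_t i = count; i > 0; --i) {
        // Shifting an unsigned value is fully defined and depends only on
        // the value, not on how the host lays it out in memory.
        bytes.push_back(static_cast<std::uint8_t>((raw >> ((i - 1) * 8)) & 0xFFu));
    }
    return bytes;
}
```

For example, `to_big_endian_bytes(-685, 2)` yields `{0xFD, 0x53}`. Because the bytes are extracted from the value rather than from memory, the result does not depend on the host's byte ordering.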

FaisalM
  • Since C++20, signed integers are two's-complement, so the ambiguity of signed shifts is gone. All CPUs that have a modern C++ compiler use the same behavior for signed integer shifts, one that preserves the sign bit. I haven't dug deep into it, but if the standard doesn't mandate this yet, it's probably just lagging behind the change to mandate two's-complement. – Goswin von Brederlow Jul 07 '22 at 16:27
  • I second that. Some rather verbose (and perhaps outdated) answers [here](https://stackoverflow.com/questions/4009885/arithmetic-bit-shift-on-a-signed-integer). – Paul Sanders Jul 07 '22 at 16:28
  • With C++20, the section on bitwise shifts got considerably shorter: https://en.cppreference.com/w/cpp/language/operator_arithmetic – Sebastian Jul 07 '22 at 16:36
  • As for the soundness of the algorithm overall: That code seems horribly convoluted and broken. An exponent of 0 will use 0 bytes, an exponent < 128 will use 1 byte, an exponent < 32768 will use 2 bytes, an exponent < 16mil will use 3 bytes, an exponent up to 1bil will use 4 bytes and then it will cause UB, – Goswin von Brederlow Jul 07 '22 at 16:36
  • @GoswinvonBrederlow: I assume you meant for your comment to apply to signed right-shift only? Signed left-shift doesn't typically make a sign-preservation guarantee. – Ben Voigt Jul 07 '22 at 16:40
  • @BenVoigt Yes, only right shift has a logical and arithmetic variant. Left shift works the same for both signed and unsigned and will overwrite the sign bit with the next lower bit every time. But again only since C++20 because before two's-complement was mandated the bit pattern for signed integers wasn't defined. – Goswin von Brederlow Jul 07 '22 at 16:43
  • Right shift of a signed value is implementation-defined in the C standard. I doubt that will change in C2x even though signed integer types are 2's complement in C2x. Not all machines have an ASR instruction. – Ian Abbott Jul 07 '22 at 16:46
  • ... Or rather, right shift of a negative value is implementation-defined in the C standard. Right shift of non-negative signed values is well defined. – Ian Abbott Jul 07 '22 at 16:52
  • FYI, your code is C++ and not C. The C language does not have namespaces, such as `std::`. I recommend removing the C language tag, unless you insist on mixing the languages. – Thomas Matthews Jul 07 '22 at 18:19
  • @Goswin von Brederlow, can you suggest a standard-compliant algorithm to extract bytes? – FaisalM Jul 08 '22 at 02:27
  • Does this answer your question? [C++ int to byte array](https://stackoverflow.com/questions/5585532/c-int-to-byte-array) – Alexander Guyer Jul 08 '22 at 02:46
  • @Alexander The most-voted solution doesn't consider the byte ordering, and the accepted answer uses the `>>` operator, so my question is still open. – FaisalM Jul 08 '22 at 03:08
  • You just said it must work in both byte orderings, which it will. Do you mean it must always be big-endian, as suggested by your example? – Useless Jul 08 '22 at 10:14
  • @Useless I expect `FD 53` for `-685` irrespective of the host system byte ordering. – FaisalM Jul 08 '22 at 10:21
  • So you *are* asking for a specific (big-endian) output byte ordering, but are refusing for some reason to specify that explicitly. Is the output format also required to be two's complement if the host machine is not? – Useless Jul 08 '22 at 10:30
  • @FaisalM Nothing wrong with the current code from a standards viewpoint. The problem I see is your chosen output format. – Goswin von Brederlow Jul 08 '22 at 10:38
  • @Useless There are no modern C++ compilers for non two's-complement system, that's why the standard adopted two's-complement now. – Goswin von Brederlow Jul 08 '22 at 10:39
  • Sure, but OP asked specifically for C99 and/or C++11 compliance. The question should at least specify the required byte ordering and sign representation explicitly. – Useless Jul 08 '22 at 10:41
  • You want byte-level addressing. Why perform low-level bit arithmetic, introducing all the issues with bit-shifting signed integrals? **Byte-level addressing is easy and doesn't require bit arithmetic**, as seen in the most-voted answer in the question I linked. So just use that solution, which preserves endianness. Then, if you want to force a specific endianness that doesn't match the system's native endianness, just reverse the byte ordering. There are robust runtime tests for detecting system endianness if you can't trust the macros and can't use `std::endian` (C++20). – Alexander Guyer Jul 08 '22 at 20:37
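The byte-level-addressing approach described in the last comment could look roughly like the following sketch. It copies the object representation of an `int` without any shifts, detects the host byte order at run time, and reverses the bytes on little-endian hosts to get most-significant-byte-first output. The function names are hypothetical. The sketch assumes a host that is either big- or little-endian (no mixed orderings), and the bytes are the host's own representation of `int`, which is two's complement on mainstream platforms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical runtime check: on a little-endian host the lowest-addressed
// byte of an unsigned int holding 1 is the one that contains the 1.
bool host_is_little_endian() {
    unsigned int const one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Hypothetical helper: copies `value` byte by byte, puts the bytes in
// most-significant-first order, and keeps the last `count` of them.
std::vector<unsigned char> to_big_endian_representation(int value, std::size_t count) {
    assert(count >= 1 && count <= sizeof value);
    std::vector<unsigned char> bytes(sizeof value);
    std::memcpy(bytes.data(), &value, sizeof value);
    if (host_is_little_endian()) {
        std::reverse(bytes.begin(), bytes.end());
    }
    // Drop the leading (most significant) bytes that were not requested.
    bytes.erase(bytes.begin(), bytes.end() - static_cast<std::ptrdiff_t>(count));
    return bytes;
}
```

On a typical host with a 32-bit two's-complement `int`, `to_big_endian_representation(-685, 2)` gives `FD 53`. Note that trimming to `count` bytes only preserves the value if it fits in that many bytes.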

1 Answer


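I used `memcpy` to copy the bit pattern of the signed integer into an unsigned integer, so the right shift is applied to an unsigned value. Because the shift works on the value rather than on the in-memory layout, the output is the same (most significant byte first) regardless of the host's byte ordering.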

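  #include <cmath>
  #include <cstdint>
  #include <cstdlib>
  #include <cstring>
  #include <vector>

  int main() {
    int exponent = -685;

    std::int64_t const exponent_range{std::llabs(exponent * 2ll)};
    std::uint8_t const exponent_bytes{static_cast<std::uint8_t>((std::log2(static_cast<double>(exponent_range)) / 8.0) + 1.0)};

    // memcpy needs int and std::uint32_t to have the same size.
    static_assert(sizeof(exponent) == sizeof(std::uint32_t), "int must be 32 bits");
    std::uint32_t exponent_raw;
    std::memcpy(&exponent_raw, &exponent, sizeof(exponent));
    std::vector<std::uint8_t> bytes;
    for (std::uint8_t i = exponent_bytes; i > 0; --i) {
      bytes.push_back(static_cast<std::uint8_t>((exponent_raw >> ((i - 1U) * 8)) & 0xFFU));
    }
  }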
FaisalM
  • If you're going for portability, I figure I should mention that fixed-width integer types are not always defined. If a system has more than 8 bits in a "byte" (addressable unit), like some modern DSPs, then it's likely that these fixed-width integer types will not be available. And since your question is about converting an `int` to an array of "bytes" (addressable units, like a `char`), assumptions about the size of a byte are unnecessary. I think treating this as a byte-level addressing problem, coupled with an endianness-checking problem, will give a cleaner solution; see my other comments. – Alexander Guyer Jul 08 '22 at 21:07
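Following this comment's point about addressable units wider than 8 bits, here is a hedged sketch that avoids the optional fixed-width types altogether. It converts to `unsigned long` (well defined modulo 2^N) and peels off `CHAR_BIT` bits per unit, most significant first. The function name is hypothetical, and it assumes `unsigned long` has no padding bits, so that `sizeof(unsigned long)` units cover its full width.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical helper: splits `value` into `count` units of CHAR_BIT bits,
// most significant unit first, using only non-optional standard types.
std::vector<unsigned char> to_big_endian_units(long value, std::size_t count) {
    // Assumes unsigned long has no padding bits, so this bound keeps every
    // shift below the type's width.
    assert(count >= 1 && count <= sizeof(unsigned long));
    // Signed-to-unsigned conversion is defined modulo ULONG_MAX + 1.
    unsigned long const raw = static_cast<unsigned long>(value);
    std::vector<unsigned char> units;
    units.reserve(count);
    for (std::size_t i = count; i > 0; --i) {
        units.push_back(static_cast<unsigned char>((raw >> ((i - 1) * CHAR_BIT)) & UCHAR_MAX));
    }
    return units;
}
```

With `CHAR_BIT == 8`, `to_big_endian_units(-685, 2)` yields `{0xFD, 0x53}`; on a platform with wider bytes, each unit simply carries `CHAR_BIT` bits of the value.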