2

I have a number stored as a ulong. I want the bits stored in memory to be interpreted in a 2's complement fashion, so the first bit would be the sign bit and so on. If I want to convert to a long so that the number is interpreted correctly as 2's complement, how do I do this?

I tried creating pointers of different data types that all pointed to the same buffer. I then stored the ulong into the buffer and dereferenced a long pointer. This, however, is giving me a bad result.

I did :

#include <iostream>
using namespace std;

int main() {
    unsigned char converter_buffer[4];

    unsigned long       *pulong;
    long                *plong;


    pulong = (unsigned long*)&converter_buffer;
    plong  =  (long*)&converter_buffer;

    unsigned long ulong_num = 65535; // this has a 1 as the first bit

    *pulong = ulong_num;

    std:: cout << "the number as a long is" << *plong << std::endl;
    return 0;
}

For some reason this is giving me the same positive number. Would casting help?

Ariel Baron
  • The ulong is either 32 or 64 bits, so the 1st bit is 0. – stark Apr 20 '17 at 18:19
  • So if I shifted the first bit to the 31st spot, then it might work? – Ariel Baron Apr 20 '17 at 18:21
  • @stark: Can you provide a reference to the standard supporting this? From the minimum range, `(unsigned) long` must be **at least** 32 bits. But there is no other restriction about its width. – too honest for this site Apr 20 '17 at 18:27
  • To find out the size, you can do `sizeof(unsigned long)` and shift that * 8. – stark Apr 20 '17 at 18:29
  • Why the weird indentation? – Lightness Races in Orbit Apr 20 '17 at 18:40
  • How about the obvious: instead of reinterpret_cast and unions, just check if the first bit is one (std::numeric_limits<unsigned long>::max() / 2 is what you compare against); then either trivially cast, or take the 2's complement and reconstruct the sign? The optimizer will take care of the rest, and it is UB-free, even IB-free. – lorro Apr 20 '17 at 20:44
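A minimal, UB-free sketch of the approach lorro outlines above (the helper name to_signed is illustrative, not from the thread, and the reconstruction assumes long uses 2's complement):

#include <iostream>
#include <limits>

// Illustrative helper: interpret the bits of an unsigned long as a
// 2's complement signed value without any pointer tricks.
long to_signed(unsigned long u) {
    // If the "sign bit" is 0, the value fits in a long and converts directly.
    if (u <= static_cast<unsigned long>(std::numeric_limits<long>::max()))
        return static_cast<long>(u);
    // Otherwise undo the 2's complement by hand: the intended value is
    // u - 2^N, where N is the width of unsigned long.
    return -static_cast<long>(std::numeric_limits<unsigned long>::max() - u) - 1;
}

int main() {
    std::cout << to_signed(std::numeric_limits<unsigned long>::max()) << std::endl; // -1
    std::cout << to_signed(65535UL) << std::endl;                                   // 65535
    return 0;
}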

3 Answers

3

Actually, using pointers was a good start, but you have to cast your unsigned long* to void* first; then you can cast the result to long* and dereference it:

#include <iostream>
#include <climits>

int main() {
    unsigned long ulongValue = ULONG_MAX;
    long longValue = *((long*)((void*)&ulongValue));

    std::cout << "ulongValue: " << ulongValue << std::endl;
    std::cout << "longValue:  " << longValue << std::endl;

    return 0;
}

The code above produces the following output:

ulongValue: 18446744073709551615
longValue:  -1

With templates you can make it more readable in your code:

#include <iostream>
#include <climits>

template<typename T, typename U>
T unsafe_cast(const U& from) {
    return *((T*)((void*)&from));
}

int main() {
    unsigned long ulongValue = ULONG_MAX;
    long longValue = unsafe_cast<long>(ulongValue);

    std::cout << "ulongValue: " << ulongValue << std::endl;
    std::cout << "longValue:  " << longValue << std::endl;

    return 0;
}

Keep in mind that this solution is absolutely unsafe, because you can cast anything to void*. This practice was common in C, but I do not recommend using it in C++. Consider the following cases:

#include <iostream>

template<typename T, typename U>
T unsafe_cast(const U& from) {
    return *((T*)((void*)&from));
}

int main() {
    std::cout << std::hex << std::showbase;

    float fValue = 3.14;
    int iValue = unsafe_cast<int>(fValue); // OK, they have same size.

    std::cout << "Hexadecimal representation of " << fValue
              << " is: " << iValue << std::endl;
    std::cout << "Converting back to float results: "
              << unsafe_cast<float>(iValue) << std::endl;

    double dValue = 3.1415926535;
    int lossyValue = unsafe_cast<int>(dValue); // Bad, they have different size.

    std::cout << "Lossy hexadecimal representation of " << dValue
              << " is: " << lossyValue << std::endl;
    std::cout << "Converting back to double results: "
              << unsafe_cast<double>(lossyValue) << std::endl;

    return 0;
}

For me, the code above prints the following:

Hexadecimal representation of 3.14 is: 0x4048f5c3
Converting back to float results: 3.14
Lossy hexadecimal representation of 3.14159 is: 0x54411744
Converting back to double results: 6.98387e-315

And for the last line you can get anything, because the conversion reads garbage from memory.

Edit

As lorro commented below, using memcpy() is safer and can prevent overflow. So here is another version of the type cast which is safer:

#include <cstring>

template<typename T, typename U>
T safer_cast(const U& from) {
    T to;
    std::memcpy(&to, &from, (sizeof(T) > sizeof(U) ? sizeof(U) : sizeof(T)));
    return to;
}
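For the question's unsigned long → long case, this helper could be used like so (a sketch; the -1 output assumes a platform where long and unsigned long have the same size and use 2's complement):

#include <climits>
#include <cstring>
#include <iostream>

template<typename T, typename U>
T safer_cast(const U& from) {
    T to;
    std::memcpy(&to, &from, (sizeof(T) > sizeof(U) ? sizeof(U) : sizeof(T)));
    return to;
}

int main() {
    unsigned long ulongValue = ULONG_MAX;
    long longValue = safer_cast<long>(ulongValue);

    std::cout << "ulongValue: " << ulongValue << std::endl;
    std::cout << "longValue:  " << longValue << std::endl; // -1
    return 0;
}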
Akira
  • This works on x86 / x64 (Itanium ABI), but uses the implementation-specific fact that you're allowed to basically reinterpret_cast<> between these types. It's less likely to break than a union. If you want additional safety, you might do a memcpy() between the two. Also note that signed int overflow is UB (and it happens), so it's best to check the range. – lorro Apr 20 '17 at 20:30
  • So the verdict is that as long as the data types that I am trying to convert between are of the same size, then a traditional cast or the method with pointers you just showed, should work? – Ariel Baron Apr 20 '17 at 20:56
  • Please explain why you think `(long*)((void*)&ulongValue)` is an improvement over `(long*)&ulongValue` ? – M.M Apr 20 '17 at 22:33
0

You can do this:

uint32_t u;
int32_t& s = (int32_t&) u;

Then you can use s and u interchangeably with 2's complement, e.g.:

s = -1;
std::cout << u << '\n';     // 4294967295

In your question you ask about 65535 but that is a positive number. You could do:

uint16_t u;
int16_t& s = (int16_t&) u;

u = 65535;
std::cout << s << '\n';    // -1

Note that assigning 65535 (a positive number) to int16_t has implementation-defined behaviour; it does not necessarily give -1.

The problem with your original code is that it is not permitted to alias a char buffer as long (and that you might overflow your buffer). However, it is OK to alias an integer type as its corresponding signed/unsigned type.
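Applied to the question's unsigned long, that aliasing could look like this (a minimal sketch; the -1 assumes a 2's complement representation):

#include <iostream>

int main() {
    unsigned long u = -1UL;  // all bits set, i.e. the "sign bit" is 1
    long& s = (long&) u;     // aliasing unsigned long as its corresponding signed type is allowed

    std::cout << s << '\n';  // -1
    std::cout << u << '\n';  // largest unsigned long value
    return 0;
}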

M.M
-1

In general, when you have two arithmetic types that are the same size and you want to reinterpret the bit representation of one using the type of the other, you do it with a union:

#include <stdint.h>

union reinterpret_u64_d_union {
    uint64_t u64;
    double   d;
};

double
reinterpret_u64_as_double(uint64_t v)
{
    union reinterpret_u64_d_union u;
    u.u64 = v;
    return u.d;
}

For the special case of turning an unsigned number into a signed type with the same size (or vice versa), however, you can just use a traditional cast:

int64_t
reinterpret_u64_as_i64(uint64_t v)
{
    return (int64_t)v;
}

(The cast is not strictly required for [u]int64_t, but if you don't explicitly write a cast, and the types you're converting between are small, the "integer promotions" may get involved, which is usually undesirable.)
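As an aside (not part of the answer), here is a small illustration of the integer promotions getting involved with a narrow unsigned type:

#include <stdint.h>
#include <iostream>

int main() {
    uint16_t u = 0xFFFF;
    // u is promoted to int before ~ is applied, so on a typical 32-bit int
    // the result is 0xFFFF0000 (-65536), not the 16-bit value 0.
    std::cout << ~u << '\n';                          // -65536
    std::cout << static_cast<uint16_t>(~u) << '\n';   // 0
    return 0;
}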

The way you were trying to do it violates the pointer-aliasing rules and provokes undefined behavior.

In C++, note that reinterpret_cast<> does not do what the union does; it is the same as static_cast<> when applied to arithmetic types.

In C++, also note that the use of a union above relies on a rule in the 1999 C standard (with corrigenda) that has not been officially incorporated into the C++ standard, last I checked; however, all compilers I am familiar with will do what you expect.

And finally, in both C and C++, long and unsigned long are guaranteed to be able to represent at least −2,147,483,647 ... 2,147,483,647 and 0 ... 4,294,967,295, respectively. Your test program used 65535, which is guaranteed to be representable by both long and unsigned long, so the value would have been unchanged however you did it. Well, unless you used invalid pointer aliasing and the compiler decided to make demons fly out of your nose instead.
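To actually exercise the sign bit (which 65535 does not), the cast from above can be fed a value whose top bit is set; a sketch, with the -1 assuming a 2's complement platform:

#include <stdint.h>
#include <iostream>

int64_t reinterpret_u64_as_i64(uint64_t v) {
    return (int64_t)v;
}

int main() {
    // UINT64_MAX has every bit set, so the top ("sign") bit is 1.
    std::cout << reinterpret_u64_as_i64(UINT64_MAX) << std::endl; // -1
    return 0;
}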

zwol
  • This is C++, not C. AFAIK typecasting via `union` is not allowed in C++. – too honest for this site Apr 20 '17 at 18:29
  • @Olaf Note the second-to-last paragraph. The question was tagged both C and C++ when I answered it, which I understood to mean that the questioner wanted to know about both languages and the differences, if any. I wish the hivemind would stop insisting that no question can be tagged that way. – zwol Apr 20 '17 at 18:31
  • The question was tagged both in the apparent misconception that they are the same language or that something like C/C++ exists. This is a common problem with newbies and should be clarified by a comment or by reviewing the text. The code is C++, so we can assume C++. Anyway, "the compilers behave like that" is a bad basis to rely on. This can change with the next release and already requires you to use the same version/compiler as the OP. Please keep in mind that questions & answers are read years later; even if the same standard is applicable, it is a bad idea to rely on this. That's all I wanted to have emphasised more. – too honest for this site Apr 20 '17 at 18:36
  • @Olaf I prefer to assume that questioners know what they are doing with the tags, when there is a sensible interpretation (such as "tell me about the differences between the languages wrt this question"). – zwol Apr 20 '17 at 18:38
  • ok so let's say I want to do this in c++, the union method, will not work is my understanding. What will work then? – Ariel Baron Apr 20 '17 at 18:41
  • @user7511696 Use a traditional cast. But _in practice_ the union method will also work in C++. Olaf is being excessively pedantic in my opinion. – zwol Apr 20 '17 at 18:53
  • @zwol - It really depends on the compiler, see http://stackoverflow.com/questions/16637074/c-unions-vs-reinterpret-cast – Donnie Apr 20 '17 at 19:02
  • This is UB in C++. With a (very) few exceptions, the rule is that union is not for casting. Even casting is UB; pedantic would be to require memcpy(). Please do not promote UB. Note that it's not implementation-defined, which is sometimes acceptable. It's undefined. It might break at any time, and the optimizer can _legally_ assume that you never run into a code path where one member is written and the other is read (without writing to that one first). Thus, even if the ABI representation is the same, this might break on any given run. – lorro Apr 20 '17 at 19:19
  • @zwol maybe start your answer with "In C" instead of "In general", since your union suggestion only applies to C – M.M Apr 20 '17 at 22:35