I have a project in which I am getting a vector of 32-bit ARM instructions, and a part of the instructions (offset values) needs to be read as signed (two's complement) numbers instead of unsigned numbers.

I used a uint32_t vector because all the opcodes and registers are read as unsigned and the whole instruction is 32 bits wide.

For example:

I have this 32-bit ARM instruction encoding:

uint32_t addr = 0b00110001010111111111111111110110;

The last 19 bits are the branch offset, which I need to read as a signed integer displacement. This part: 1111111111111110110


I have this function, which takes the whole 32-bit instruction as its parameter. I shift left 13 places and then right 13 places again to keep only the offset value and remove the rest of the instruction.

I have tried casting to different signed types in this function, using different styles of cast and other C++ functions, but it still prints the number as if it were unsigned.

int getCat1BrOff(uint32_t inst)
{
    uint32_t temp = inst << 13;
    uint32_t brOff = temp >> 13;
    return (int)brOff;
}

I get decimal number 524278 instead of -10.

The last option, which I think is not the best one but may work, is to put all the binary digits in a string, invert the bits and add 1 to convert them, and then convert the new binary number back into decimal, as I would do it on paper. But that is not a good solution.

M.M
  • `temp` and `brOff` should be signed. Can't be bothered to check if that's standard compliant but `gcc -ansi -pedantic` accepts it :) – Jester Nov 17 '19 at 19:06
  • And to underscore the point: the `u` part of `uint32_t` means unsigned. That's what it means. Unsigned means: when you shift right, the high order bit gets filled with `0`. That's a fundamental property of unsigned integer types. That's obviously not what you want, so you obviously can't use an unsigned type for this. – Sam Varshavchik Nov 17 '19 at 19:09
  • I need to use an unsigned type because it is really an instruction with an opcode, registers and other important parts that must be read as unsigned. I shifted to get rid of (zero out) the bits that I am not converting to signed and keep only the offset value of the instruction. I could have ANDed it with 0b00000000000001111111111111111111 and that works as well. – Moises Rodan Nov 17 '19 at 19:14
  • You can use an unsigned type wherever you need it. But, here, you need to use a signed type, for the reasons stated. There is no prohibition about a temporary conversion to a signed type, for the purposes of a particular operation, and then converting it back. – Sam Varshavchik Nov 17 '19 at 19:15
  • I got it. If I shift the signed value, it will fill the bits with 1 instead of 0. Let me check – Moises Rodan Nov 17 '19 at 19:20
  • @SamVarshavchik Thank you!! I got it thanks to you. I needed someone to refresh properties. – Moises Rodan Nov 17 '19 at 19:23
  • C's binary and other operators are not typed. The operator itself makes no distinction for adding, multiplying, dividing, shifting, etc. The only way to tell C whether an operator should perform a signed, unsigned, float or double operation is to cast the operand(s) to the desired type before applying the operator. You can cast it one way, then cast it back as needed. – Erik Eidt Nov 17 '19 at 20:25
  • Please don't include the answer in your question. If you have a solution, you can post an answer. – Keith Thompson Nov 17 '19 at 20:35
  • `brOff << 13` is potentially UB in portable C++ because you could be overflowing a signed-integer left shift. Don't cast to `int32_t` until *after* left shifting. And don't post an answer in the question. – Peter Cordes Nov 17 '19 at 21:25

2 Answers

It boils down to doing a sign extension where the sign bit is the 19th one. There are two ways.

  1. Use arithmetic shifts.
  2. Detect the sign bit and OR in ones at the high bits.

Before C++20 there is no portable way to do 1. in C++, but whether the compiler shifts arithmetically can be checked at compile time. Please correct me if the code below is UB, but I believe it is only implementation-defined, which is what the compile-time check is for. The only questionable parts are the unsigned-to-signed conversion of a value that does not fit and the right shift of a negative value, and both should be implementation-defined.

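#include <cstdint>

int getCat1BrOff(uint32_t inst)
{
    // Compile-time check (C++17): does >> on a negative int32_t keep the sign?
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        // Move bit 18 up to bit 31, then shift it back down arithmetically.
        return int32_t(inst << uint32_t{13}) >> int32_t{13};
    }
    else
    {
        // Fallback: mask the 19-bit field and set the upper 13 bits by hand.
        int32_t offset = inst & 0x0007FFFF;
        if (offset & 0x00040000)
        {
            offset |= 0xFFF80000;
        }
        return offset;
    }
}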

or a more generic solution

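#include <cstdint>

// Sign-extends the low N bits of value to a full int32_t.
template <uint32_t N>
int32_t signExtend(uint32_t value)
{
    static_assert(N > 0 && N <= 32);
    constexpr uint32_t unusedBits = (uint32_t(32) - N);
    // Compile-time check (C++17): does >> on a negative int32_t keep the sign?
    if constexpr (int32_t(0xFFFFFFFFu) >> 1 == int32_t(0xFFFFFFFFu))
    {
        // Move the field's sign bit up to bit 31, then shift back arithmetically.
        return int32_t(value << unusedBits) >> int32_t(unusedBits);
    }
    else
    {
        // Fallback: keep the low N bits and fill the rest with ones if negative.
        constexpr uint32_t mask = uint32_t(0xFFFFFFFFu) >> unusedBits;
        value &= mask;
        if (value & (uint32_t(1) << (N-1)))
        {
            value |= ~mask;
        }
        return int32_t(value);
    }
}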

https://godbolt.org/z/rb-rRB

Sopel

In practice, you just need to declare temp as signed:

int getCat1BrOff(uint32_t inst)
{
    int32_t temp = inst << 13;
    return temp >> 13;
}

Unfortunately this is not portable:

For negative a, the value of a >> b is implementation-defined (in most implementations, this performs arithmetic right shift, so that the result remains negative).

But I have yet to meet a compiler that doesn't do the obvious thing here.

TonyK
  • C and, even more, C++ are my two home languages. But sometimes I would just like the standards to get out of the way and allow me to be explicit (e.g. specifying the type of shift in the generated asm), something akin to a portable assembler. If some weird, esoteric arch has weird behavior and does not support e.g. both logical and arithmetic shifts, the compiler could just give an error for that arch. For now, coding explicit assembly, one ARM snippet and one x86, may cover 97% of the market ;) – Erik Alapää Nov 23 '19 at 15:17