
I am writing a virtual machine that has 8-bit opcodes, and one of the common instruction formats consists of the 8-bit opcode followed in memory by a 56-bit signed integer operand.
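For reference, a two's-complement 56-bit operand covers the range [-2^55, 2^55 - 1]. Below is a minimal sketch of the range check an assembler or loader might apply before encoding an operand. The names `kInt56Min`, `kInt56Max` and `fits_in_int56` are hypothetical and not part of the original code:

```cpp
#include <cstdint>

// Range of a two's-complement 56-bit signed operand: [-2^55, 2^55 - 1].
constexpr std::int64_t kInt56Min = -(std::int64_t{1} << 55);
constexpr std::int64_t kInt56Max = (std::int64_t{1} << 55) - 1;

// Returns true if x can be stored in the 56-bit operand field without loss.
constexpr bool fits_in_int56(std::int64_t x) {
    return x >= kInt56Min && x <= kInt56Max;
}

static_assert(fits_in_int56(kInt56Max), "largest operand fits");
static_assert(!fits_in_int56(kInt56Max + 1), "2^55 does not fit");
static_assert(fits_in_int56(-1), "negative operands fit");
```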

Originally, I was going to implement this instruction as follows:

```cpp
struct machine_op {
   std::uint64_t opcode:8;
   std::int64_t operand:56;
};
```

However, I've heard no end of comments that using bit-fields like this is not portable, and in particular that nothing guarantees the fields would actually be in the right place in memory (i.e., the opcode in the first byte of the struct, followed by the operand).
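That concern is grounded in the standard: allocation and alignment of bit-fields within a class object are implementation-defined ([class.bit]). Rather than trusting a particular compiler, one can probe what it actually does. A minimal sketch, where `bitfield_op` and `opcode_byte_index` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same layout as the bit-field struct in the question.
struct bitfield_op {
    std::uint64_t opcode:8;
    std::int64_t operand:56;
};

// Returns the index of the byte holding the opcode on this compiler/target,
// or -1 if the opcode is not stored as one whole byte. The answer is
// implementation-defined; this is a probe, not a guarantee.
inline int opcode_byte_index() {
    bitfield_op op;
    std::memset(&op, 0, sizeof op);   // zero every byte, including any padding
    op.opcode = 0xA5;
    op.operand = 0;
    unsigned char bytes[sizeof op];
    std::memcpy(bytes, &op, sizeof op);
    for (std::size_t i = 0; i < sizeof op; ++i) {
        if (bytes[i] == 0xA5) return static_cast<int>(i);
    }
    return -1;
}
```

On GCC or Clang targeting x86-64 this typically reports byte 0, but another ABI or a big-endian target may give a different answer, which is exactly the portability problem.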

What I then thought of doing was to create a 56-bit integer class, and modify the above structure to the following:

```cpp
struct machine_op {
   char opcode;
   int56 operand;
};
```

Because this is a standard-layout struct, the operand would then be placed immediately after the opcode, as long as int56 is written so that it has no alignment requirement (i.e. alignof(int56) == 1).
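Standard layout guarantees declaration order and that the first member sits at offset 0. The absence of padding between `opcode` and `operand` in practice rests on int56 having an alignment requirement of 1. These assumptions can be pinned down at compile time with a few `static_assert`s. The sketch below uses a hypothetical stand-in, `int56_bytes` (seven raw bytes), in place of the real class:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical minimal stand-in for int56: seven raw bytes, alignment 1.
struct int56_bytes {
    unsigned char b[7];
};

struct machine_op {
    char opcode;
    int56_bytes operand;
};

// Compile-time checks of the layout assumptions the design depends on.
static_assert(std::is_standard_layout<machine_op>::value,
              "offsetof requires a standard-layout type");
static_assert(alignof(int56_bytes) == 1, "int56 must have no alignment requirement");
static_assert(offsetof(machine_op, operand) == 1, "operand must follow the opcode directly");
static_assert(sizeof(machine_op) == 8, "an instruction must occupy exactly 8 bytes");
```

If any of these ever fails on a new compiler, the build stops instead of the VM silently misreading byte code.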

To that end, I have defined the following class, whose purpose is to encapsulate a signed 56-bit integer with no alignment requirement:

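```cpp
#include <cstdint>
// BYTE_ORDER and LITTLE_ENDIAN are not standard C++; they usually come from
// <endian.h> (glibc) or <sys/endian.h> (BSD). If neither macro is defined,
// both evaluate to 0 in #if and the little-endian branch is silently chosen.

class int56 {
public:
    // Seven raw bytes: alignment requirement of 1, no padding.
    struct data {
        unsigned char byte1;
        unsigned char byte2;
        unsigned char byte3;
        unsigned char byte4;
        unsigned char byte5;
        unsigned char byte6;
        unsigned char byte7;
    };

    // How an int64_t is laid out on a big-endian host: sign byte first.
    struct big_endian {
        char sign;
        data value;
        inline big_endian() = default;

        inline constexpr big_endian(const data &v) : sign(v.byte1 & 0x80 ? 0xff : 0), value(v)
        {
        }
    };

    // How an int64_t is laid out on a little-endian host: sign byte last.
    struct little_endian {
        data value;
        char sign;
        inline little_endian() = default;

        inline constexpr little_endian(const data &v) : value(v), sign(v.byte7 & 0x80 ? 0xff : 0)
        {
        }
    };

    // Views the same 8 bytes either as an int64_t or as sign + data.
    // Reading a member other than the one last written is undefined
    // behaviour in C++ and is not allowed in constant expressions.
    union aligner {
        std::int64_t value;
        big_endian big;
        little_endian little;
    };
    inline int56() = default;

    inline constexpr int56(std::int64_t x) :
#if BYTE_ORDER == LITTLE_ENDIAN
    value((aligner{x}).little.value)
#else
    value((aligner{x}).big.value)
#endif
    {
    }

    // Designated initializers (.little = ...) require C++20 or a compiler extension.
    inline constexpr operator std::int64_t() const
    {
#if BYTE_ORDER == LITTLE_ENDIAN
        return (aligner{.little = little_endian(value)}).value;
#else
        return (aligner{.big = big_endian(value)}).value;
#endif
    }

private:
    data value;
};
```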

This implementation only works on big-endian or little-endian machines; other byte orders (middle-endian, for example) would fail. I can live with that limitation, but even with high optimization flags, accessing this class carries a significant performance penalty compared with using bit-fields.

I am, by the way, always going to be targeting machines with at least a 64-bit architecture, so there is no problem accessing data that is 64 bits wide. I can't resort to hand-written assembly, though: besides making those functions no longer constexpr, I would have to write it for each supported CPU architecture, and I am only really fluent in one of them.
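One direction that stays constexpr, needs no assembly, avoids union type punning, and does not depend on host byte order is to fix the operand's storage order and assemble the value with shifts. The sketch below assumes the byte code format stores the seven operand bytes little-endian; that choice, and the helpers `byte_at` and `sign_extend`, are hypothetical and not from the original code:

```cpp
#include <cstdint>

// Sketch: 56-bit signed integer with alignment 1 and no union punning.
// Storage order is fixed to little-endian regardless of the host (assumption).
class int56 {
public:
    int56() = default;

    // Stores the low 7 bytes of x; values outside [-2^55, 2^55 - 1] are truncated.
    constexpr int56(std::int64_t x)
        : b{byte_at(x, 0), byte_at(x, 1), byte_at(x, 2), byte_at(x, 3),
            byte_at(x, 4), byte_at(x, 5), byte_at(x, 6)} {}

    // Reassembles the 56-bit pattern and sign-extends it to 64 bits.
    constexpr operator std::int64_t() const {
        return sign_extend(std::uint64_t{b[0]}       | std::uint64_t{b[1]} << 8  |
                           std::uint64_t{b[2]} << 16 | std::uint64_t{b[3]} << 24 |
                           std::uint64_t{b[4]} << 32 | std::uint64_t{b[5]} << 40 |
                           std::uint64_t{b[6]} << 48);
    }

private:
    // Byte i of x, taken via unsigned arithmetic so no signed shift is involved.
    static constexpr unsigned char byte_at(std::int64_t x, int i) {
        return static_cast<unsigned char>(static_cast<std::uint64_t>(x) >> (8 * i));
    }

    // Interprets the low 56 bits of u as two's complement (u < 2^56).
    static constexpr std::int64_t sign_extend(std::uint64_t u) {
        return (u & (std::uint64_t{1} << 55))
            ? static_cast<std::int64_t>(u) - (std::int64_t{1} << 56)
            : static_cast<std::int64_t>(u);
    }

    unsigned char b[7];
};

static_assert(sizeof(int56) == 7 && alignof(int56) == 1, "no padding, no alignment");
```

GCC and Clang can often combine such a shift-and-or chain into wider loads on little-endian targets at -O2, but that is an optimization rather than a guarantee, so it is worth inspecting the generated code and measuring it against the bit-field version.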

Is this therefore reasonably the best I can do, or is there a better way to set up this int56 class?

Or is there some way I can be sure that bit-fields will do what I actually need?

Thanks in advance

markt1964
  • Initialization of one member of a union immediately followed by reading from another member is officially undefined, you know? – bipll Jun 10 '18 at 05:50
  • It seems to be undefined only in the respect that the bit representation of different structures or union members is not defined by the language, so I'm aware that this code wouldn't work for middle endian machines, for example. – markt1964 Jun 10 '18 at 05:59
  • There's no such thing as 'undefined only in the respect' >_< though this particular one is probably the most defined UB ever. – bipll Jun 10 '18 at 06:01
  • [Which C datatype can represent a 40-bit binary number?](https://stackoverflow.com/q/9595225/995714), [If a 32-bit integer overflows, can we use a 40-bit structure instead of a 64-bit long one?](https://stackoverflow.com/q/27705409/995714) – phuclv Jun 11 '18 at 01:42
  • anyway in your case just use a `uint64_t` and get the 2 fields with bitwise operators – phuclv Jun 12 '18 at 14:27
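The last comment's suggestion, keeping the whole instruction in one `std::uint64_t` and extracting the fields with bitwise operators, might look like the following minimal sketch. The helper names `pack`, `opcode_of`, `operand_of` and `load_word` are hypothetical. Bit positions are fully defined here, but the byte-order question does not disappear: the opcode occupies the first byte in memory only when the word is stored on a little-endian host.

```cpp
#include <cstdint>
#include <cstring>

// Opcode in bits 0-7, signed 56-bit operand in bits 8-63.
constexpr std::uint64_t pack(std::uint8_t opcode, std::int64_t operand) {
    return (static_cast<std::uint64_t>(operand) << 8) | opcode;
}

constexpr std::uint8_t opcode_of(std::uint64_t word) {
    return static_cast<std::uint8_t>(word & 0xff);
}

// Recovers the signed operand from bits 8-63 without right-shifting a
// negative signed value (implementation-defined before C++20).
constexpr std::int64_t operand_of(std::uint64_t word) {
    return (word >> 63)
        ? -static_cast<std::int64_t>(~word >> 8) - 1
        : static_cast<std::int64_t>(word >> 8);
}

// Loads 8 bytes of byte code into a word; memcpy sidesteps alignment and
// aliasing issues, and compilers usually turn it into a single load.
inline std::uint64_t load_word(const unsigned char *p) {
    std::uint64_t w;
    std::memcpy(&w, p, sizeof w);
    return w;
}
```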
