How do you approach creating a complete new datatype on the "bit-level"?

Question

I would like to create a new data type in Rust on the "bit-level".

For example, a quadruple-precision float. I could create a struct containing two double-precision floats and gain extra precision by splitting the quad into two doubles, but I don't want to do that (that's what I mean by "bit-level").

I thought about using a u8 array or a bool array, but in both cases I would waste 7 bits per element (a bool also occupies a whole byte). I know there are several crates that implement bit arrays or bit vectors, but reading their source code didn't help me understand how they work.

How would I create such a bit-array without wasting memory, and is this the way I would want to choose when implementing something like a quad-precision type?

I don't know how to implement new data types that are neither basic types nor structs combining basic types, and I haven't found a solution online yet; maybe I'm not searching with the right keywords.

What do you mean “in both cases, I waste 7 bits of memory”? Use each bit of each u8 and you won’t. — Ry-, Jan 06 '20 at 22:53
Hi there and welcome to StackOverflow! So just to be clear, the part about interacting with Python is just backstory and the question is not about Python at all? The new quadruple-precision float type you want to create does not have to be accessed from Python, yes? — Lukas Kalbertodt, Jan 06 '20 at 22:53
[This Q&A](https://stackoverflow.com/q/40467995/2408867) might help you? A `u8` array does not waste any bits, but it seems you are not sure how to "access" those bits? — Lukas Kalbertodt, Jan 06 '20 at 22:56
@LukasKalbertodt Yes, the Python part was just backstory, and the type doesn't need to be accessed from Python. And you are right; to be more specific, it would waste bits if I only wrote ones and zeros into the u8s. — MEisebitt, Jan 07 '20 at 00:34
To be clear, your byte array would be `[u8; 16]`, not `[u8; 128]`. As Ry- says; you use *all* of the bits. — Shepmaster, Jan 07 '20 at 03:14
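To illustrate what Ry- and Shepmaster describe, here is a minimal sketch of a 128-bit array packed into `[u8; 16]`, so that every bit of every byte is used. The type name `BitArray128` and its methods are hypothetical, chosen only for this example; bit `i` lives in byte `i / 8` at position `i % 8`, and reading or writing it is plain shifting and masking.

```rust
/// 128 bits packed into 16 bytes: no bit is wasted.
/// `BitArray128` is a hypothetical name used only for illustration.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct BitArray128 {
    bytes: [u8; 16],
}

impl BitArray128 {
    /// Reads bit `i` (0..128): byte `i / 8`, bit `i % 8` within that byte.
    fn get(&self, i: usize) -> bool {
        assert!(i < 128, "bit index out of range");
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    /// Writes bit `i` by masking only that bit in its containing byte.
    fn set(&mut self, i: usize, value: bool) {
        assert!(i < 128, "bit index out of range");
        let mask = 1u8 << (i % 8);
        if value {
            self.bytes[i / 8] |= mask;
        } else {
            self.bytes[i / 8] &= !mask;
        }
    }
}

fn main() {
    let mut bits = BitArray128::default();
    bits.set(0, true);
    bits.set(127, true);
    assert!(bits.get(0) && bits.get(127) && !bits.get(64));
    // 128 bits occupy exactly 16 bytes.
    assert_eq!(std::mem::size_of::<BitArray128>(), 16);
    println!("{:?}", bits.bytes);
}
```

The same indexing scheme works for a growable `Vec<u8>`; only the bounds check changes.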

score 2 · Accepted Answer · edited Jan 07 '20 at 02:30

The question you are asking has no direct answer. Just like any other programming language, Rust has a basic set of rules for type layouts. This is because (most) real-world CPUs can't address individual bits, need certain alignments when referencing memory, have rules for how pointer arithmetic works, and so on.

For instance, if you create a type of just two bits, you'll still need an 8-bit byte to represent it, because most CPUs' instruction sets simply have no way to address two individual bits; there is also no way to take the address of such a type, because addressing works at the byte level at minimum. More useful information on this can be found here, in section 2, "The Anatomy of a Type". Be aware that the non-wasting bit-level type you have in mind needs to fulfill all the rules mentioned there.
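A quick way to see this in practice is a sketch like the following, using a hypothetical `TwoBits` enum that only needs two bits of information. With `#[repr(u8)]` its size is guaranteed to be one byte, and an array of them uses one byte per element, because each element must have its own address.

```rust
use std::mem::{align_of, size_of};

/// A hypothetical type with only four possible values (2 bits of information).
/// `#[repr(u8)]` pins its size to exactly one byte.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy, Debug)]
enum TwoBits {
    Zero,
    One,
    Two,
    Three,
}

fn main() {
    // Each value occupies a whole byte, just like `bool`...
    assert_eq!(size_of::<TwoBits>(), 1);
    assert_eq!(size_of::<bool>(), 1);
    // ...and an array stores one byte per element, not two bits each.
    assert_eq!(size_of::<[TwoBits; 4]>(), 4);

    // Every element is individually addressable, one byte apart.
    let values = [TwoBits::Zero, TwoBits::One, TwoBits::Two, TwoBits::Three];
    let p0 = &values[0] as *const TwoBits as usize;
    let p1 = &values[1] as *const TwoBits as usize;
    assert_eq!(p1 - p0, 1);

    println!("align_of::<TwoBits>() = {}", align_of::<TwoBits>());
}
```

Packing several such values into one byte is possible, but then you lose the ability to take a reference to an individual value and must go through shift-and-mask accessors instead.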

A perfectly reasonable approach is to represent your type as a single wrapped u128 and implement all arithmetic on top of it. A more generic approach would be to use a Vec<u8>. Either way, you'll do a relatively large amount of bit-masking, shifting, indirection and so on.
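As a rough sketch of the wrapped-u128 approach, the following assumes the IEEE 754 binary128 layout (1 sign bit, 15 exponent bits with bias 16383, 112 fraction bits). `Quad` and all of its methods are hypothetical names for illustration. It only shows field extraction and sign flipping; real addition, multiplication and rounding would be built on top of these accessors and require much more code.

```rust
/// A quad-precision value stored as one wrapped `u128`.
/// Assumed layout (IEEE 754 binary128):
/// bit 127 = sign | bits 112..=126 = exponent (bias 16383) | bits 0..=111 = fraction.
/// `Quad` is a hypothetical type used only for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Quad(u128);

impl Quad {
    const FRACTION_BITS: u32 = 112;
    const EXPONENT_MASK: u128 = 0x7FFF; // 15 bits
    const FRACTION_MASK: u128 = (1u128 << Self::FRACTION_BITS) - 1;
    const EXPONENT_BIAS: i32 = 16383;

    /// Assembles a value from raw fields; out-of-range bits are masked off.
    fn from_parts(negative: bool, biased_exponent: u16, fraction: u128) -> Quad {
        let sign = (negative as u128) << 127;
        let exp = (biased_exponent as u128 & Self::EXPONENT_MASK) << Self::FRACTION_BITS;
        Quad(sign | exp | (fraction & Self::FRACTION_MASK))
    }

    fn is_negative(self) -> bool {
        self.0 >> 127 == 1
    }

    fn biased_exponent(self) -> u16 {
        ((self.0 >> Self::FRACTION_BITS) & Self::EXPONENT_MASK) as u16
    }

    fn fraction(self) -> u128 {
        self.0 & Self::FRACTION_MASK
    }

    /// Unbiased exponent (meaningful for normal numbers).
    fn exponent(self) -> i32 {
        self.biased_exponent() as i32 - Self::EXPONENT_BIAS
    }

    /// Negation is just a sign-bit flip; other arithmetic is far more involved.
    fn negate(self) -> Quad {
        Quad(self.0 ^ (1u128 << 127))
    }
}

fn main() {
    // 1.0: biased exponent 16383 (i.e. 2^0) and an all-zero fraction.
    let one = Quad::from_parts(false, 16383, 0);
    assert_eq!(one.0, 0x3FFF_u128 << 112);

    let minus_one = one.negate();
    assert!(minus_one.is_negative());
    assert_eq!(minus_one.exponent(), 0);
    assert_eq!(minus_one.fraction(), 0);
    println!("{:#034x}", minus_one.0);
}
```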

Having a look at rust_decimal or similar crates might also be a good idea.

So if I implement a quad on a u128, wouldn't e.g. multiplication be relatively slow, since I can't just use the standard floating-point arithmetic, which works directly on the individual bits? — MEisebitt, Jan 07 '20 at 00:41
@MEisebitt Well, "standard floating-point arithmetic" is typically implemented by physical hardware in your CPU, so unless you have a 128-bit FPU (and can convince LLVM to use it), yeah it's going to be a lot slower. You can't get hardware-like speeds in software alone. — trent, Jan 07 '20 at 17:55
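To give a feel for the extra work trent describes: multiplying two binary128 values in software means multiplying their 113-bit significands, whose product (up to 226 bits) does not fit in a `u128`. The sketch below is a plain schoolbook widening multiply using 64-bit halves; `widening_mul` is a hypothetical helper, not how any particular crate or LLVM does it. It is only one small step, before normalization, exponent handling and rounding, that a hardware FPU performs in a single instruction.

```rust
/// Multiplies two u128 values into a 256-bit product returned as (high, low).
/// Hypothetical helper for illustration: schoolbook multiplication on 64-bit limbs.
fn widening_mul(a: u128, b: u128) -> (u128, u128) {
    const MASK: u128 = u64::MAX as u128;
    // Split each operand into 64-bit halves.
    let (a_lo, a_hi) = (a & MASK, a >> 64);
    let (b_lo, b_hi) = (b & MASK, b >> 64);

    // Four partial products; each fits in a u128 because 64 x 64 <= 128 bits.
    let lo_lo = a_lo * b_lo;
    let lo_hi = a_lo * b_hi;
    let hi_lo = a_hi * b_lo;
    let hi_hi = a_hi * b_hi;

    // Middle column: at most 3 * (2^64 - 1), so it cannot overflow.
    let mid = (lo_lo >> 64) + (lo_hi & MASK) + (hi_lo & MASK);
    let low = (lo_lo & MASK) | (mid << 64); // the shift drops mid's carry bits
    let high = hi_hi + (lo_hi >> 64) + (hi_lo >> 64) + (mid >> 64);
    (high, low)
}

fn main() {
    // (2^128 - 1)^2 = 2^256 - 2^129 + 1, so high = 2^128 - 2 and low = 1.
    let (high, low) = widening_mul(u128::MAX, u128::MAX);
    assert_eq!(high, u128::MAX - 1);
    assert_eq!(low, 1);
    // Small products stay entirely in the low half.
    assert_eq!(widening_mul(6, 7), (0, 42));
}
```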
