2

I'm writing an encoding library and I'd like to convert a slice into a usize.

I see a read_uint method that looks promising, though I'm unsure how to get the register size as a variable so I can put it in the function.

For example, I'd like to get 32 on a 32-bit processor and 64 on a 64-bit processor.

Shepmaster
A F
    Possibly relevant: [How to get the size of a user defined struct? (sizeof)](https://stackoverflow.com/questions/36664327/how-to-get-the-size-of-a-user-defined-struct-sizeof) and [What's the alternative to u32::BITS in a const?](https://stackoverflow.com/questions/36285396/whats-the-alternative-to-u32bits-in-a-const) – trent Nov 05 '18 at 20:18
    Note that using usize in a data format is a bad idea. Prefer something like u64 with bounds checking. (That is also why this function doesn't exist in byteorder.) – Stargateur Nov 06 '18 at 05:49

2 Answers

4

TL;DR: there is a good reason not to provide a read_usize function: its result is not consistent across different CPU architectures.


This is a bad idea. Normally you have some kind of protocol that you are trying to deserialize. That format should be independent of the CPU architecture, so you can't read a usize, because its width is CPU-dependent.

Let's assume you have a simple protocol where you first have the size of an array, followed by n elements.

+------+---------+
| size | ....... |
+------+---------+

Let's suppose the protocol says that your size is 4 bytes long. Now you want to do the thing Shepmaster suggested and read a usize whose width depends on your architecture.

On an x86_64 OS you will now read 8 bytes and therefore swallow the first element of your array.
On an ATmega8 your usize would be 2 bytes, so you would take only the first 2 bytes of your size field (which might be zero in case there are fewer than 65,536 elements and a big-endian byte order).

This is the reason why there is no read_usize function, and it is correct. You need to decide how long your size is, read exactly that many bytes from your slice, and then use `as` to convert that into a usize.
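For instance, a fixed 4-byte big-endian size field can be decoded with the standard library alone; this is a sketch, where the 4-byte width and the name read_len are illustrative assumptions rather than anything from the question:

```rust
use std::convert::{TryFrom, TryInto};

// Decode a fixed 4-byte big-endian length field into a usize.
// The 4-byte width is a protocol decision, independent of the CPU.
fn read_len(buf: &[u8]) -> Option<usize> {
    let bytes: [u8; 4] = buf.get(..4)?.try_into().ok()?;
    // try_from only fails on targets where usize is narrower than 32 bits.
    usize::try_from(u32::from_be_bytes(bytes)).ok()
}

fn main() {
    // size = 256, followed by payload bytes.
    let wire = [0x00, 0x00, 0x01, 0x00, 0xAA, 0xBB];
    assert_eq!(read_len(&wire), Some(256));
    // Too short to contain a size field.
    assert_eq!(read_len(&[0x00]), None);
}
```

Because the field width is fixed by the protocol, this reads the same 4 bytes on every architecture; only the final conversion to usize is platform-dependent, and it is checked.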

hellow
    I agree, but the OP specifically stated that they understand that they will get different results on different platforms. This is a design decision that is acceptable to make, so long as you understand the tradeoffs. – Shepmaster Nov 06 '18 at 14:44
    For example, "the protocol" might be a dump of RAM that is never supposed to cross architectures. – Shepmaster Nov 06 '18 at 14:45
  • In that case that it is helpful, yes. – hellow Nov 06 '18 at 14:49
    @Shepmaster In this case, you simply don't need ByteOrder; your platform will not change its endianness ;) – Stargateur Nov 09 '18 at 11:21
    For the record, `byteorder` extends `Read` and `Write` with primitive reading and writing functions (e.g. `read_u32`) and as such, is not completely useless when only working under the system's native endianness. – E_net4 Nov 14 '18 at 15:44
3

One way is to use mem::size_of to get the size of a usize:

use byteorder::{ByteOrder, ReadBytesExt};

fn read_usize<B, R>(mut b: R) -> Result<usize, std::io::Error>
where
    B: ByteOrder,
    R: ReadBytesExt,
{
    b.read_uint::<B>(std::mem::size_of::<usize>()).map(|v| v as usize)
}
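If the byteorder crate isn't available, the same size_of-driven width selection can be sketched with the standard library alone for native-endian data; the function name read_native_usize is made up for this example:

```rust
use std::convert::TryInto;

// Read a native-endian usize from the front of a slice; the width is
// picked at compile time via mem::size_of::<usize>().
fn read_native_usize(buf: &[u8]) -> Option<usize> {
    const N: usize = std::mem::size_of::<usize>();
    let bytes: [u8; N] = buf.get(..N)?.try_into().ok()?;
    Some(usize::from_ne_bytes(bytes))
}

fn main() {
    let src = 0x1234_usize.to_ne_bytes();
    assert_eq!(read_native_usize(&src), Some(0x1234));
    // A slice shorter than size_of::<usize>() yields None.
    assert_eq!(read_native_usize(&[0u8; 1]), None);
}
```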

Another is to have different functions or function implementations for different architectures:

fn read_usize<B, R>(mut b: R) -> Result<usize, std::io::Error>
where
    B: ByteOrder,
    R: ReadBytesExt,
{
    if cfg!(target_pointer_width = "64") {
        b.read_u64::<B>().map(|v| v as usize)
    } else if cfg!(target_pointer_width = "32") {
        b.read_u32::<B>().map(|v| v as usize)
    } else {
        b.read_u16::<B>().map(|v| v as usize)
    }
}
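Since cfg! is evaluated at compile time, the selected branch always agrees with the target's actual pointer width; a small sanity check (pointer_width_bits is an illustrative helper, not part of byteorder):

```rust
// Report the pointer width that the cfg! branches would select.
fn pointer_width_bits() -> usize {
    if cfg!(target_pointer_width = "64") {
        64
    } else if cfg!(target_pointer_width = "32") {
        32
    } else {
        16
    }
}

fn main() {
    // The branch chosen by cfg! matches size_of::<usize>() on this target.
    assert_eq!(pointer_width_bits(), std::mem::size_of::<usize>() * 8);
}
```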


Shepmaster
  • While this answer is technically correct (like all (?) of your answers), I would not suggest doing something like this, and would point out that this is normally not the wanted behavior. – hellow Nov 06 '18 at 07:31