20

I'm reading a binary file into a Rust program using a Vec<u8> as a buffer. Two bytes in the stream represent a big-endian u16.

So far, the only way I've figured out how to convert to a primitive u16 involves converting the two elements to Strings first, and it looks terrible.

Code:

let vector: Vec<u8> = [1, 16].to_vec();
let vector0: String = format!("{:02x}", vector[0]);
let vector1: String = format!("{:02x}", vector[1]);
let mut vector_combined = String::new();
vector_combined = vector_combined + &vector0.clone();
vector_combined = vector_combined + &vector1.clone();
let number: u16 = u16::from_str_radix(&vector_combined.to_string(), 16).unwrap();

println!("vector[0]: 0x{:02x}", vector[0]);
println!("vector[1]: 0x{:02x}", vector[1]);
println!("number: 0x{:04x}", number);

Output:

vector[0]: 0x01
vector[1]: 0x10
number: 0x0110
Shepmaster
Hydraxan14

2 Answers

53

If you actually had two distinct u8s, the conventional solution involves bitwise manipulation, specifically shifting and bitwise OR. This requires zero heap allocation and is very efficient:

let number = ((vector[0] as u16) << 8) | vector[1] as u16;

And a graphical explanation:

             A0                   B0
        +--------+           +--------+
        |XXXXXXXX|           |YYYYYYYY|
        +-------++           +-------++
                |                    |
 A1 = A0 as u16 |     B1 = B0 as u16 |
+---------------v+   +---------------v+
|00000000XXXXXXXX|   |00000000YYYYYYYY|
+---------------++   +---------------++
                |                    |
   A2 = A1 << 8 |                    |
+---------------v+                   |
|XXXXXXXX00000000|                   |
+---------------++                   |
                |              +--+  |
                +-------------->OR<--+
                               +-++
                                 |
                     V = A2 | B1 |
                 +----------+----v+
                 |XXXXXXXXYYYYYYYY|
                 +----------------+
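
As a minimal sketch of the diagram above, the widen, shift, and OR steps can be wrapped in a small helper. The name `combine_be` is hypothetical and chosen only for illustration; it assumes the first byte is the high byte, as in the question's big-endian stream.

```rust
/// Combine a high byte and a low byte into one big-endian u16.
/// (`combine_be` is an illustrative name, not a standard library function.)
fn combine_be(hi: u8, lo: u8) -> u16 {
    // Widen to u16 first: shifting a u8 left by 8 bits would overflow.
    let a2 = u16::from(hi) << 8; // XXXXXXXX00000000
    let b1 = u16::from(lo); //      00000000YYYYYYYY
    a2 | b1
}

fn main() {
    let vector: Vec<u8> = vec![1, 16];
    let number = combine_be(vector[0], vector[1]);
    println!("number: 0x{:04x}", number); // number: 0x0110
}
```

Using `u16::from` instead of `as u16` is a stylistic choice here; both widen without loss.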

However, you are really looking at your problem too narrowly. You don't have two u8, you have a &[u8].

In this case, use the byteorder crate:

use byteorder::{BigEndian, ByteOrder}; // 1.3.4

fn main() {
    let data = [1, 16];
    // The question's stream is big-endian, so decode with BigEndian.
    let v = BigEndian::read_u16(&data);
    println!("{}", v); // 272 (0x0110)
}

This shows its power when you want to handle reading through the buffer:

use byteorder::{BigEndian, LittleEndian, ReadBytesExt}; // 1.3.4

fn main() {
    let data = [1, 16, 1, 2];
    let mut current = &data[..];

    let v1 = current.read_u16::<LittleEndian>();
    let v2 = current.read_u16::<BigEndian>();

    println!("{:?}, {:?}", v1, v2); // Ok(4097), Ok(258)
}

As you can see, you need to be conscious of the endianness of your input data.
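
To make the endianness point concrete without any external crate, here is a small sketch that decodes the same two bytes both ways with the standard library. The helper name `decode_both` is hypothetical.

```rust
/// Decode the same two bytes as big-endian and as little-endian.
/// (`decode_both` is an illustrative helper, not part of any library.)
fn decode_both(data: [u8; 2]) -> (u16, u16) {
    (u16::from_be_bytes(data), u16::from_le_bytes(data))
}

fn main() {
    let (be, le) = decode_both([1, 16]);
    // Big-endian treats the first byte as the high byte: 0x0110.
    // Little-endian treats the first byte as the low byte: 0x1001.
    println!("big-endian: {}, little-endian: {}", be, le);
    // big-endian: 272, little-endian: 4097
}
```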

You could also get a fixed-size array from your slice and then use u16::from_be_bytes (or u16::from_le_bytes for little-endian data). If you have a &[u8] and want a Vec<u16>, you can iterate over appropriately sized slices using chunks_exact (or array_chunks).
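
A rough sketch of the standard-library route, assuming big-endian input as in the question. The function names `read_u16_be` and `to_u16_vec_be` are hypothetical; the slice-to-array conversion can fail when fewer than two bytes are available, so the first helper returns an `Option`.

```rust
use std::convert::TryInto; // already in the prelude from the 2021 edition on

/// Read a big-endian u16 from the first two bytes of a slice, if present.
fn read_u16_be(bytes: &[u8]) -> Option<u16> {
    // `get(..2)` is None for short input; `try_into` turns &[u8] into [u8; 2].
    let pair: [u8; 2] = bytes.get(..2)?.try_into().ok()?;
    Some(u16::from_be_bytes(pair))
}

/// Decode a buffer as big-endian u16 values; a trailing odd byte is ignored.
fn to_u16_vec_be(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        // Each chunk has exactly two elements, so indexing cannot panic.
        .map(|chunk| u16::from_be_bytes([chunk[0], chunk[1]]))
        .collect()
}

fn main() {
    let data = [1u8, 16, 1, 2];
    println!("{:?}", read_u16_be(&data)); // Some(272)
    println!("{:?}", to_u16_vec_be(&data)); // [272, 258]
}
```

Silently dropping a trailing byte is one possible policy; `chunks_exact` also exposes the leftover through `remainder()` if the caller should treat it as an error.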

Free code review on your original post:

  • There's no need to use to_vec here; use vec! instead.

  • There's no need to specify the vast majority of the types.

    let vector = [1u8, 16].to_vec();
    
    let vector0 = format!("{:02x}", vector[0]);
    let vector1 = format!("{:02x}", vector[1]);
    let mut vector_combined = String::new();
    vector_combined = vector_combined + &vector0.clone();
    vector_combined = vector_combined + &vector1.clone();
    let number = u16::from_str_radix(&vector_combined.to_string(), 16).unwrap();
    
  • There's no need to clone the strings before taking a reference to them when adding.

  • There's no need to convert the String to... another String in from_str_radix.

    let vector0 = format!("{:02x}", vector[0]);
    let vector1 = format!("{:02x}", vector[1]);
    let mut vector_combined = String::new();
    vector_combined = vector_combined + &vector0;
    vector_combined = vector_combined + &vector1;
    let number = u16::from_str_radix(&vector_combined, 16).unwrap();
    
  • There's no need to create an empty String to append to; just use vector0:

    let vector0 = format!("{:02x}", vector[0]);
    let vector1 = format!("{:02x}", vector[1]);
    let vector_combined = vector0 + &vector1;
    let number = u16::from_str_radix(&vector_combined, 16).unwrap();
    
  • There's no need to create two strings at all, one will do:

    let vector_combined = format!("{:02x}{:02x}", vector[0], vector[1]);
    let number = u16::from_str_radix(&vector_combined, 16).unwrap();
    

Of course, this still isn't the right solution, but it's better.

Shepmaster
  • 1
    You may want to update this answer to use the `u16::from_le_bytes()` or `u16::from_be_bytes()` functions which exist now. See https://doc.rust-lang.org/std/primitive.u16.html#method.from_be_bytes – youR.Fate May 11 '20 at 15:44
  • 2
    @youR.Fate mentioned, but it's not a great solution for the OP, as they don't have an array to start with. That means that they'd have to handle the error case, then perform the transformation. byteorder does that in one step. – Shepmaster May 11 '20 at 15:56
  • When I converted a `Vec<u8>` to a `Vec<u16>`, I used a chunk iterator over the `u8` vec, then mapped that iterator to the `u16::from_le_bytes` function to get a vector of `u16`. But I can see why byteorder might be a neater solution. – youR.Fate May 12 '20 at 12:55
  • 1
    @youR.Fate mentioned, but the chunk iterators return slices, not fixed-size arrays, so you still have to deal with an error case. When [const generics are stable](https://stackoverflow.com/q/28136739/155423), we might be able to get a chunks iterator that returns `&[T; N]` and avoid the error case completely. – Shepmaster May 12 '20 at 13:05
7

You can multiply the first element to move it to the higher byte, then add the second element. It just needs extra casting:

let a: u8 = 1;
let b: u8 = 2;
let c: u16 = (a as u16 * 256) + b as u16;
println!("c: {}", c); // c: 258
Shepmaster
viraptor
  • 4
    I don't know a *ton* about binary manipulation, but wouldn't a left shift be more appropriate than just a multiplication? – MutantOctopus May 09 '18 at 01:47
  • @BHustus If the compiler can't optimize that you can blame it ;) – Stargateur May 09 '18 at 02:36
  • 8
    I mean, not necessarily for optimization purposes, but for clarity. Bit-shifting seems like it would more clearly explain what you're doing: "I want to shift 8 bits to the left, then place the other number in the empty right half to create one number". Multiplication does the same thing, but it seems like another step to parse for people reading the code. – MutantOctopus May 09 '18 at 04:42
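
For what it's worth, the two spellings discussed in these comments can be checked against each other exhaustively, since there are only 65,536 byte pairs. This is a small sketch; `combine_mul` and `combine_shift` are hypothetical names used only for the comparison.

```rust
/// Multiplication form, as in the answer above.
fn combine_mul(hi: u8, lo: u8) -> u16 {
    // 255 * 256 + 255 == 65535, so this never overflows a u16.
    u16::from(hi) * 256 + u16::from(lo)
}

/// Shift-and-OR form, as suggested in the comments.
fn combine_shift(hi: u8, lo: u8) -> u16 {
    (u16::from(hi) << 8) | u16::from(lo)
}

fn main() {
    // Check every possible pair of bytes.
    for hi in 0..=u8::MAX {
        for lo in 0..=u8::MAX {
            assert_eq!(combine_mul(hi, lo), combine_shift(hi, lo));
        }
    }
    println!("c: {}", combine_mul(1, 2)); // c: 258
}
```

Both produce the same value for every input, so the choice between them is about readability rather than correctness.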