
I have a byte array which contains UTF-8 bytes as hexadecimal `&str` values.

I want to convert this array to UTF-8 chars.

Like this:

["21", "22", "23", "24"]: [&str]

to

!"#$
mikie
2 Answers

I think you're looking for the function `std::str::from_utf8`. From the docs:

use std::str;

// some bytes, in a stack-allocated array
let sparkle_heart = [240, 159, 146, 150];

// We know these bytes are valid, so just use `unwrap()`.
let sparkle_heart = str::from_utf8(&sparkle_heart).unwrap();

assert_eq!("", sparkle_heart);

For your example, you could change the input to numeric byte literals:

fn main() {
    // The same bytes, written as numeric (hex) literals instead of strings.
    let input = [0x21, 0x22, 0x23, 0x24];
    let result = std::str::from_utf8(&input).unwrap();
    assert_eq!(result, "!\"#$");
}

or keep it stringly-typed and parse the hex strings into bytes first:

fn main() {
    let input = ["21", "22", "23", "24"];
    let parsed: Result<Vec<u8>, std::num::ParseIntError> = input
        .iter()
        .map(|s| u8::from_str_radix(s, 16))
        .collect();
    let parsed = parsed.expect("Couldn't parse input as hexadecimal");
    let result = String::from_utf8(parsed).expect("Input contains invalid UTF-8");
    assert_eq!(result, "!\"#$");
}
asky
use std::convert::TryFrom;

fn main() {
    let input = ["21", "22", "23", "24"];

    let result: String = input
        .iter()
        .map(|s| u32::from_str_radix(s, 16).unwrap()) // HEX string to unsigned int
        .map(|u| char::try_from(u).unwrap()) // unsigned int to char (unicode verification)
        .collect();

    assert_eq!(result, "!\"#$");
}

Then you can add error handling instead of `unwrap`, if you want. I would do that with an iterator:

use std::convert::TryFrom;
use std::error::Error;

struct HexSliceToChars<'a> {
    slice: &'a [&'a str],
    index: usize,
}

impl<'a> HexSliceToChars<'a> {
    fn new(slice: &'a [&'a str]) -> Self {
        HexSliceToChars {slice, index: 0 }
    }
}

impl<'a> Iterator for HexSliceToChars<'a> {
    type Item = Result<char, Box<dyn Error>>;

    fn next(&mut self) -> Option<Self::Item> {
        self.slice.get(self.index).map(|s| {
            let u = u32::from_str_radix(s, 16)?;
            let c = char::try_from(u)?;

            self.index += 1;

            Ok(c)
        })
    }
}

fn main() {
    let input = ["21", "22", "23", "24"];
    let result: Result<String, _> = HexSliceToChars::new(&input).collect();
    // Error handling
    let result = result.unwrap();

    assert_eq!(result, "!\"#$");
}
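
Equivalently, assuming `Box<dyn Error>` is an acceptable error type, the custom iterator can be replaced by a plain `map`, because collecting `Result<char, _>` items into `Result<String, _>` stops at the first error. A sketch of that variant:

use std::convert::TryFrom;
use std::error::Error;

fn main() {
    let input = ["21", "22", "23", "24"];

    let result: Result<String, Box<dyn Error>> = input
        .iter()
        .map(|s| -> Result<char, Box<dyn Error>> {
            // Same two fallible steps as in the iterator above.
            let u = u32::from_str_radix(s, 16)?;
            Ok(char::try_from(u)?)
        })
        .collect();

    assert_eq!(result.unwrap(), "!\"#$");
}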
Boiethios
  • This code fails for any input that is not ASCII. For example, 😀 is `["f0", "9f", "98", "80"]` in hex, and this code produces "ð\u{9f}\u{98}\u{80}". For that reason, it should not be the accepted answer. – asky Jun 05 '20 at 09:28
  • @asky Nope, that's 😀 (https://emojipedia.org/emoji/%F0%9F%98%80/): https://play.integer32.com/?version=stable&mode=debug&edition=2018&gist=40f359de00d66e123691c04578575712 – Boiethios Jun 05 '20 at 09:52
  • `1F600` is not UTF-8. If you want to process non-ASCII **UTF-8** bytes, the easiest way is to go through a `Vec<u8>` (see the sketch after these comments): [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=3766ce8a4d423d34ded5877153a1fda1). – Jmb Jun 05 '20 at 10:08
  • `U+1F600` is the unicode code point, different from the UTF-8 encoding of that code point. See https://stackoverflow.com/a/27939161/6677437 for how to convert between the two. – asky Jun 05 '20 at 20:40
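
A minimal sketch of the `Vec<u8>` route Jmb describes above, applied to asky's emoji example (the linked playground code is not reproduced here, so details may differ):

fn main() {
    // Hex strings for the four UTF-8 bytes of U+1F600 (😀).
    let input = ["f0", "9f", "98", "80"];

    // Parse each hex string into a byte first...
    let bytes: Vec<u8> = input
        .iter()
        .map(|s| u8::from_str_radix(s, 16).expect("invalid hex"))
        .collect();

    // ...then decode the whole sequence as UTF-8 at once, so that
    // multi-byte characters are assembled correctly.
    let result = String::from_utf8(bytes).expect("invalid UTF-8");

    assert_eq!(result, "😀");
}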