
The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.

use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let rand_string: String = thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect();

    println!("{}", rand_string);
}

However, this code does not compile (note: I'm on nightly):

error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
 --> src/main.rs:8:10
  |
8 |         .collect();
  |          ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
  |
  = help: the trait `FromIterator<u8>` is not implemented for `String`

Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:

use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let r = thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect::<Vec<_>>();
    let s = String::from_utf8_lossy(&r);
    println!("{}", s);
}

And this compiles and works!

2dCsTqoNUR1f0EzRV60IiuHlaM4TfK

All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.

Questions

  1. .sample_iter(&Alphanumeric) produces u8 and not chars?
  2. How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?
  3. The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
pretzelhammer
dani

2 Answers


.sample_iter(&Alphanumeric) produces u8 and not chars?

Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:

impl Distribution<char> for Alphanumeric

But then in the docs for 0.8.0:

impl Distribution<u8> for Alphanumeric

How can I avoid the second variable s and directly interpret a u8 as a UTF-8 character? I guess the representation in memory would not change at all?

There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:

let s: String = thread_rng()
    .sample_iter(&Alphanumeric)
    .take(30)
    .map(|x| x as char)
    .collect();

Or, using the From<u8> instance of char:

let s: String = thread_rng()
    .sample_iter(&Alphanumeric)
    .take(30)
    .map(char::from)
    .collect();
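To see that the two conversions agree, here is a minimal sketch that needs no rand at all. The fixed byte string is only a stand-in for whatever Alphanumeric would produce, and bytes_to_string is a hypothetical helper name, not something from the rand docs:

```rust
/// Hypothetical helper: widens each byte to a `char` and collects into a String.
/// For ASCII bytes (such as alphanumerics) this conversion is lossless.
fn bytes_to_string(bytes: &[u8]) -> String {
    bytes.iter().copied().map(char::from).collect()
}

fn main() {
    // Assumption: fixed bytes standing in for output sampled from `Alphanumeric`,
    // so that the example is deterministic.
    let bytes = *b"2dCsTqoNUR";

    let via_from = bytes_to_string(&bytes);
    let via_cast: String = bytes.iter().map(|&b| b as char).collect();

    // `char::from(u8)` and `u8 as char` yield the same char for every byte.
    assert_eq!(via_from, via_cast);
    println!("{}", via_from);
}
```

Which of the two you pick is mostly a matter of taste; char::from just makes it explicit that the conversion cannot fail.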

Of course, since every u8 produced here is an ASCII alphanumeric and therefore valid UTF-8, you can use String::from_utf8_unchecked, which skips validation and is faster than from_utf8_lossy (although probably around the same speed as the as char method):

let s = unsafe {
    String::from_utf8_unchecked(
        thread_rng()
            .sample_iter(&Alphanumeric)
            .take(30)
            .collect::<Vec<_>>(),
    )
};

If the unsafe bothers you and you want to stay safe, you can use the slower String::from_utf8 and unwrap the Result, so that you get a panic instead of UB (even though the code should never panic or trigger UB):

let s = String::from_utf8(
    thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect::<Vec<_>>(),
).unwrap();
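If you are curious how the safe variants behave on bad input, the sketch below uses hand-picked bytes rather than rand output (Alphanumeric itself should never produce invalid UTF-8). The helper names checked and lossy are made up for this illustration:

```rust
/// Hypothetical helper: validates the bytes as UTF-8. On success the Vec's
/// buffer is reused for the String instead of being copied.
fn checked(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Hypothetical helper: never fails, but replaces each invalid sequence
/// with U+FFFD (the replacement character).
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    // ASCII alphanumerics always pass validation.
    assert_eq!(checked(b"abc123".to_vec()), Some("abc123".to_string()));

    // 0xFF can never appear in valid UTF-8, so validation rejects it...
    assert_eq!(checked(vec![b'a', 0xFF, b'b']), None);

    // ...while the lossy conversion silently substitutes U+FFFD.
    assert_eq!(lossy(&[b'a', 0xFF, b'b']), "a\u{FFFD}b");
}
```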

The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.

First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, use a u8 array and then a function like std::str::from_utf8 or the faster std::str::from_utf8_unchecked (again, only usable because you know valid UTF-8 will be generated).
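A rough sketch of the u8-array approach, again with fixed bytes standing in for the random ones. as_stack_str is a hypothetical helper name, and the size checks at the end show one concrete cost of a char array:

```rust
/// Hypothetical helper: views a fixed-size byte buffer as `&str` without allocating.
/// Returns `None` if the bytes are not valid UTF-8.
fn as_stack_str(buf: &[u8; 30]) -> Option<&str> {
    std::str::from_utf8(buf).ok()
}

fn main() {
    // Assumption: 30 fixed bytes standing in for output sampled from `Alphanumeric`.
    let buf: [u8; 30] = *b"2dCsTqoNUR1f0EzRV60IiuHlaM4TfK";

    let s = as_stack_str(&buf).expect("ASCII alphanumerics are valid UTF-8");
    println!("{}", s);

    // A `char` is 4 bytes, so `[char; 30]` takes four times the space of `[u8; 30]`.
    assert_eq!(std::mem::size_of::<[u8; 30]>(), 30);
    assert_eq!(std::mem::size_of::<[char; 30]>(), 120);
}
```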

As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as making your own function that collects an iterator into an array of 30 elements).

Once const generics are finally stabilized, there'll be a much prettier solution.
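For illustration, a hand-written collecting function of the kind mentioned above might look roughly like this. It is only a sketch: fill_array is a made-up name, it assumes a compiler with const generics available (Rust 1.51 or later), and the byte source is a non-random stand-in for thread_rng().sample_iter(&Alphanumeric):

```rust
/// Hypothetical helper: pulls exactly `N` bytes from an iterator into a stack array.
/// Returns `None` if the iterator runs out before the array is full.
fn fill_array<I: Iterator<Item = u8>, const N: usize>(mut iter: I) -> Option<[u8; N]> {
    let mut buf = [0u8; N];
    for slot in buf.iter_mut() {
        *slot = iter.next()?;
    }
    Some(buf)
}

fn main() {
    // Assumption: an endless, non-random cycle of bytes standing in for the
    // random alphanumeric iterator.
    let source = b"abc123XYZ".iter().copied().cycle();

    // `N` is inferred as 30 from the type annotation; no Vec is allocated.
    let buf: [u8; 30] = fill_array(source).expect("source is endless");
    let s = std::str::from_utf8(&buf).expect("ASCII is valid UTF-8");

    assert_eq!(s.len(), 30);
    println!("{}", s);
}
```

Returning an Option keeps the helper safe when the iterator is too short; with an endless random source that branch should simply never be taken.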

Aplet123
  • I think you should mention a possible reason to prefer `Vec<u8>` to `Vec<char>`: `char` is always 4 bytes long, meaning for any sequence of ASCII alphanumerics storing chars will waste three times more memory than is necessary. – Ivan C Dec 28 '20 at 16:04
  • I believe const generics are partially stabilized now. Does a prettier solution exist now? – Frederik Baetens Feb 14 '22 at 14:18
  • @FrederikBaetens Not yet. If you're interested, you might want to follow/upvote [this Github issue](https://github.com/rust-lang/rust/issues/81615) which contains a relevant discussion. – Aplet123 Feb 14 '22 at 15:03

The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:

use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let rand_string: String = thread_rng()
        .sample_iter(&Alphanumeric)
        .map(char::from) // map added here
        .take(30)
        .collect();

    println!("{}", rand_string);
}


pretzelhammer