
I'm trying to find the Rust equivalent of keeping an ASCII string buffer on the stack, so as to get the same efficiency as plain C code.

Here is an example of what I mean, as a simplified toy exercise: the goal is to generate an ASCII string with random content and random length that is at most 50 characters long. To do so, I keep a char buffer on the stack and use it to construct the string iteratively. Once finished, the string is copied onto the heap with a just-right malloc size and returned to the user.

#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include <stdio.h>

#define ASCII_PRINTABLE_FIRST ' '
#define ASCII_PRINTABLE_AMOUNT 95
#define MAX_LEN 50
#define MAX_LEN_WITH_TERM (MAX_LEN + 1)

char* generate_string(void) {
    char buffer[MAX_LEN_WITH_TERM];
    srand((unsigned) time(NULL));
    // Generate random string length
    const int len = rand() % MAX_LEN_WITH_TERM;
    int i;
    for (i = 0; i < len; i++) {
        // Fill with random ASCII printable character
        buffer[i] = (char)
            ((rand() % ASCII_PRINTABLE_AMOUNT) + ASCII_PRINTABLE_FIRST);
    }
    buffer[i] = '\0';
    return strdup(buffer);
}

int main(void) {
    printf("Generated string: %s\n", generate_string());
    return 0;
}

What I explored so far:

  • Using a String::with_capacity(50) or BytesMut buffer, but that allocates the buffer on the heap, which I would like to avoid. Sure, it's premature optimisation, but as an optimisation exercise let's imagine me calling generate_string() a billion times. That is a billion malloc calls to allocate the buffer. I don't want to use static memory either.
  • Using an array of chars on the stack, but it consumes 4x the space for just ASCII characters.
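
The 4x figure follows from Rust's `char` being a 4-byte Unicode scalar value, while `u8` is a single byte. A quick check with `std::mem::size_of` illustrates it; `MAX_LEN` mirrors the C macro above, and the helper name `buffer_sizes` is made up for this sketch:

```rust
use std::mem::size_of;

const MAX_LEN: usize = 50;

/// Returns the size in bytes of a `[char; MAX_LEN]` and of a `[u8; MAX_LEN]`.
fn buffer_sizes() -> (usize, usize) {
    (size_of::<[char; MAX_LEN]>(), size_of::<[u8; MAX_LEN]>())
}

fn main() {
    let (chars, bytes) = buffer_sizes();
    println!("[char; {}]: {} bytes", MAX_LEN, chars);
    println!("[u8; {}]: {} bytes", MAX_LEN, bytes);
}
```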

What are your suggestions?

EDIT:

  1. Yes, it leaks memory. That's not the point of my question, unless you want much longer snippets of code.
  2. Yes, it has insecure random characters. That's not the point of my question.
  3. Why would I allocate the buffer on the heap once per generate_string() call? To make the function self-contained, stateless and without static memory. It does not require an externally pre-allocated buffer.
pretzelhammer
Matjaž
  • 1
    (side note: `return strdup(buffer);` - your C code is leaking memory.) – KamilCuk Dec 14 '20 at 18:40
  • It looks like your question might be answered by the answers of [How do I collect into an array?](https://stackoverflow.com/q/26757355/155423); [Is it possible to have stack allocated arrays with the size determined at runtime in Rust?](https://stackoverflow.com/q/27859822/155423); [How to set a Rust array length dynamically?](https://stackoverflow.com/q/34684261/155423). If not, please **[edit]** your question to explain the differences. Otherwise, we can mark this question as already answered. – Shepmaster Dec 14 '20 at 18:40
  • 3
    What's wrong with using `[u8; 50]`? – Shepmaster Dec 14 '20 at 18:42
  • 2
    *if I would call `generate_string()` a billion times, that is a billion extra heap allocations.* This is generally false; why would you allocate twice? Allocate once and return it. Why doesn't your C code allocate once and directly write into the buffer instead of writing it once and then copying if you are this concerned about performance? – Shepmaster Dec 14 '20 at 18:43
  • [How do I create a random String by sampling from alphanumeric characters?](https://stackoverflow.com/a/54277357/155423) – Shepmaster Dec 14 '20 at 18:44
  • [The duplicate applied](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=680684b7e6c4cdd9185d85956f3693b8) – Shepmaster Dec 14 '20 at 18:49
  • [How do I convert a Vector of bytes (u8) to a string](https://stackoverflow.com/q/19076719/155423) – Shepmaster Dec 14 '20 at 18:57
  • 1
    Please don't use `random % something` — [Why do people say there is modulo bias when using a random number generator?](https://stackoverflow.com/q/10984974/155423) – Shepmaster Dec 14 '20 at 19:03
  • Thanks for the links to the other posts, but they don't answer my scenario, which is the correct and efficient way of using an ASCII buffer on the stack. The others use Vec or talk about randomness or conversions, which are not the point of my question. – Matjaž Dec 14 '20 at 21:44

2 Answers


The Rust type that is equivalent to C's char is u8, so the equivalent of a char buffer on the stack is a u8 array.

let mut buf = [0u8; 20];

for i in 0..20 {
    buf[i] = b'a' + i as u8;
}

To obtain a &str slice that points into the stack buffer, you can use std::str::from_utf8, which checks that the bytes are valid UTF-8 and, if so, returns a &str borrowing the same memory without copying it.

fn takes_a_string(a: &str) {
    println!("{}", a);
}

fn main() {
    let mut buf = [0u8; 20];
    
    for i in 0..20 {
        buf[i] = b'a' + i as u8;
    }
    
    // This calls takes_a_string with a reference to the stack buffer.
    takes_a_string(std::str::from_utf8(&buf).unwrap());
}
abcdefghijklmnopqrst
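
Tying this back to the question: when only part of the buffer is filled, slice it before converting, and copy to the heap only at the very end. The sketch below assumes a hypothetical `build_string` helper that writes `len` letters instead of random characters. The final `to_owned` is the only heap allocation, roughly the counterpart of `strdup` in the C code.

```rust
const MAX_LEN: usize = 50;

/// Hypothetical helper: builds a string of `len` letters in a stack buffer
/// and returns it as a heap-allocated `String`.
fn build_string(len: usize) -> String {
    // Clamp so that we never go past the end of the buffer.
    let len = len.min(MAX_LEN);
    let mut buf = [0u8; MAX_LEN]; // lives on the stack, one byte per character

    for (i, byte) in buf.iter_mut().take(len).enumerate() {
        // Cycle through 'a'..='z'.
        *byte = b'a' + (i % 26) as u8;
    }

    // Borrow only the filled part; the UTF-8 check cannot fail for ASCII bytes.
    let s: &str = std::str::from_utf8(&buf[..len]).expect("buffer holds only ASCII");

    // Copy the bytes into an owned String sized for exactly `len` bytes.
    s.to_owned()
}

fn main() {
    println!("{}", build_string(5));
    println!("{:?}", build_string(0));
}
```

No terminating NUL byte is needed here, because a `&str` carries its own length.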
Shepmaster
Alice Ryhl

You can fill a random-length prefix of a u8 array (stored on the stack) and only allocate memory on the heap when you copy that prefix into a Vec and convert it to a String using String::from_utf8. Example:

use rand::prelude::*;

const MAX_LEN: usize = 50;
const ASCII_START: u8 = 32;
const ASCII_END: u8 = 127;

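fn generate_string() -> String {
    // Stack buffer: one byte per ASCII character.
    let mut buffer = [0u8; MAX_LEN];
    let mut rng = rand::thread_rng();
    // Two-argument gen_range (rand 0.7 and earlier); the upper bound is
    // exclusive, so add 1 to allow a length of exactly MAX_LEN.
    let buffer_len = rng.gen_range(0, MAX_LEN + 1);
    for i in 0..buffer_len {
        // Printable ASCII: 32 (space) up to and including 126 ('~').
        buffer[i] = rng.gen_range(ASCII_START, ASCII_END);
    }
    // to_vec performs the only heap allocation; from_utf8 reuses it.
    String::from_utf8(buffer[..buffer_len].to_vec()).unwrap()
}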

fn main() {
    for _ in 0..5 {
        dbg!(generate_string());
    }
}
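
On the allocation count: in the code above, `to_vec` is the call that allocates, and `String::from_utf8` then takes ownership of that `Vec<u8>` without copying the bytes again. The following std-only check is a sketch of that behaviour; `stack_to_string` is a hypothetical helper, and the buffer is filled with a fixed byte rather than random characters so that no external crate is needed.

```rust
const MAX_LEN: usize = 50;

/// Hypothetical helper: copies the first `len` bytes of a stack buffer to the
/// heap and reports whether the resulting String reuses the Vec's allocation.
fn stack_to_string(len: usize) -> (String, bool) {
    let len = len.min(MAX_LEN);
    let buffer = [b'x'; MAX_LEN]; // stack buffer, pre-filled with ASCII 'x'

    let vec = buffer[..len].to_vec(); // the single heap allocation
    let vec_ptr = vec.as_ptr();

    // Validates UTF-8 and wraps the Vec; the bytes are not copied again.
    let s = String::from_utf8(vec).expect("buffer holds only ASCII");
    let same_allocation = s.as_ptr() == vec_ptr;
    (s, same_allocation)
}

fn main() {
    let (s, same_allocation) = stack_to_string(10);
    println!("{} (reused Vec allocation: {})", s, same_allocation);
}
```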


Shepmaster
pretzelhammer