0

I am trying to test a program which checks for correct use of brackets. To do this I want to generate Long strings of numbers with correct use of brackets (to test incorrect uses I can easily just swap around individual characters or groups of characters)

I can easily generate a vector of non-bracket characters. However, To place brackets correctly I need n unique indices. I do not know of a way to efficiently generate random indices (a list of unique random numbers in a range) so I would like either a way to do that or a way of converting random numbers into unique indices.

The two best approaches I can think of for this are:

  1. generate a vector of all indices 0-N , shuffle it and then take the first n
  2. generate a random number, test if it is in a hashset, if isn't add it and if it is regenerate until it isn't. Repeat until the hashset has n members then convert it to a vector

The first approach is efficient for N = n + $\epsilon$ but very inefficient for N >> n while the second approach is very inefficient for N = n + $\epsilon$ but efficient for N >> n.

Note:

  1. n is the number of brackets N is the number of characters
  2. for those unfamiliar with the notation $\epsilon$ is some small positive number
Chayim Friedman
  • 47,971
  • 5
  • 48
  • 77
Pioneer_11
  • 670
  • 4
  • 19
  • 1
    By "correct use of brackets" do you mean that opening/closing pairs must match? Does that even matter for this question? That being said you probably want this https://docs.rs/rand/latest/rand/seq/index.html – Jared Smith May 16 '23 at 12:30
  • 1
    Check the [sampling](https://rust-random.github.io/book/guide-seq.html#sampling) section as well as the source code of the [`choose_multiple`](https://rust-random.github.io/rand/rand/seq/trait.IteratorRandom.html#method.choose_multiple) method, as it may take `O(n)` steps, not `O(N)`. – xamgore May 16 '23 at 12:34
  • 1
    I think you want [`rand::seq::index::sample()`](https://docs.rs/rand/latest/rand/seq/index/fn.sample.html) – Sven Marnach May 16 '23 at 13:15
  • @SvenMarnach Is rust’s sample method with or without replacement? The linked document didn’t say. – pjs May 16 '23 at 13:57
  • @pjs The first sentence of the linked page is this "Randomly sample exactly `amount` distinct indices from `0..length`". This means without replacement, I guess, since the indices will be different. – Sven Marnach May 16 '23 at 15:27
  • What is your N? Ballpark figure? – Severin Pappadeux May 16 '23 at 16:46
  • @SeverinPappadeux I'm wanting to test a bunch of different cases so from a couple hundred to millions – Pioneer_11 May 16 '23 at 17:29
  • @SvenMarnach Thanks, I hadn’t spotted that. That looks like the simplest answer then. – pjs May 16 '23 at 20:10

2 Answers2

2

Well you already answered your own question :^)

I would go with No 1. While epsilon analysis is nice, in the real world, there's a lot more going on. Random accesses on vectors are very, very fast, while the hashing and regenerating random numbers is probably slow (though as always do benchmarks).

However if you're not generating a LOT of brackets, the performance difference is probably negligible.

For your first solution you can use the rand crate and its SliceRandom trait. Do something like:

use rand::seq::SliceRandom;

fn whatever() {
 let mut numbers: Vec<usize> = (0..N).iter().copied().collect();
 numbers.shuffle(&mut rand::thread_rng());
 let usable_numbers = numbers[0..n];
 // Tada
}

The other solution

fn whatever() {
  let mut out_map = HashSet::new();
  while out_map.len() < n {
   out_map.insert(rand::thread_rng().gen::<usize>() % N);
  }

 let numbers = out_map.into_iter().collect::<Vec<_>>();

}
IcyTv
  • 405
  • 4
  • 12
  • Makes sense. I knew the solutions would work already but I posted the question as I thought there would be some better solution I'm missing. The only modification I would make is to combine the two solutions so it uses approach `1` if `n > kN` and approach `2` otherwise (setting the value of k depending on the relative speed of the algorithms). – Pioneer_11 May 16 '23 at 12:51
  • 1
    @Pioneer_11 That seems like added complexity for very little benefit to me... Maybe try and benchmark the solutions instead and see what performance difference it actually makes. While O analysis is nice, it often does not reflect the reality until you get into really big numbers. – IcyTv May 16 '23 at 12:55
  • @Pioneer_11 We know from the [birthday problem](https://en.wikipedia.org/wiki/Birthday_problem) that solution 2 starts getting expensive for n > sqrt(N) due to increasing probability of collisions. Given that you’re already allocating N resources to construct a scenario withN characters, sticking with the shuffling approach seems to me like the better approach. – pjs May 16 '23 at 13:09
2

If you really want to generate n distinct random number, then this is a duplicate of How to choose several random numbers from an interval?.

However in your case, instead of first generating a sequence of non-bracket characters and then inserting brackets, it might be more efficient to generate the sequence in a single pass with something like this:

fn make_string (len: usize, mut n_brackets: usize) -> String {
    assert!(len >= 2*n_brackets);
    let mut result = String::with_capacity (len);
    let mut n_open = 0;
    let mut rng = rand::thread_rng();
    for i in 0..len {
        let sel = rng.gen_range (0 .. (len-i));
        if sel < n_open {
            result.push (')');
            n_open -= 1;
        } else if sel < n_open + 2*n_brackets {
            result.push ('(');
            n_open += 1;
            n_brackets -= 1;
        } else {
            result.push (rng.gen_range ('a' .. 'z'));
        }
    }
    result
}

Playground

Jmb
  • 18,893
  • 2
  • 28
  • 55
  • That probably makes more sense. I thought there may be simple and efficient method that I hadn't seen but that doesn't seem to be the case. As this is a test situation I'm not overly concerned with efficiency (I just don't want something hysterically inefficient) but as that doesn't seem to exist this is probably the best solution. – Pioneer_11 May 16 '23 at 15:07
  • @Pioneer_11 Given that this solution is `O(N)` to generate a sequence of length `N` whatever the number of bracket pairs, you won't be able to be more efficient. – Jmb May 17 '23 at 06:47
  • I was referring to a more simple and efficient method for generating distinct random numbers. – Pioneer_11 May 17 '23 at 16:47