0

I have a list of strings (["A", "B", ...]) and a list of sizes ([4,7,...]). I would like to sample without replacement from the set of strings where initially string in position i appears sizes[i] times. I have to perform this operation k times. Clearly, if I pick string i, then sizes[i] decreases by 1. The current naive solution that I developed is to generate the entire input set, shuffle it, and iteratively pop the first element of the array. This is clearly inefficient since if a string appears 1 million times I would have to generate 1 million entries.

public static void main(String[] args) {
    String[] elems = { "A", "B", "C", "D", "E" };
    Integer[] sizes = { 10, 5, 4, 7, 3 };
    int k = 3;

    ArrayList<String> bag = new ArrayList<>();
    for (int i = 0; i < elems.length; i++) {
        for (int j = 0; j < sizes[i]; j++) {
            bag.add(elems[i]);
        }
    }

    Collections.shuffle(bag);
    for (int i = 0; i < k; i++) {
        System.out.println(bag.remove(0));
    }
}

Is there a better and more efficient way to perform this operation? Thanks.

molfo
  • 75
  • 7
  • Why are you removing it then? Since it is an ArrayList, you could access first `k` elements by their index (`bag.get(i)`). What is the expectation when the next `k` elements are needed? Should the original bag be restored? – Thiyagu Dec 11 '20 at 08:59
  • You are right. I could just access the first `k` elements. In the end, the result is anyway the same I think – molfo Dec 11 '20 at 09:07

4 Answers4

2

Assuming the bag doesn't have to be persistent or be used at all you could create a class that contains input and frequency, e.g. like this (simplified):

class SampleElement<T> {
  private T value;
  private int frequency;

  //constructors, getters, setters
}

Then build a collection of those elements from the input you have, e.g. (again simplified):

 List<SampleElement<String>> samples = Arrays.asList(new SampleElement<String>("A",10), ...);

Finally loop until that collection is empty or you've done it k times and pick a random element. Decrease that element's frequency and if it hits 0 you remove it from the collection. Example (off the top of my head so might contain errors):

Random rand = new Random();
int runs = k;
while(runs > 0 && !samples.isEmpty() ) {
  runs--;
  int index = rand.nextInt(samples.size());
  SampleElement<String> element = samples.get(index);

  System.out.println(element.getValue());

  element.decrementFrequency();
  if( element.getFrequency() <= 0 ) {
    samples.remove(index);
  }
}
Thomas
  • 87,414
  • 12
  • 119
  • 157
  • 1
    I am not entirely sure that your answer solves my problem. Let us consider the extreme case in which "A" has size 9 and "B" has size 1. I want to perform k=1 samples. The probability of picking A would be 9/10, while the probability of picking B is 1/10. In your solution I think that instead the probability is 1/2 and 1/2. – molfo Dec 11 '20 at 09:42
  • 1
    @molfo that's right although that requirement didn't make it into your question (or I missed it). What you could do though is sum all the frequencies and generate a random number within that range, e.g. `rand.nextInt(10)` for your example. Finally instead of using an index you iterate over the elements and sum their frequency until you hit the end or the sum is larger than or equal to that random number + 1 and pick that element (so if you get 9 you'd select B since B has A(9) + B(1) = 10 (random(9) + 1)). Of course you'd need adjust the upper boundary in each iteration. – Thomas Dec 11 '20 at 12:38
1

You can collect these two arrays into a map:

String[] elems = {"A", "B", "C", "D", "E"};
Integer[] sizes = {10, 5, 4, 7, 3};

Map<String, Integer> map = IntStream.range(0, elems.length).boxed()
        .collect(Collectors.toMap(i -> elems[i], i -> sizes[i]));

System.out.println(map); // {A=10, B=5, C=4, D=7, E=3}
0

Assuming that the lengths of these two arrays are the same, you can create a list of map entries containing pairs of elements from these arrays, and shuffle this list:

String[] elems = {"A", "B", "C", "D", "E"};
Integer[] sizes = {10, 5, 4, 7, 3};

List<Map.Entry<String, Integer>> bag = IntStream
        .range(0, elems.length)
        .mapToObj(i -> Map.of(elems[i], sizes[i]))
        .flatMap(map -> map.entrySet().stream())
        .collect(Collectors.toList());

System.out.println(bag); // [A=10, B=5, C=4, D=7, E=3]
Collections.shuffle(bag);
System.out.println(bag); // [D=7, C=4, E=3, A=10, B=5]

See also: How to sort an array with respect to another array if there are duplicates?

0

If all you want is get a random element from bag, you do not need to shuffle bag. You can use Random#nextInt(elems.length * sizes.length) to get a random int from 0 to elems.length * sizes.length - 1 and using this int as the index, you can get an element from bag.

Demo:

import java.util.ArrayList;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        String[] elems = { "A", "B", "C", "D", "E" };
        Integer[] sizes = { 10, 5, 4, 7, 3 };
        int k = 3;

        ArrayList<String> bag = new ArrayList<>();
        for (int i = 0; i < elems.length; i++) {
            for (int j = 0; j < sizes[i]; j++) {
                bag.add(elems[i]);
            }
        }

        Random random = new Random();
        int count = elems.length * sizes.length;
        for (int i = 0; i < k; i++) {
            System.out.println(bag.get(random.nextInt(count)));
        }
    }
}
Arvind Kumar Avinash
  • 71,965
  • 6
  • 74
  • 110