27

Lets say I have a list of words and i want to create a method which takes the size of the new list as a parameter and returns the new list. How can i get random words from my original sourceList?

public List<String> createList(int listSize) {
   Random rand = new Random();
   List<String> wordList = sourceWords.
      stream().
      limit(listSize).
      collect(Collectors.toList()); 

   return wordList;
}

So how and where can I use my Random?

Nathan
  • 8,093
  • 8
  • 50
  • 76
Adrian Krebs
  • 4,179
  • 5
  • 29
  • 44
  • 3
    Is there any reason to use a `Stream` ? Can't you shuffle the original list, and then return a copy with `subList` ? – Alexis C. Jul 29 '15 at 14:29
  • Do you know how many words are in the source stream? – dotvav Jul 29 '15 at 14:35
  • What @AlexisC. said. `Collections.shuffle` and `list.subList(0,size)` should be enough. – Pshemo Jul 29 '15 at 14:36
  • 1
    See also this answer, if there is a large number of elements which might make shuffling expensive: http://stackoverflow.com/a/28655112/1441122 – Stuart Marks Jul 29 '15 at 16:59
  • This question seems to be a duplicate of [*Perform operation on n random distinct elements from Collection using Streams API*](https://stackoverflow.com/q/28651908/642706). – Basil Bourque Apr 10 '21 at 22:47

11 Answers11

27

I've found a proper solution. Random provides a few methods to return a stream. For example ints(size) which creates a stream of random integers.

public List<String> createList(int listSize)
{
   Random rand = new Random();
   List<String> wordList = rand.
      ints(listSize, 0, sourceWords.size()).
      mapToObj(i -> sourceWords.get(i)).
      collect(Collectors.toList());

   return wordList;
}
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Adrian Krebs
  • 4,179
  • 5
  • 29
  • 44
  • 6
    Using this solution you may get repeating words in the result which might be undesirable. If it's ok in your case, please edit the question to explicitly state this. – Tagir Valeev Jul 29 '15 at 17:24
  • Good point. But i could resolve that by adding .distinct right? – Adrian Krebs Jul 30 '15 at 09:48
  • 7
    This way your resulting list will be shorter than requested. Probably `rand.ints(0, sourceWords.size()).distinct().limit(listSize).mapToObj(sourceWords::get)` would be better. – Tagir Valeev Jul 30 '15 at 10:11
  • 1
    Unfortunately that doesn't work. `Limit()` is just ignored in this case. – Adrian Krebs Jul 30 '15 at 19:32
15

I think the most elegant way is to have a special collector.

I am pretty sure the only way you can guarantee that each item has an equal chance of being picked, is to collect, shuffle and re-stream. This can be easily done using built-in Collectors.collectingAndThen(...) helper.

Sorting by a random comparator or using randomized reducer, like suggested on some other answers, will result in very biased randomness.

List<String> wordList = sourceWords.stream()
  .collect(Collectors.collectingAndThen(Collectors.toList(), collected -> {
      Collections.shuffle(collected);
      return collected.stream();
  }))
  .limit(listSize)
  .collect(Collectors.toList());

You can move that shuffling collector to a helper function:

public class CollectorUtils {

    public static <T> Collector<T, ?, Stream<T>> toShuffledStream() {
        return Collectors.collectingAndThen(Collectors.toList(), collected -> {
            Collections.shuffle(collected);
            return collected.stream();
        });
    }

}

I assume that you are looking for a way to nicely integrate with other stream processing functions. So following straightforward solution is not what you are looking for :)

Collections.shuffle(wordList)
return wordList.subList(0, limitSize)
daShader
  • 243
  • 2
  • 7
4

This is my one line solution:

 List<String> st = Arrays.asList("aaaa","bbbb","cccc");
 st.stream().sorted((o1, o2) -> RandomUtils.nextInt(0, 2)-1).findFirst().get();

RandomUtils are from commons lang 3

kozla13
  • 1,854
  • 3
  • 23
  • 35
  • Improved version: `st.stream().min((o1, o2) -> o1 == o2 ? 0 : (ThreadLocalRandom.current().nextBoolean() ? -1 : 1)).orElseThrow();` 1. Used java built-in class ThreadLocalRandom 2. nextInt generates one from [-1, 0, 1] but 0 means equals for the elements and first element will be taken more cases than the element at farther positions. – IgorL Sep 14 '21 at 12:27
  • Feels like a lot of computation is required just to get a random item, since JVM has to fist sort the whole list according to the the randomised method, followed by picking up the first item. – Prasannjeet Singh Mar 18 '22 at 09:58
2

Here's a solution I came up with which seems to differ from all the other ones, so I figured why not add it to the pile.

Basically it works by using the same kind of trick as one iteration of Collections.shuffle each time you ask for the next element - pick a random element, swap that element with the first one in the list, move the pointer forwards. Could also do it with the pointer counting back from the end.

The caveat is that it does mutate the list you passed in, but I guess you could just take a copy as the first thing if you didn't like that. We were more interested in reducing redundant copies.

private static <T> Stream<T> randomStream(List<T> list)
{
    int characteristics = Spliterator.SIZED;
    // If you know your list is also unique / immutable / non-null
    //int characteristics = Spliterator.DISTINCT | Spliterator.IMMUTABLE | Spliterator.NONNULL | Spliterator.SIZED;
    Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(list.size(), characteristics)
    {
        private final Random random = new SecureRandom();
        private final int size = list.size();
        private int frontPointer = 0;

        @Override
        public boolean tryAdvance(Consumer<? super T> action)
        {
            if (frontPointer == size)
            {
                return false;
            }

            // Same logic as one iteration of Collections.shuffle, so people talking about it not being
            // fair randomness can take that up with the JDK project.
            int nextIndex = random.nextInt(size - frontPointer) + frontPointer;
            T nextItem = list.get(nextIndex);
            // Technically the value we end up putting into frontPointer
            // is never used again, but using swap anyway, for clarity.
            Collections.swap(list, nextIndex, frontPointer);

            frontPointer++;
            // All items from frontPointer onwards have not yet been chosen.

            action.accept(nextItem);
            return true;
        }
    };

    return StreamSupport.stream(spliterator, false);
}
Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Hakanai
  • 12,010
  • 10
  • 62
  • 132
  • @Holger fair enough, it was pulled out of a private method we had where there was actually a guarantee of both those things. :) First time hearing about `Collections.swap` too. – Hakanai Jan 17 '18 at 23:39
  • @Holger I just figured out also why the bug doesn't matter. – Hakanai Jan 17 '18 at 23:55
0

Try something like that:

List<String> getSomeRandom(int size, List<String> sourceList) {
    List<String> copy = new ArrayList<String>(sourceList);
    Collections.shuffle(copy);
    List<String> result = new ArrayList<String>();
    for (int i = 0; i < size; i++) {
        result.add(copy.get(i));
    }

    return result;
}
mate00
  • 2,727
  • 5
  • 26
  • 34
  • 2
    It doesn't really use java 8 stream API – Toilal Jul 29 '15 at 14:47
  • @Toilal: still, it’s a solution. Not everything gets better with streams. However, if a rather small result list is required, I’d go into [this direction](http://stackoverflow.com/a/28671411/2711488) instead of shuffling the entire list… – Holger Jul 29 '15 at 18:05
  • I agree. When input list is considered big, shuffling it all is not efficient. – mate00 Jul 30 '15 at 07:52
0

The answer is very simple(with stream):

List<String> a = src.stream().sorted((o1, o2) -> {
        if (o1.equals(o2)) return 0;
        return (r.nextBoolean()) ? 1 : -1;
    }).limit(10).collect(Collectors.toList());

You can test it:

List<String> src = new ArrayList<String>();
for (int i = 0; i < 20; i++) {
    src.add(String.valueOf(i*10));
}
Random r = new Random();
List<String> a = src.stream().sorted((o1, o2) -> {
        if (o1.equals(o2)) return 0;
        return (r.nextBoolean()) ? 1 : -1;
    }).limit(10).collect(Collectors.toList());
System.out.println(a);
Timofey
  • 823
  • 1
  • 8
  • 26
  • 1
    r.nextBoolean() would be easier – Toilal Jul 29 '15 at 14:44
  • Yes, I just didn't think about it :) But the point still the same – Timofey Jul 29 '15 at 14:46
  • 2
    This uses a comparator which doesn't impose any definite ordering at all, thus breaking its general contract in the extremest sense. Even if it happens to work with some sorting implementation, it does so by pure accident. – Marko Topolnik Jul 29 '15 at 14:56
  • This is what @Swiss_Codeaholic wants — random order. I thought this is kind of a standart that callback should return positive number if one argument 'more than' another, and negative if the opposite. Am I wrong? – Timofey Jul 29 '15 at 15:04
  • 1
    The comparator should impose some _definite_ random order. Your comparator doesn't impose _any_ order at all. Actually, it doesn't even define a general relation because it returns a different value each time, even when repeatedly asked to compare the same two objects. An order must additionally be reflexive, symmetric, and transitive. – Marko Topolnik Jul 29 '15 at 15:13
  • But is it really possible to follow these rules with random order? If the comparator would return the same number each time when asked to compare the same objects, wouldn't it be random? – Timofey Jul 29 '15 at 15:19
  • 1
    Of course it is possible. A random order is just another order. A comparator which returns a random value doesn't specify any order at all. Take a list processed by `Collections.shuffle()`: for any two elements in that list, one comes before the other _every time you check_. And surely, an element compared to itself yields equality. Your comparator never returns zero. – Marko Topolnik Jul 29 '15 at 15:36
  • I have edited the answer, so now it can return 0, thanks. But I just checked `Collections.shuffle()` for `List src` from my example, and for any two elements every time they are in different order. Maybe I misunderstand you? – Timofey Jul 29 '15 at 15:57
  • Yes, you do. You called `shuffle()` again each time. I mentioned a list which was processed by `shuffle` once and then inspected. That list has a random, but definite order, and the equivalent of that is what OP needs. Your solution is like shuffling the whole list each time you call `iterator.next()` – Marko Topolnik Jul 29 '15 at 16:16
  • Now I got it. Thank you! – Timofey Jul 29 '15 at 16:17
  • 5
    Your solution is actually broken. It may work for small lists (less than 32 elements), because in this case JDK switches to simpler sorting algorithm. But for big arrays the sort will randomly crash because of contract violation. Try this source: `List src = IntStream.range(0, 100000).mapToObj(String::valueOf).collect(Collectors.toList());`. Use, for example, 0 seed (`Random r = new Random(0);`) and it will crash. – Tagir Valeev Jul 29 '15 at 17:33
  • didn't know that, thanks! Where can I find more info on stream api (beside oracle documentatiob)? – Timofey Jul 29 '15 at 17:36
  • 3
    @Timofey: This has nothing to do with the stream API. It’s the [`Comparator` contract](http://docs.oracle.com/javase/8/docs/api/java/util/Comparator.html). You can expect every method using a comparator to dislike broken contracts. I.e. `Collections.sort` behaves the same, even under Java 7. – Holger Jul 29 '15 at 18:01
0

If you want non repeated items in the result list and your initial list is immutable:

  • There isn't a direct way to get it from the current Streams API.
  • It's not possible to use a random Comparator because it's going to break the compare contract.

You can try something like:

public List<String> getStringList(final List<String> strings, final int size) {
    if (size < 1 || size > strings.size()) {
        throw new IllegalArgumentException("Out of range size.");
    }

    final List<String> stringList = new ArrayList<>(size);

    for (int i = 0; i < size; i++) {
        getRandomString(strings, stringList)
                .ifPresent(stringList::add);
    }

    return stringList;
}

private Optional<String> getRandomString(final List<String> stringList, final List<String> excludeStringList) {
    final List<String> filteredStringList = stringList.stream()
            .filter(c -> !excludeStringList.contains(c))
            .collect(toList());

    if (filteredStringList.isEmpty()) {
        return Optional.empty();
    }

    final int randomIndex = new Random().nextInt(filteredStringList.size());
    return Optional.of(filteredStringList.get(randomIndex));
}
0

@kozla13 improved version:

List<String> st = Arrays.asList("aaaa","bbbb","cccc");
st.stream().min((o1, o2) -> o1 == o2 ? 0 : (ThreadLocalRandom.current().nextBoolean() ? -1 : 1)).orElseThrow();
  1. Used java built-in class ThreadLocalRandom
  2. nextInt generates one from sequence [-1, 0, 1], but return 0 in compare func means equals for the elements and side effect of this - first element (o1) will be always taken in this case.
  3. properly handle object equals case
IgorL
  • 1,169
  • 10
  • 19
0

If the source list is generally much larger than the new list, you might gain some efficiencies by using a BitSet to get random indices:

List<String> createList3(int listSize, List<String> sourceList) {
  if (listSize > sourceList.size()) {
    throw new IllegalArgumentException("Not enough words in the source list.");
  }

  List<String> newWords = randomWords(listSize, sourceList);
  Collections.shuffle(newWords); // optional, for random order
  return newWords;
}

private List<String> randomWords(int listSize, List<String> sourceList) {
  int endExclusive = sourceList.size();
  BitSet indices = new BitSet(endExclusive);
  Random rand = new Random();
  while (indices.cardinality() < listSize) {
    indices.set(rand.nextInt(endExclusive));
  }
  
  return indices.stream().mapToObj(i -> sourceList.get(i))
    .collect(Collectors.toList());
}
0

A stream is probably overkill. Copy the source list so you're not creating side-effects, then give back a sublist of the shuffled copy.

public static List<String> createList(int listSize, List<String> sourceList) {
  if (listSize > sourceList.size()) {
    throw IllegalArgumentException("Not enough words for new list.");
  }
  List<String> copy = new ArrayList<>(sourceList);
  Collections.shuffle(copy);
  return copy.subList(0, listSize);
}
0

One liner to randomize a stream:

Stream.of(1, 2, 3, 4, 5).sorted(Comparator.comparingDouble(x -> Math.random()))
bedorlan
  • 55
  • 1
  • 5