I have been trying to think this through but haven't been able to find a clean solution. I have a list of lists like this:

data = [ 
 [1,2,3],
 ['a','b'],
 ['fush', 'bush', 'mish', 'bish']
]

I want to sample "k" values out of this, but in order: the first value comes from the first sublist, the second from the second sublist, and so on. For example, if k = 2, it could return something like [2, 'b'] (and remove those values from further consideration).

If k = 4, it should return something like [3, 'a', 'bush', 1], wrapping back to the first sublist for the fourth value.

frazman
  • @AnagramDatagram No, k is not an index into a list; k is an input to the function saying how many values to "sample". If "k" is 4, then it wraps back to the beginning of the list. – frazman May 24 '19 at 05:46
  • "And remove that from the consideration" what does this mean? Remove this from `data`? – cs95 May 24 '19 at 05:47
  • @cs95 Basically, say k=4. In the first round, say we have sampled 2 from the first list. Then in the second round, when we sample from that list again, we sample from [1,3] (2 is removed from consideration). My bad if that was not clear. – frazman May 24 '19 at 05:48
  • So you want to remove a random element from the 1st sublist, then the 2nd and 3rd and so on? Is that correct? – Sociopath May 24 '19 at 05:48
  • @AkshayNevrekar Yep. As we sample, we remove each value from future consideration. – frazman May 24 '19 at 05:49
  • So k is the number of elements to sample, but how do you determine which elements to sample? In your example for k=4, why are 3 and 'a' sampled and not any other values? – Bhavin May 24 '19 at 05:49
  • @Bhavin: I have probability distributions, but for the sake of simplicity let's say we sample randomly :) – frazman May 24 '19 at 05:50
  • Strange requirement. Do you care that the middle lists will empty first? I.e., what do you mean by random? Edit: just reread the question; are you always taking from the start? – Andrew Allen May 24 '19 at 05:54
  • @Fraz Does the output order matter? – gmds May 24 '19 at 05:55
  • @gmds Yeah, it would be ideal if it's ordered. :-/ – frazman May 24 '19 at 05:56
  • @Fraz Okay, I edited my answer to provide ordering. – gmds May 24 '19 at 05:56
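The asker mentions having probability distributions over the values. As a rough sketch only, assuming one non-negative weight per item stored in a list shaped like data, and assuming that emptied sublists should simply be skipped, a weighted round-robin draw that removes each sampled item could look like this (weighted_round_robin and its weights argument are hypothetical names, not from the question):

```python
import random


def weighted_round_robin(data, weights, k):
    """Draw k items, taking from the sublists in turn and removing each drawn item.

    Hypothetical helper: `weights` is assumed to have the same shape as `data`
    (one non-negative weight per item). Both `data` and `weights` are modified
    in place, and sublists that have been emptied are skipped.
    """
    result = []
    i = 0
    while len(result) < k:
        if not any(data):
            raise ValueError("fewer than k items available")
        idx = i % len(data)  # wrap back to the first sublist after the last one
        i += 1
        sub, w = data[idx], weights[idx]
        if not sub:
            continue  # this sublist is exhausted; move on to the next one
        # pick a position in proportion to the weights, then drop the item and
        # its weight together so the item cannot be drawn again
        pos = random.choices(range(len(sub)), weights=w)[0]
        result.append(sub.pop(pos))
        w.pop(pos)
    return result


data = [[1, 2, 3], ['a', 'b'], ['fush', 'bush', 'mish', 'bish']]
weights = [[0.2, 0.5, 0.3], [0.9, 0.1], [0.25, 0.25, 0.25, 0.25]]
print(weighted_round_robin(data, weights, 4))
```

random.choices treats the weights as relative, so they do not need to sum to 1, but a sublist whose remaining weights are all zero makes it raise ValueError in recent Python versions.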

8 Answers

How about this?

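import random
from itertools import chain, zip_longest

def special_sample(data, n):
    length = len(data)
    # every sublist contributes n_elements items; the first `excess` sublists contribute one more
    n_elements, excess = divmod(n, length)
    # min(...) guards against asking random.sample for more items than a short sublist holds
    samples = (random.sample(sub, min(len(sub), n_elements + 1 if index < excess else n_elements))
               for index, sub in enumerate(data))
    # interleave the per-sublist samples round-robin; a private sentinel marks the padding,
    # so genuine None values in the data are not filtered out
    missing = object()
    return [element
            for element in chain.from_iterable(zip_longest(*samples, fillvalue=missing))
            if element is not missing]

print(special_sample(data, 4))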

Sample output (the values are random):

[3, 'a', 'bush', 1]
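The divmod step decides how many items each sublist contributes. A small stand-alone check (the helper name allocation is only for illustration) shows that for n = 4 the counts are 2, 1, 1, while for n = 8 the second sublist would be asked for 3 items although it only holds 2, which random.sample rejects with a ValueError unless the count is capped at the sublist length:

```python
def allocation(n, lengths):
    """Per-sublist sample counts computed the same way as in special_sample (illustrative)."""
    n_elements, excess = divmod(n, len(lengths))
    return [n_elements + 1 if index < excess else n_elements
            for index in range(len(lengths))]


lengths = [3, 2, 4]  # sizes of the sublists in the question's data
for n in (4, 8):
    counts = allocation(n, lengths)
    # indexes of sublists that would be asked for more items than they hold
    too_big = [i for i, (c, size) in enumerate(zip(counts, lengths)) if c > size]
    print(n, counts, too_big)
# 4 [2, 1, 1] []
# 8 [3, 3, 2] [1]
```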
gmds

You can use random.shuffle to shuffle each sub-list in data first, zip and chain the sub-lists, and use itertools.islice to get the first k items:

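import random
from itertools import islice, chain, zip_longest
k = 4
for l in data:
    random.shuffle(l)  # shuffles each sub-list in place
# zip_longest (unlike zip) does not stop at the shortest sub-list; a sentinel marks the padding
missing = object()
interleaved = (x for x in chain.from_iterable(zip_longest(*data, fillvalue=missing)) if x is not missing)
print(list(islice(interleaved, k)))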

Sample output:

[1, 'a', 'mish', 3]
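One detail worth checking in this approach: zip stops at the shortest sublist, so chaining zip(*data) over the question's data yields only 3 × 2 = 6 of the 9 items, and any k above 6 silently gets fewer values. itertools.zip_longest with a sentinel fillvalue keeps going until every sublist is used up. A quick comparison, with the shuffling left out so the output is predictable:

```python
from itertools import chain, zip_longest

data = [[1, 2, 3], ['a', 'b'], ['fush', 'bush', 'mish', 'bish']]

# zip truncates to the shortest sublist ('a', 'b'), dropping 3, 'mish' and 'bish'
print(list(chain.from_iterable(zip(*data))))
# [1, 'a', 'fush', 2, 'b', 'bush']

# zip_longest pads missing positions with the sentinel, which is then filtered out
missing = object()
print([x for x in chain.from_iterable(zip_longest(*data, fillvalue=missing))
       if x is not missing])
# [1, 'a', 'fush', 2, 'b', 'bush', 3, 'mish', 'bish']
```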
blhsing

You can try this. Note: I have assumed that you want to remove the first element of the list every time; you can replace it with a random index.

data = [
 [1,2,3],
 ['a','b'],
 ['fush', 'bush', 'mish', 'bish']
]

def sampleList(k, data):
  sampledList = []
  dl = len(data)
  for idx in range(0,k):
    # assuming here that we sample the first element of list always
    d = data[idx % dl] # wrap around the index
    sampledList.append(d[0]) # Add sampled value to return list
    del d[0] # Delete sampled value from original list

  return sampledList

print(sampleList(2, data))
print(data)

print(sampleList(4, data))
print(data)

Output:

[1, 'a']
[[2, 3], ['b'], ['fush', 'bush', 'mish', 'bish']]
[2, 'b', 'fush', 3]
[[], [], ['bush', 'mish', 'bish']]
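Following the note at the top of this answer, one way to swap the fixed first element for a random index is sketched below. It keeps the same wrap-around indexing and uses list.pop with random.randrange; skipping sublists that are already empty (instead of raising IndexError) is an added assumption, and the name sampleListRandom is hypothetical:

```python
import random


def sampleListRandom(k, data):
    sampledList = []
    dl = len(data)
    idx = 0
    while len(sampledList) < k:
        if not any(data):
            raise ValueError("fewer than k values left in data")
        d = data[idx % dl]  # wrap around the index
        idx += 1
        if d:  # skip sublists that have been emptied already
            # pop a random position: returns the value and removes it in one step
            sampledList.append(d.pop(random.randrange(len(d))))
    return sampledList


data = [[1, 2, 3], ['a', 'b'], ['fush', 'bush', 'mish', 'bish']]
print(sampleListRandom(4, data))
print(data)
```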

Hope this helps.

Bhavin

Yet another approach: first flatten your list of lists once and for all, i.e.

flat_data = [item for sublist in data for item in sublist]

and then fill another list until your k-based sampling is complete.

import random as rd
k      = 4
sample = []

while len(sample) < k:
    if rd.random() > .5:
        rd.shuffle(flat_data) # costly
        sample.append(
            flat_data.pop(0)
        )
# where sample now is, say, ['b', 'bish', 2, 'a']
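Since every pass reshuffles the whole list before popping its first element, the loop amounts to drawing k distinct items uniformly at random, in draw order. A sketch that gets the same kind of sample with a single random.sample call and then removes the drawn items from flat_data (the helper name sample_and_remove is hypothetical):

```python
import random


def sample_and_remove(flat_data, k):
    # random.sample picks k distinct positions uniformly, in selection order
    positions = random.sample(range(len(flat_data)), k)
    sample = [flat_data[p] for p in positions]
    # delete from the highest position down so the remaining positions stay valid
    for p in sorted(positions, reverse=True):
        del flat_data[p]
    return sample


flat_data = [1, 2, 3, 'a', 'b', 'fush', 'bush', 'mish', 'bish']
print(sample_and_remove(flat_data, 4))
print(flat_data)  # the 5 values that were not drawn
```

This avoids reshuffling the whole list for every draw, which the comment in the loop above flags as costly.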
keepAlive

You can also do:

import random

def fun(data, k):
    output = []
    for i in range(k):
        # wrap around: once i reaches len(data), start again from the first sublist
        idx = i % len(data)
        # select a random element from that sublist and remove it
        x = random.choice(data[idx])
        output.append(x)
        data[idx].remove(x)
    return output

print(fun(data, 3))

Sample output:

[3, 'b', 'bish']

# data
# [[1, 2], ['a'], ['fush', 'bush', 'mish']]
Sociopath

If you make a generator that shuffles a list, yields its values, and then shuffles again, you can cycle over one such generator per sublist to keep producing values in the correct order forever. Each time a generator runs out, it reshuffles:

import random
from itertools import cycle, islice

def randGen(l):
    while True:
        r = random.sample(l, k=len(l))
        yield from r

data = [ [1,2,3],['a','b'],['fush', 'bush', 'mish', 'bish']]

gs = map(next, cycle(randGen(l) for l in data)) # setup a generator on the cycle 

for i in range(30):
    print(next(gs), end = ",")

Result

1,b,bush,3,a,bish,2,b,mish,3,a,fush,1,a,fush,2,b,bish,1,a,mish,3,b,bush,2,a,bish,1,b,mish ...

If you just want a certain number in a list, islice() makes it very convenient:

list(islice(gs, 9))

# [2, 'a', 'mish', 1, 'b', 'bish', 3, 'b', 'fush']
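Because each generator reshuffles once it runs out, values come back after every full pass (the 'b' at the eighth position above, for example). If drawn values should instead be used only once, one option is to shuffle a copy of each sublist once and interleave the copies with the roundrobin recipe from the itertools documentation, stopping when everything is used up. A sketch, where finite_sample is a hypothetical name; it works on shuffled copies and leaves data unchanged:

```python
import random
from itertools import cycle, islice


def roundrobin(*iterables):
    # roundrobin('ABC', 'D', 'EF') --> A D E B F C
    # (recipe from the itertools documentation)
    num_active = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while num_active:
        try:
            for next_value in nexts:
                yield next_value()
        except StopIteration:
            # drop the iterator that just ran out from the cycle
            num_active -= 1
            nexts = cycle(islice(nexts, num_active))


def finite_sample(data, k):
    # shuffle a copy of each sublist once, then take values in sublist order
    shuffled = [random.sample(sub, k=len(sub)) for sub in data]
    return list(islice(roundrobin(*shuffled), k))


data = [[1, 2, 3], ['a', 'b'], ['fush', 'bush', 'mish', 'bish']]
print(finite_sample(data, 9))  # every value exactly once, then the generator stops
```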
Mark

Okay, this answer might be late, and it is probably a bit inefficient, but I decided to give it a go anyway:

import random
data = [
 [1, 2, 3],
 ['a', 'b'],
 ['fush', 'bush', 'mish', 'bish']
]
k = 5
sample_list = []


def filter_chosen_element(sample_list1, data1):
    # remove every value sampled so far from whichever sublist still contains it
    for sublist in data1:
        for chosen in sample_list1:
            if chosen in sublist:
                sublist.remove(chosen)


# i % len(data) wraps back to the first sublist after the last one; when
# k <= len(data) it is simply i, so no separate branch is needed
for i in range(k):
    sample_list.append(random.choice(data[i % len(data)]))
    filter_chosen_element(sample_list, data)

print(sample_list)
print(data)
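One thing to be aware of with filter_chosen_element: it removes values by equality from every sublist, not just from the sublist the value was drawn from, so a value that appears in two sublists disappears from both. A minimal demonstration with hypothetical data:

```python
def filter_chosen_element(sample_list1, data1):
    # value-based removal, equivalent to the function in the answer above
    for sublist in data1:
        for chosen in sample_list1:
            if chosen in sublist:
                sublist.remove(chosen)


shared = [[1, 2], [1, 3]]           # hypothetical data: 1 appears in both sublists
filter_chosen_element([1], shared)  # pretend 1 was drawn from the first sublist
print(shared)  # [[2], [3]]: the 1 in the second sublist is gone as well
```

Removing by position instead (for example with list.pop on the chosen index) only touches the sublist the value came from.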

You can create a random permutation of the indexes of each list, use these permutations as the columns of a matrix, and flatten the matrix row by row. This gives you the indexes in the order in which you will use them.

As some lists are longer than others, keep only as many indexes per list as the shortest list has.

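import numpy as np

def sample_from_matrix(data, k):
    min_size = min(len(sub) for sub in data)
    # min_size distinct random indexes from the *whole* of each sublist, one column per sublist
    indexes = np.column_stack([np.random.permutation(len(sub))[:min_size] for sub in data])
    # row-major flattening interleaves the columns: sublist 0, 1, 2, 0, 1, 2, ...
    indexes = indexes.flatten()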
    return [ data[i % len(data)][indexes[i]] for i in range(min(k, len(indexes))) ]
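To see why flattening gives the right order: np.column_stack turns each 1-D permutation into a column, and flatten() reads the resulting matrix row by row, so position i of the flattened array comes from column i % len(data), which is exactly the sublist index the list comprehension uses. The same layout in plain Python, with made-up index values standing in for the random permutations:

```python
# Hypothetical per-sublist index columns (stand-ins for the random permutations)
columns = [[0, 1], [10, 11], [20, 21]]

# column_stack equivalent: row r holds the r-th entry of every column
rows = [list(row) for row in zip(*columns)]
# flatten() equivalent: concatenate the rows (row-major order)
flat = [value for row in rows for value in row]
print(flat)  # [0, 10, 20, 1, 11, 21]

# position i of flat comes from column i % len(columns)
print([i % len(columns) for i in range(len(flat))])  # [0, 1, 2, 0, 1, 2]
```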

pedrolucas