
I am shuffling an array of, say, 8760 numbers, sorted by value (from low to high), to generate a quasi-stochastic time series. However, I want higher values to have a higher chance of appearing in the first and last quarters of the resulting array, and lower values in the second and third quarters. My questions are:

  1. Is there a way to manipulate the shuffle function so it works with custom probabilities or do I have to "do it myself" afterwards?
  2. Is there some other package I do not know yet which can do this?
  3. Am I possibly blind and overlooking another much easier way to do this?
a = np.array([0, 0, 0, 0, 0, ...
              1, 1, 1, ...
              ...
              14, 14, 14, 14, 14, 14])
random.shuffle(a)  # uniform shuffle, done in place (returns None)
# desired result would be something like
a_shuff = [14, 14, 8, 12, ... 0, 4, 2, 6, 3, ... 13, 14, 9, 11, 12]

It may be important to note that each value has a different number of occurrences within the array.

I hope that describes my problem well enough - I am new to both Python and Stack Overflow. I'm happy to answer any further questions on this matter.

SOLUTION

By sorting my values as suggested in the answers and assigning increasing probability values to them along the axis (where sum(p) must equal 1), I was able to use NumPy's `numpy.random.choice` function. This may not be an answer to the question I asked, but it does the same thing (at least in this specific case):

import numpy as np

# convert the list to an array (a list was necessary previously)
v_time = np.empty(0)
for r in range(len(temp)):
    v_time = np.append(v_time, temp[r])

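# arrange values by desired probability - this step may vary depending on the
# desired trend in the shuffled data
arrayA = v_time[0::2]     # every second value, starting at index 0
arrayB = v_time[1::2]     # every second value, starting at index 1
arrayB = np.flip(arrayB)  # reverse this half
v_time = np.concatenate((arrayB, arrayA))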

# create probability values for customizing the weights (they must sum to 1)
p = np.linspace(0.01, 1, len(v_time))
p = p / p.sum()

# weighted shuffle: draw every element exactly once, without replacement
v_timeShuff = np.random.choice(v_time, size=v_time.size, replace=False, p=p)
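
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The function name `u_shaped_shuffle`, the `p_min` parameter and the stand-in data are made up for illustration, and it uses NumPy's `Generator.choice` instead of the legacy `np.random.choice` (both accept `size`, `replace` and `p`). It assumes the input can simply be sorted first.

```python
import numpy as np

def u_shaped_shuffle(values, p_min=0.01, rng=None):
    """Weighted shuffle following the recipe above (illustrative helper)."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.sort(np.asarray(values, dtype=float))
    # Arrange as high -> low -> high: odd-indexed values reversed,
    # followed by even-indexed values in ascending order.
    v = np.concatenate((v[1::2][::-1], v[0::2]))
    # Linearly increasing weights along that arrangement, normalised to sum to 1.
    p = np.linspace(p_min, 1.0, v.size)
    p /= p.sum()
    # Drawing all elements without replacement gives a weighted permutation.
    return rng.choice(v, size=v.size, replace=False, p=p)

rng = np.random.default_rng(0)
a = np.repeat(np.arange(15), 20)  # stand-in data: 15 distinct values, 20 copies each
shuffled = u_shaped_shuffle(a, rng=rng)

# Mean of each quarter of the result, to inspect the trend that was produced.
quarter_means = [q.mean() for q in np.array_split(shuffled, 4)]
print(quarter_means)
```

Comparing the quarter means is a quick way to check whether the chosen weights give the intended trend. With linearly increasing weights the effect need not be equally strong at both ends, so the weights may need tuning.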
HK92
  • First of all consider using `numpy.random.shuffle` instead... why use a numpy array if you then convert it to a Python list? That's going to be way slower than just using a plain Python list from the beginning. Anyway there are functions to get a weighted sample (e.g. `numpy.random.choice`) but those won't give you a shuffle... Googling "weighted shuffling" I found [this page](http://nicky.vanforeest.com/probability/weightedRandomShuffling/weighted.html) which may help you out – Giacomo Alzetta Jul 31 '19 at 07:04
  • Note that usually in weighted shuffles you only want higher weights to be more likely to end up at the beginning. But you can start by randomly splitting your array in two, do a weighted shuffle of both halves, reverse one and concatenate them to get what you want or at least a good approximation of it (you can also not split completely randomly but ensure that exactly half of the "high weights" end up in each half...) – Giacomo Alzetta Jul 31 '19 at 07:06
  • @Giacomo Alzetta : Consider posting your comments as an answer. Next time, avoid answering questions in comments. – Peter O. Jul 31 '19 at 08:57
  • Possible duplicate of [Algorithm to shuffle an Array randomly based on different weights](https://stackoverflow.com/questions/29972712/algorithm-to-shuffle-an-array-randomly-based-on-different-weights) – Peter O. Jul 31 '19 at 08:59
  • Thanks a lot for your replies! They helped me understand my problem better and I learned that using numpy.random.choice with probabilities and without replacement solves my problem well enough. Will edit my post to append the solution. – HK92 Aug 01 '19 at 06:00
  • It's better to add solutions by posting an answer rather than editing your question. – Peter O. Aug 01 '19 at 10:16
  • Good to know, I will keep that in mind. – HK92 Aug 02 '19 at 11:32
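
The split-and-reverse idea from the comments could be sketched roughly as follows. This is only one possible reading of it: the helper names, the `offset` parameter and the choice of "value minus minimum plus offset" as the weight are assumptions made for illustration, and the weighted shuffle uses Efraimidis-Spirakis random keys rather than any particular library routine.

```python
import numpy as np

def weighted_shuffle(values, weights, rng):
    """Weighted permutation: items with larger weights tend to come first."""
    # Efraimidis-Spirakis keys: u ** (1 / w), sorted in descending order.
    keys = rng.random(len(values)) ** (1.0 / np.asarray(weights, dtype=float))
    return np.asarray(values)[np.argsort(-keys)]

def ends_heavy_shuffle(values, offset=1.0, rng=None):
    """Shuffle so that high values tend to land near both ends (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    # Randomly split the data into two halves.
    perm = rng.permutation(values.size)
    half = values.size // 2
    first, second = values[perm[:half]], values[perm[half:]]
    # Assumed weighting: shift values so that every weight is strictly positive.
    low = values.min() if values.size else 0.0
    head = weighted_shuffle(first, first - low + offset, rng)           # high values early
    tail = weighted_shuffle(second, second - low + offset, rng)[::-1]   # reversed: high values late
    return np.concatenate((head, tail))

data = np.repeat(np.arange(15), 20)  # stand-in data
result = ends_heavy_shuffle(data, rng=np.random.default_rng(0))
```

A larger `offset` flattens the weights and makes the result closer to a uniform shuffle; a smaller one strengthens the trend.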

1 Answer


Most shuffle functions are uniform, but several non-uniform ones have been implemented. For example, Rodrigo Agundez implemented the Elitist Shuffle, which would work well in this case.

Another approach is to split the array into quantiles (easy, since it is already sorted) and, while shuffling, make a biased choice at each step in which higher quantiles have a higher probability of being drawn.
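
A rough sketch of the quantile idea is shown here. To match the question, it assumes the bias depends on the output position, so that high quantiles are favoured near both ends and low quantiles in the middle. The cosine-shaped bias, the function name and the `n_bins` and `strength` parameters are all illustrative choices, not part of any library.

```python
import numpy as np

def quantile_biased_shuffle(sorted_values, n_bins=4, strength=1.0, rng=None):
    """Position-dependent biased draw from quantile bins (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    # Split the sorted data into quantile bins and shuffle each bin uniformly.
    chunks = np.array_split(np.asarray(sorted_values), n_bins)
    bins = [list(rng.permutation(c)) for c in chunks]
    ranks = np.linspace(-1.0, 1.0, n_bins)  # -1 for the lowest bin, +1 for the highest
    n = sum(len(b) for b in bins)
    out = []
    for i in range(n):
        # edge is +1 at both ends of the output and -1 in the middle.
        edge = np.cos(2.0 * np.pi * i / max(n - 1, 1))
        remaining = np.array([len(b) for b in bins], dtype=float)
        # Bias the bin choice; empty bins get zero weight and are never picked.
        w = remaining * np.exp(strength * edge * ranks)
        k = rng.choice(n_bins, p=w / w.sum())
        out.append(bins[k].pop())
    return np.array(out)

a = np.repeat(np.arange(15), 20)  # stand-in data, already sorted
shuffled = quantile_biased_shuffle(a, rng=np.random.default_rng(0))
```

With `strength=0` this reduces to a uniform shuffle. A large `strength` can use up the high bins early and leave fewer high values for the end, so a moderate value is probably the safer starting point.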

Attersson