
Problem: I have 300 positions (numbered 1-300) to fill and 50 items; each item gets 6 unique positions, so all 300 positions are used. The average position of each item's 6 positions needs to be in the range 145-155, the closer to 150 the better (equivalently, each item's 6 positions should sum to between 870 and 930).

Constraints: For each item, each of its 6 positions must fall within one of a list of fixed ranges (called runs), and no two of its positions may fall within the same run.

e.g. the runs are: [(1,36), (37,73), (74,110), (111,148), (149,186), (187,225), (226,262), (263, 300)]. So item 1 can have positions 1, 38, 158, 198, 238, 271, since no run is used twice.

The positions chosen should be random - or as random as possible

The issue I'm having: Constraining the randomly selected positions to average 150 (or very close) seems to be really difficult. I've tried various implementations, but most either hang close to the end (having run out of legal positions to choose from) or don't get very close. My best attempt just involves if statements I've placed to try to restrict the range, but it still only gets averages in a band of around 130-170. This feels like a really poor method, and I was wondering if there is a way to do it algorithmically instead of just pasting in if statements and hoping something works.

My best attempt with random band-aid restrictions: https://pyfiddle.io/fiddle/dfc6dfd1-1e12-46df-957c-d7ae0e94fbe3/?i=true

As you can see there, the averages vary: most are in an acceptable range, but some are off or badly off.

I've spent several weeks on this but genuinely cannot think of anything that restricts it to an appropriate average (145-155), so I was hoping for any help here.

  • This sounds really tough in general. `(the closer to 150 the better)` probably *conflicts* with `The positions chosen should be random - or as random as possible`; the two objectives play against each other. So it's important to understand how to weight them and what kind of randomness you target. With some work on top of these questions, you could easily build a constraint-programming model where variable selection and value selection are done randomly (a rough sketch of this idea appears after these comments). This will not guarantee uniform results, but might be enough for you. [Or more crazy](https://stackoverflow.com/a/47208133/2320035) – sascha Dec 27 '19 at 20:24
  • It's really tough, as without a formal model it's hard to satisfy the task. Even the simple `The positions chosen should be random - or as random as possible` hides at least one very dramatic decision: l1-norm or l2-norm based error. Worse: formalizing these things might have a strong effect on what's possible in theory on the solver side (restricting some techniques). So it's a bit of a cycle. Maybe a use case including details would be easier to formulate – sascha Dec 27 '19 at 20:27
  • Yes, that's my main issue: the average-position constraint conflicts with the random positioning, making it a very difficult task :( –  Dec 27 '19 at 20:47
  • Two questions: ① are the ranges/runs fixed, or is what you've shown just one example among many others (if so, how many others)? ② _"The positions chosen should be random"_ – what do you mean, a uniform random distribution? something else? – gboffi Dec 27 '19 at 21:30
  • 1. They are fixed (in my case), and would probably be up to my input before it runs if it's done for other cases. 2. By random I just mean not determined by the positions of other items, like using random in Python to determine it (I hope that's accurate enough) –  Dec 27 '19 at 21:48
  • I still don't get your idea of randomness. I was interpreting this as a `combinatorial sampling problem`, but your comment makes no sense to me, and the answers so far completely ignore this sampling (at least in terms of statistical analysis). – sascha Dec 28 '19 at 00:10
  • Sorry, not too sure how else to say it, so maybe with context? The data is used to show images in a certain order, with each run being a run-through of the experiment. Randomness in the sense that no two people should get the same experiment, and that the positions aren't 1-to-1 (if an item is in first place in one run it shouldn't be guaranteed to be first place in another). –  Dec 28 '19 at 01:58
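
A rough sketch of the constraint-programming idea from the first comment, assuming Google OR-Tools CP-SAT (the solver choice and the z[p, i] encoding are my assumptions; the comment names neither). All constraints are linear, and randomness comes only from the solver's seed, so this is a feasibility sketch rather than uniform sampling:

import random
from ortools.sat.python import cp_model

runs = [(1,36), (37,73), (74,110), (111,148), (149,186), (187,225), (226,262), (263,300)]
n_items, per_item = 50, 6
positions = range(1, 301)

model = cp_model.CpModel()
# z[p, i] == 1 means position p is assigned to item i
z = {(p, i): model.NewBoolVar(f'z_{p}_{i}') for p in positions for i in range(n_items)}

for p in positions:
    # every position is used by exactly one item
    model.Add(sum(z[p, i] for i in range(n_items)) == 1)
for i in range(n_items):
    # each item gets exactly 6 positions
    model.Add(sum(z[p, i] for p in positions) == per_item)
    # at most one position per run per item
    for lo, hi in runs:
        model.Add(sum(z[p, i] for p in range(lo, hi + 1)) <= 1)
    # average in [145, 155]  <=>  sum of the 6 positions in [870, 930]
    model.AddLinearConstraint(sum(p * z[p, i] for p in positions), 6 * 145, 6 * 155)

solver = cp_model.CpSolver()
solver.parameters.random_seed = random.randrange(1 << 30)  # vary solutions between runs
solver.parameters.max_time_in_seconds = 30
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    groups = [[p for p in positions if solver.Value(z[p, i])] for i in range(n_items)]
    print([sum(g) / per_item for g in groups])

Tightening the bounds toward 6 * 150 pushes every average toward 150, at the cost of leaving the solver fewer distinct solutions to randomize over.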

2 Answers


Take an item, choose a random run and a random position inside that run. 'Put' the item in its place. The next run is non-random: take the 'mirrored' one. I mean, if the first run was 0 - (1,36), then the next is 7 - (263, 300). If the first is 5 - (187,225), then the next is 2 - (74,110). The position in the 'mirrored' run can be random, but since a run is pretty wide, I believe you should also divide each run into sub-runs and mirror the position within the run as well. The effect is that positions come in pairs that sum to roughly 301, so each pair, and hence each item, averages close to 150.5.
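
The sketch below isolates just the pairing step (the mirror helper is my own illustration, not part of the answer's code that follows): run k pairs with run 7 - k, and keeping the same offset from the opposite end of the mirrored run makes each pair of positions sum to roughly 301, i.e. average about 150.5.

# The runs from the question; run k mirrors to run len(runs) - 1 - k.
runs = [(1,36), (37,73), (74,110), (111,148), (149,186), (187,225), (226,262), (263,300)]

def mirror(run_ind, pos):
    # keep the same offset, measured from the opposite end of the mirrored run
    lo, _ = runs[run_ind]
    _, m_hi = runs[len(runs) - 1 - run_ind]
    return m_hi - (pos - lo)

print(mirror(0, 1))    # 300 -> the pair (1, 300) averages 150.5
print(mirror(5, 200))  # 97  -> run 5 mirrors into run 2; the pair sums to 297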

import numpy as np
from random import randint

runs = np.array([(1,36), (37,73), (74,110), (111,148), (149,186), (187,225), (226,262), (263, 300)])

#number of runs
numOfRuns =  len(runs)
#number of subruns in a run
numberOfSubruns = 4
#maximum error when 'mirroring'
maxMirroringError = 0
#number of items
numOfItems = 50
#number of positions, must be even
numOfPos = 6
#50 items, 6 positions each
items=np.zeros((numOfItems,numOfPos))
for ii in range(numOfItems):
    #create an array containing "used" runs.
    usedruns=np.zeros(numOfRuns,dtype=bool)
    for jj in range(0,numOfPos,2):
        while True:
            #choose a run index at random
            run_ind = randint(0, numOfRuns - 1)
            #accept it only if this item has not used that run yet
            if not usedruns[run_ind]:
                break
        #make run used for this item
        usedruns[run_ind] = True
        #choose position index
        pos_ind = randint(runs[run_ind][0],runs[run_ind][1])
        #store the result
        items[ii][jj] = pos_ind

        #adjuster widens the random window on each retry; without it the
        #loop can spin forever when maxMirroringError != 0
        adjuster = 0
        while True:
            #find the mirrored run, with an optional random error
            mirrored_run_ind = numOfRuns - 1 - run_ind + randint(-(maxMirroringError + adjuster), maxMirroringError + adjuster)
            #clamp mirrored_run_ind into [0, numOfRuns)
            mirrored_run_ind = min(mirrored_run_ind, numOfRuns - 1)
            mirrored_run_ind = max(mirrored_run_ind, 0)
            adjuster += 1
            if not usedruns[mirrored_run_ind]:
                break
        #make run used for this item
        usedruns[mirrored_run_ind] = True

        #get the size of a sub-run within the chosen run
        subrun_size = (runs[run_ind][1] - runs[run_ind][0]) // numberOfSubruns
        #find the sub-run's index; clamp it so positions at the very top of a
        #run cannot index one past the last sub-run
        subrun_ind = min((pos_ind - runs[run_ind][0]) // subrun_size, numberOfSubruns - 1)
        #find the mirrored sub-run's index, now guaranteed in [0, numberOfSubruns)
        mirrored_subrun_ind = numberOfSubruns - subrun_ind - 1
        #find mirrored pos
        #lower bound
        a = (runs[mirrored_run_ind][1]-runs[mirrored_run_ind][0]) * mirrored_subrun_ind // numberOfSubruns
        #upper bound is lower bound + mirrored subrun size
        b = (runs[mirrored_run_ind][1]-runs[mirrored_run_ind][0]) * (mirrored_subrun_ind + 1) // numberOfSubruns
        #index itself
        mirrored_pos_ind = runs[mirrored_run_ind][0] + randint(a,b)

        #store the result
        items[ii][jj+1] = mirrored_pos_ind
    #sanity check: no negative positions should ever appear
    if (items < 0).any():
        raise ValueError('negative value encountered!')
    #sanity check: no two positions of one item may share a run
    #(note: storedruns[kk] must be set to True, not kk, or run 0 is never flagged)
    storedruns = np.zeros(numOfRuns, dtype=bool)
    for pos in items[ii]:
        for kk in range(numOfRuns):
            if runs[kk][0] <= pos <= runs[kk][1]:
                if storedruns[kk]:
                    raise ValueError('Same runs for an item encountered!')
                storedruns[kk] = True
    #print an average of position of an item
    # print(np.average(items[ii,:]))
print('\n')
# print a standard deviation of average of items
print(np.std(np.average(items,axis=1)))
# print('\n')
# print(items)

You can change numberOfSubruns and maxMirroringError and see what fits your case. Your case (total randomness) is numberOfSubruns = 1, maxMirroringError = numOfRuns (equal to 8). If you set numberOfSubruns = 4, maxMirroringError = 0, you will get averages very close to 148. For your convenience, I put the standard-deviation calculation of the averaged positions at the end.
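
If you want to compare settings quickly, a hypothetical sweep could look like the following; it assumes you first wrap the per-item loop above in a function make_items(numberOfSubruns, maxMirroringError) returning the items array (that refactor is not shown):

import numpy as np

for subruns in (1, 2, 4):
    for err in (0, 2, 8):
        items = make_items(subruns, err)  # hypothetical wrapper, see note above
        print(subruns, err, np.std(np.average(items, axis=1)))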

Suthiro
  • Wow, this is amazing, thank you! Upon running it and checking items, however, there appear to be negative-valued/repeated positions, which shouldn't be possible. I will try to read more into the code to fully understand it and see if it's an error in my understanding though –  Dec 27 '19 at 22:08
  • Thank you for the update! It appears this solution still has repeated positions for different items though, so unfortunately it doesn't work for me. Thanks again for all the help! –  Dec 29 '19 at 21:01

This approach uses dynamic programming to randomly extract as good a group as it can 50 times. It then throws the imperfect ones back in with a few good ones and tries the same thing. It keeps doing this while relaxing the definition of perfect until it gets 50 acceptable groups.

It is slow. (About 20 seconds on my laptop.) But it frequently gets all groups to have the exact same average. Across a number of runs I did not see a single group outside the range [150.0, 151.0].

#! /usr/bin/env python3

import math
import collections
import random

BaseNode = collections.namedtuple('BaseNode', 'sum value ways run prev_node')
class Node(BaseNode):
    def merge(self, other):
        if self.sum != other.sum:
            print(self)
            print(other)
            raise Exception('Can only merge nodes with the same sum.')
        ways = self.ways + other.ways
        # Keep self's representative with probability self.ways / ways, so
        # every underlying combination is equally likely to survive the merge.
        if random.randint(1, ways) <= self.ways:
            return Node(sum=self.sum, value=self.value, ways=ways,
                        run=self.run, prev_node=self.prev_node)
        else:
            return Node(sum=other.sum, value=other.value, ways=ways,
                        run=other.run, prev_node=other.prev_node)

    def extract_group(self):
        values = []
        node = self
        while node.value is not None:
            node.run.remove(node.value)
            values.append(node.value)
            node = node.prev_node
        return sorted(values)


def random_next_group_from_runs (runs):
    runs_by_count = {}
    # organize by count
    for run in runs:
        count = len(run)
        if count in runs_by_count:
            runs_by_count[count].append(run)
        else:
            runs_by_count[count] = [run]


    required_runs = []
    optional_runs = []
    largest = max(runs_by_count.keys())
    if 6 < len(runs_by_count[largest]):
        largest = largest + 1
    else:
        required_runs = runs_by_count[largest]

    for count, these_runs in runs_by_count.items():
        if count < largest:
            optional_runs.extend(these_runs)

    # We start with the empty sum.
    node_by_sum_by_count = {0: {0: Node(sum=0, value=None, ways=1, run=None, prev_node=None)}}
    # We update to use a value from each required run.
    for run in required_runs:
        new_node_by_sum_by_count = {}
        for count, node_by_sum in node_by_sum_by_count.items():
            if count+1 not in new_node_by_sum_by_count:
                new_node_by_sum_by_count[count+1] = {}
            new_node_by_sum = new_node_by_sum_by_count[count+1]
            for s, node in node_by_sum.items():
                for i in run:
                    new_node = Node(sum=s+i, value=i, ways=node.ways, run=run, prev_node=node)
                    if s+i not in new_node_by_sum:
                        new_node_by_sum[s+i] = new_node
                    else:
                        # This merge hides the random choice of which one to take.
                        new_node_by_sum[s+i] = new_node_by_sum[s+i].merge(new_node)
        node_by_sum_by_count = new_node_by_sum_by_count
    # We may use a value from each optional run.
    for run in optional_runs:
        new_node_by_sum_by_count = {}
        for count, node_by_sum in node_by_sum_by_count.items():
            # The options where we do not use this run.
            if count not in new_node_by_sum_by_count:
                new_node_by_sum_by_count[count] = {}
            new_node_by_sum = new_node_by_sum_by_count[count]
            for s, node in node_by_sum.items():
                if s not in new_node_by_sum:
                    new_node_by_sum[s] = node
                else:
                    new_node_by_sum[s] = new_node_by_sum[s].merge(node)


            # The options where we do use this run.
            if count+1 not in new_node_by_sum_by_count:
                new_node_by_sum_by_count[count+1] = {}
            new_node_by_sum = new_node_by_sum_by_count[count+1]
            for s, node in node_by_sum.items():
                for i in run:
                    new_node = Node(sum=s+i, value=i, ways=node.ways, run=run, prev_node=node)
                    if s+i not in new_node_by_sum:
                        new_node_by_sum[s+i] = new_node
                    else:
                        # This merge hides the random choice of which one to take.
                        new_node_by_sum[s+i] = new_node_by_sum[s+i].merge(new_node)
        node_by_sum_by_count = new_node_by_sum_by_count

    # Try sums nearest the target first, widening outward from 903
    # (the ideal group sum: sum(1..300) / 50 groups = 45150 / 50).
    avg = 903
    def try_sum():
        yield avg
        i = 1
        while True:
            yield avg - i
            yield avg + i
            i = i+1

    # We only want groups with 6 elements.
    node_by_sum = node_by_sum_by_count[6]
    for i in try_sum():
        if i in node_by_sum:
            return node_by_sum[i].extract_group()


runs = [
    set(range(1, 37)),
    set(range(37, 74)),
    set(range(74, 111)),
    set(range(111, 149)),
    set(range(149, 187)),
    set(range(187, 226)),
    set(range(226, 263)),
    set(range(263, 301)),
]
in_run = {}
for i in range(len(runs)):
    for j in runs[i]:
        in_run[j] = i
#runs = [ set(range(i*36+1, i*36+37)) for i in range(8) ]
groups = []
bad_groups = []
attempt = 0
while attempt == 0 or 0 < len(bad_groups):
    attempt = attempt + 1
    # We add a few groups to bad groups in principle.
    for i in range(attempt):
        if 20 < len(groups):
            bad_groups.append(groups.pop())
    for group in bad_groups:
        for i in group:
            runs[in_run[i]].add(i)
    bad_groups = []
    while len(groups) + len(bad_groups) < 50:
        group = random_next_group_from_runs(runs)
        if abs(sum(group) - 903) <= math.floor(attempt/5.0):
            groups.append(group)
        else:
            bad_groups.append(group)
    random.shuffle(groups)

for group in groups:
    print([sum(group), group])
btilly
  • Thank you so much! This looks... very complicated, so it might take me a very long time to understand, but I can't actually run it as it errors out on a KeyError for `node_by_sum_by_count` –  Dec 28 '19 at 00:09
  • @Helpthisamatuerprogrammer Oops, my bad. Cut and paste error. See the updated version. – btilly Dec 28 '19 at 00:43
  • This is actually amazing, thank you so much! Honestly this is so beyond my level of programming, I would never have thought of this. As a side question (not sure if allowed here), do you have any recommendations on the best way to learn dynamic programming or develop my skill? Thank you again, you literally made my year haha –  Dec 28 '19 at 01:51
  • @Helpthisamatuerprogrammer For developing my skill, no books were as eye-opening as *Code Complete* and *Structure and Interpretation of Computer Programs*. For dynamic programming, just do a few examples. The "linked list cookie crumbs" trick through a dynamic programming solution is one I came up with and haven't seen anyone else do. – btilly Dec 28 '19 at 03:14
  • I have changed `avg`; first I set `avg=904` and later `avg=902`, and in both cases the program did not return in a reasonable amount of time (I let it run for over 8' before interrupting, while for `avg=903` it took <25"). Is there a reason for this behaviour? – gboffi Dec 28 '19 at 11:43
  • @gboffi Yes. It is very good at finding perfect matches. What happens is therefore that you are forcing it to find as many 904s as it can, and then forcing all of the excess into a handful of really bad ones. And it will then spend forever trying things over and over again before it is willing to actually accept a really bad one. And the further it goes, the more it tries to throw things back in the mix to have a chance of finding a good match, and you throw it down the same rabbit hole. – btilly Dec 28 '19 at 19:01
  • Odds are that if you let it run, you'd wind up with something like 49 at 904 and 1 at 854... after 245 passes. It will probably take an hour or so to get there though. – btilly Dec 28 '19 at 19:04
  • Sorry for bothering you again, I was looking at gboffi's comment and was wondering how you came upon the number of 903 then, random testing? –  Dec 31 '19 at 00:48
  • (Sorry, wasn't able to edit my comment, but it continues:) There seem to be a few sort-of hard-coded numbers and I was wondering how you decided upon them, like the line `if 20 < len(groups):` –  Dec 31 '19 at 00:56
  • @Helpthisamatuerprogrammer 903 is `sum([1..300])/50`, so it is the sum you want each group of 6 positions to have (an average of 150.5). The `20` in that line is just a hack to keep from throwing EVERYTHING back in the pot for a recalculation run. The actual number doesn't matter much because you generally only hit it on the first attempt. – btilly Dec 31 '19 at 01:32
  • ah okay, that makes sense, thank you! I was actually wondering how you decided on specifically 903 vs 904 or 902, guess and check with timing? –  Dec 31 '19 at 02:14
  • @Helpthisamatuerprogrammer I figured out what was ideal, tried once, found that it *almost* worked. I consistently had a few outliers. So I did the recalculate hack and it worked. – btilly Dec 31 '19 at 18:41
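
For reference, the arithmetic behind the 903 target in these last comments is easy to verify:

total = 300 * 301 // 2   # sum of positions 1..300 = 45150
print(total / 50)        # 903.0 -> the ideal sum for each 6-position group
print(903 / 6)           # 150.5 -> the per-item average that sum implies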