-3

I want to make an algorithm that distributes total number of entities into different categories using some ideal percentage. Lets say for example, that first category should contain 50.3% of all entities, second - 34.3%, third - 15.4%.

In some ideal conditions everything is good. I'm easily calculating desired value of entity for every category and then using some algorithm similar to this to maintain the right sum. Small example:

All categories are empty
We are adding 100
To the first 50
To the second = 34
To the third = 16 (fixed by current algorithm, was 15)

BUT, at the some point category can already contain some entities which are not distributed according our ideal percentage! And I do not know what algorithm I should use in this case. This algorithm should distribute new entities using these rules:

  1. After adding we should be as close as possible to ideal percentage. Deviations between ideal and actual percentage should be minimal and close to each other.
  2. Algorithm CANT delete existing entities from any category. It should only distribute the new ones.

Example:

At start:
First is empty
Second contains 10
Third contains 40

We are adding 50
To the first 38
To the second 12
To the third 0 (too much entities here before adding)

Result:
First = 38; deviation = 12.3%
Second = 22; deviation = 12.3%
Third = 40; deviation = 24.6%

Do not ask me how I got 38 and 12, I've just tried different combinations until it seems right. Any ideas about algorithm?

Community
  • 1
  • 1

1 Answers1

0

Following approach might work. Let us say you maintain 3 lists, 1 for each categories and a running avg (i.e current avg of the list) and total elements. The additional 2 elements are needed to make sure the complexity of adding elements remains constant.

Data Structure:

category_list {
    items : []
    average : float
    count : int
}

category_list categories[3]; # 1 for each category
avg_distribution[3]; # this will hold the distribution allowed for each category

Algorithm:

addItems(int items[]) {
   for item in items:
     category = getCategory(item)
     insertItem(category, item)
}

# This algo will identify the category to be inserted in constant time or the no.of categories available. 
# For this scenario we can say it runs in O(1)
int getCategory(int item) {
   minAvg = INT_MAX
   minCat = 0
   for cat in range(3):
      category = categories[cat]
      newAvg = (category.avg*(category.count)+item.value) / (category.count+1)
      newAvg = newAvg - avg_distribution[cat] 
      # this is to make sure we consider only the deviation part and not the entire avg.
      if newAvg < minAvg:
         minAvg = minAvg
         minCat = cat 

   return minCat
}

# This will insert item in a given category in O(1)
insertItem(category, item) {
    category.items[category.count] = item.value
    category.avg = (category.avg*(category.count)+item.value) / (category.count+1)
    category.count++;
}

# This method is need to initially load all the elements in given category
loadAllItems(int items[], int category) {
    category = categories[category]
    for item in items:
       insertItem(category, item)
}

Hope it help!

arunk2
  • 2,246
  • 3
  • 23
  • 35
  • Our items do not contain any value so we cant calculate category average. BUT I believe this algorithm is what we need. We'll replace catAvg with something like "current category percentage" and will try to implement this. Thanks for answer. – Bogdan Gn. May 04 '17 at 16:19