0

I have a voting system for something where people rate things from one to ten. I am able to get the number of total votes, but not their distribution. I can, however, access the totals for each score (but not what score they are for).

For example, say two people voted 2 and three 4 then I would get [4, 12]. Humanly, it is possibly to work out what the votes were, however as there is a lot of data, I need an algorithm that is able to do it programmatically. To solve this humanly, I would know that there were 5 votes and this I would know that it was either one four and four thees or two twos and three fours. Because this isn't the best example, we can't tell which one it would be and so it this situation, the algorithm should spit out all of the solutions. With the real data, it is very rare that something like this occurs.

The input is in the form of an int representing the total number of votes cast, and a list of all of the vote subtotals. The output should be an array containing pars (x, y) where there are x number of y votes cast. If more than one solution is present, the algorithm should return all of them in an array.

EDIT: Here's some real data (17 people voted):

Dunkel      - 60 + 18 + 8  + 18 + 10 + 4  + 3 + 1     = 122
Satomi      - 20 + 14 + 24 + 12 + 3  + 4  + 3         = 80
Bottersnike - 16 + 28 + 5  + 8  + 6  + 4  + 4         = 71
Woon        - 40 + 36 + 8  + 21 + 5  + 16             = 126
Limelier    - 10 + 18 + 6  + 15 + 8  + 4  + 6         = 67
RandomGamer - 16 + 6  + 10 + 4  + 6  + 4  + 7         = 53
Pillar      - 10 + 8  + 21 + 6  + 15 + 4  + 9 + 4 + 2 = 79
EdgedPixel  - 8  + 28 + 12 + 4  + 18 + 2  + 2         = 74
Lock        - 20 + 24 + 7  + 18 + 10 + 8  + 6 + 2     = 95
Huri        - 10 + 8  + 7  + 6  + 15 + 20 + 3 + 2 + 3 = 74
Sean        - 18 + 32 + 8  + 5  + 4  + 9  + 2         = 78

And the answers (sorry about the different order):

Woon        - 4*10 + 4*9 + 1*8 + 3*7 + 0*6 + 1*5 + 4*4 + 0*3 + 0*2 + 0*1 = 126
Dunkel      - 6*10 + 2*9 + 1*8 + 0*7 + 3*6 + 2*5 + 1*4 + 1*3 + 0*2 + 1*1 = 122
Lock        - 2*10 + 0*9 + 3*8 + 1*7 + 3*6 + 2*5 + 2*4 + 2*3 + 0*2 + 2*1 = 95
Satomi      - 2*10 + 0*9 + 3*8 + 4*7 + 2*6 + 0*5 + 0*4 + 1*3 + 2*2 + 3*1 = 80
Pillar      - 1*10 + 0*9 + 1*8 + 3*7 + 1*6 + 3*5 + 1*4 + 3*3 + 2*2 + 2*1 = 79
Sean        - 0*10 + 0*9 + 4*8 + 0*7 + 3*6 + 1*5 + 2*4 + 3*3 + 2*2 + 2*1 = 78
EdgedPixel  - 0*10 + 0*9 + 1*8 + 4*7 + 2*6 + 0*5 + 1*4 + 6*3 + 1*2 + 2*1 = 74
Huri        - 1*10 + 0*9 + 1*8 + 1*7 + 1*6 + 3*5 + 5*4 + 1*3 + 1*2 + 3*1 = 74
Bottersnike - 0*10 + 0*9 + 2*8 + 4*7 + 0*6 + 1*5 + 2*4 + 2*3 + 2*2 + 4*1 = 71
Limelier    - 1*10 + 2*9 + 0*8 + 0*7 + 1*6 + 3*5 + 2*4 + 0*3 + 2*2 + 6*1 = 67
RandomGamer - 0*10 + 0*9 + 2*8 + 0*7 + 1*6 + 2*5 + 1*4 + 2*3 + 2*2 + 7*1 = 53
  • When you have 12 how would you know whether 3 people voted "4" or 4 people voted "3"? – PM 77-1 Mar 23 '17 at 22:32
  • You wouldn't (it's not a very well put together example).. I also have the number of people that voted so you can use that to see which one fits. (Question edited to clarify) –  Mar 23 '17 at 22:33
  • What have you tried so far? Try to take the way you interpret the problem and define it with steps (you perform division with remainder, iterate through possible rates) – Uriel Mar 23 '17 at 22:45
  • My main problem is that I can run through the steps algorithmicly, but then I get stuck at what to do once I realise that the current values that I've been guessing are wrong. –  Mar 23 '17 at 22:50
  • Do you know if the subtotals are sorted? In [4, 12] for example, is it [n1*a, n2*b] where n1 and n2 are counts, and a, b are vote values, is it the case that a must be less than b? – Kenny Ostrom Mar 23 '17 at 22:59
  • Sadly, the subtotals are not sorted (as much as I would love that :p) –  Mar 23 '17 at 23:09
  • Are you unable to modify how your voting system stores the values? – Steven Summers Mar 24 '17 at 03:47

2 Answers2

0

I need an algorithm that is able to do it programmatically

I'd begin by factoring your totals.

What is the most efficient way of finding all the factors of a number in Python?

def factors(n):    
    return set(reduce(list.__add__, 
                ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0)))

This will return all of the factors, very quickly, of a number n.

that would get you well on your way; obviously a listed total that doesn't have 5 for a factor did not come from a group that rated 5's

step 1) factor each number in your list of totals

step 2) create dictionary with all possible ratings as the keys; ie keys 1-10; each with empty lists as values.

step 3) check if each list of factors has a factor which matches each rating key; if so append the total it represents to the values list of possibilities for that respective key

from random import randint
from random import shuffle

VOTES = 200
RATINGS = 10

# Generate Test Data
ratings = {}
for i in range(1,(RATINGS+1)):
    ratings[i] = []
for i in range(VOTES):
    rate = randint(1,RATINGS)
    ratings[rate].append(rate)

subtotals = []
for key, values in ratings.iteritems():
   subtotals.append(sum(values))
subtotals = [i for i in subtotals if i != 0]
subtotals.sort()

hidden_subtotals = {}
for key, values in ratings.iteritems():
   hidden_subtotals[key] = sum(values)

print '==================================='
print 'Generate Hidden Test Data'
print 'unknown actual votes: %s' % ratings
print 'unknown actual subtotals: %s' % hidden_subtotals 
print '==================================='
print 'Present Problem'
print 'known vote total: %s' % VOTES
print 'known subtotals: %s' % subtotals
print 'known total: %s' % sum(subtotals)
print 'number of ratings used: %s of %s' % (len(subtotals), RATINGS)
print '==================================='

def factors(n):    
    f = list(set(reduce(list.__add__, ([i, n//i] for i in range(
        1, int(n**0.5) + 1) if n % i == 0))))
    f.sort()
    return f

# Factor Each
subtotal_factors = {}
for i in subtotals:
    subtotal_factors[i]=(factors(i))
print 'process 1: %s' % subtotal_factors

# Remove Factors greater than highest rating
possible_ratings = {}
for i in subtotal_factors:
   possible_ratings[i] = [
      z for z in subtotal_factors[i] if z<=RATINGS]
print 'process 2: %s' % possible_ratings

# Remove Factors if not enough votes for that possibility; too small relative to subtotal
for i in possible_ratings:
    possible_ratings[i] = [
        z for z in possible_ratings[i] if i/z < VOTES]
print 'process 3: %s' % possible_ratings

step 4) then you'll have a problem where 2's will fit in 4's, 6's, 8's, 10's; 4's will fit in 8's, and 3's will fit in 6's and 9's. You'll have to apply some logic to clean up the results.

Community
  • 1
  • 1
litepresence
  • 3,109
  • 1
  • 27
  • 35
  • I've managed to get to around step 2/3 but I'm a bit stumped on 4 and as how to implement it. –  Mar 23 '17 at 23:09
  • I'm on my phone at the moment, but you could probably generate some sample data quickly. E.g 15 people. Subtots: [14, 9, 8, 4, 12]. Votes: [two sevens, three threes, four twos, four ones, two sixes. I'll try and post my code as soon as I can, but that will probably be in the morning :( –  Mar 23 '17 at 23:14
  • Regarding "how to implement" I would begin with the assumption that the total for each score if sorted would very likely be in perfect order to begin with. ie biggest total = 10's; smallest total = 1's and all in perfectly stacked. From there you could trial and error some brute force and random shuffle; first changing 2, then 3, etc. until you get the total you need. I'd suspect you rarely have to change more than 3 from perfect order. – litepresence Mar 24 '17 at 02:48
0

I have now solved my problem. I used litepresence's code as a base, and then implemented what they described as step four. For the future, this is the code I used:

from random import randint
from random import shuffle
import itertools

NUM_RATINGS = 10
RATING = [10, 10, 10, 10, 10, 10, 9, 9, 8, 6, 6, 6, 5, 5, 4, 3, 1]
VOTES = len(RATING)


ratings = {}
for i in range(1, (NUM_RATINGS + 1)):
    ratings[i] = []
for i in RATING:
    ratings[i].append(i)


subtotals = []
for key, values in ratings.iteritems():
   subtotals.append(sum(values))
subtotals = [i for i in subtotals if i != 0]
subtotals.sort()

hidden_subtotals = {}
for key, values in ratings.iteritems():
   hidden_subtotals[key] = sum(values)

print '==================================='
print 'Hidden Test Data:'
print 'unknown actual votes: %s' % ratings
print 'unknown actual subtotals: %s' % hidden_subtotals 
print '==================================='
print 'Present Problem:'
print 'known vote total: %s' % VOTES
print 'known subtotals: %s' % subtotals
print 'known total: %s' % sum(subtotals)
print 'number of ratings used: %s of %s' % (len(subtotals), NUM_RATINGS)
print '==================================='

def factors(n):    
    f = list(set(reduce(list.__add__, ([i, n//i] for i in range(
        1, int(n**0.5) + 1) if n % i == 0))))
    f.sort()
    return f

# Factor Each
subtotal_factors = {}
for i in subtotals:
    subtotal_factors[i]=(factors(i))
print 'process 1: %s' % subtotal_factors

# Remove Factors greater than highest rating
possible_ratings = {}
for i in subtotal_factors:
   possible_ratings[i] = [
      z for z in subtotal_factors[i] if z<=NUM_RATINGS]
print 'process 2: %s' % possible_ratings

# Remove Factors if not enough votes for that possibility; too small relative to subtotal
for i in possible_ratings:
    possible_ratings[i] = [
        z for z in possible_ratings[i] if i/z < VOTES]
print 'process 3: %s' % possible_ratings


end_res = {}
other = {}  # (count, [poss])
for i in possible_ratings.items():
    if len(i[1]) == 1:
        end_res[i[0]] = (i[1][0], i[1][0])
    else:
        other[i[0]] = (subtotals.count(i[0]), i[1])

combs = {}
for i in other.items():
    combs[i[0]] = (i[1][0], [])  # (count, [data])
    for a in i[1][1]:
        for b in i[1][1]:
            if (a, b) not in combs[i[0]]:
                if a * b == i[0]:
                    combs[i[0]][1].append((a, b))

lists = []
for i in combs.items():
    for j in range(i[1][0]):
        lists.append([])
        for n in i[1][1]:
            lists[-1].append((n[0], n[1], i[0]))

toprocess = itertools.product(*lists)
working = []

for test in toprocess:
    works = True
    seen = []
    tot = 0
    for i in test:
        if i[1] in seen:
            works = False
            break
        tot += i[0]
        if tot > VOTES:
            works = False
            break
        seen.append(i[1])
    if not works:
        continue
    else:
        working.append(test)

formattedWs = []
for w in working:
    w = list(w)
    notseen = [i + 1 for i in range(NUM_RATINGS)]
    for i in w:
        notseen.remove(i[1])
    for i in notseen:
        w.append((0, i))

    t = ""
    def f(x, y):
        return y[1] - x[1]

    w.sort(cmp=f)

    for i in w:
        t += "%d*%d + " % (i[0], i[1])
    t = t[:-3]
    formattedWs.append(t)

seen = []
for w in list(formattedWs):
    if w in seen:
        formattedWs.remove(w)
    else:
        seen.append(w)

print "==================================="
for n, w in enumerate(formattedWs):
    print "Solution #%d: %s" % (n + 1, w)