Python - CSV - All the permutations of each row of numbers into tuples

Question

I am very much a novice at Python, but learning. I have been tasked at work to take a CSV of data (2500 rows) in the following format (as we cant figure out how to do it in Excel):

 RefNumber      Reviewer 1  Reviewer 2  Reviewer 3  Reviewer 4  Reviewer 5
  9676/2            6           3           2
  0526/4            6           3           1           5           1
  1842/1            5           3           1           5   
  2693/3            5           5           1           2   
  2515/1            6           3           1           5           3
  2987/1            4           1           3
  3841/1            4           3           1 
  3402/1            4           3           1           5

And produce a CSV with each average of all the permutations of numbers that you could get from each row (minimum of 3).

i.e.

3841/1 above would produce the tuple of {4,3,1}, and an average of 2.7

3402/1 above would produce the tuples of {4,3,1}, {4,3,1,5}, {3,1,5},{4,1,5} etc with an average of 2.7, 3.3, 3, 3.3 etc.

I am wracking my brain trying to think of the best way of doing this, as I also need to know of each average, how many numbers in the tuple did it contain i.e. {4,3,1} would produce an average of 2.7 and the count of numbers of that tuple is 3.

Essentially what I want to produce is this:

RefNumber      Avg 1     Avg 2       Avg 3       Avg 4   Avg 5
  3841/1        2.7         
  3402/1        2.7       3.3           3         3.5

But I guess to show the count of the numbers in the tuple, I could run it 9 times (there is a maximum of 12 reviews) and just have each datasheet on its own tab.

I technically also need the standard deviation of each tuple and the range of scores, but this is already going wayyyyy past my expertise so I guess I can maybe drop that or do it manually somehow.

Any idea on where to start with this?

I think `subset` is more appropriate than `permuatation`. The average of a permutated set is constant. — handle, May 08 '17 at 13:40
Start with reading the data, e.g. with https://docs.python.org/3/library/csv.html — handle, May 08 '17 at 13:43
Would subsets work? I thought a subset would just produce a set of all the numbers on that row i.e. if the row contained 8 numbers, it would be a subset of 8 numbers whereas I would want all the combinations between 3 and 8? — Nicholas, May 08 '17 at 13:52
Ahh, yes, I dont want duplications i.e. {4,3,1} is the same as {1,3,4} so I only want one of them. Hmmmm, back to the drawing board :/ — Nicholas, May 08 '17 at 14:01
Exactly, that is one combination (set) but two permutations (different order of elements). — handle, May 08 '17 at 14:02

Eular · Accepted Answer · 2017-05-08T14:35:45.223

3

You can use csv module to read through csv and extract the data and the itertools module to get all the combinations. see if its doing the job. Also I left the average values as it is but I see you are just working with 1 decimal point which you can easily get by rounding off the results. Guess you can save the result now.

from itertools import combinations as cb 
import csv
with open("test.csv") as f:
    reader=csv.reader(f)
    next(reader, None)  # skip header
    data=[filter(None,i) for i in reader]

def avgg(x):
    ll=[float(i) for i in x[1:]] #take review no and convert to float
    n=len(ll)
    avg_list=[x[0]]  #start result list with ref no.
    for i in range(3,n+1):
        for j in cb(ll,i):
            # print(j)  #see the combination
            avg_list.append(sum(j)/i)
    return avg_list

for x in data:
    print(avgg(x))

edited May 08 '17 at 14:35

answered May 08 '17 at 14:28

Eular

1,707
4
26
50

Thank you ever so much, that works perfectly! I need to think of a way of knowing which numbers apply to how many numbers in the tuple, but I guess I can add some text to the for loop! :)..... honestly, cant thank you enough. I had planned to spend a week or two on this! – Nicholas May 08 '17 at 14:38
1

You're welcome, now I guess you can easily get the standard deviation and other result from the combinations. – Eular May 08 '17 at 14:48

score 1 · Answer 2 · answered May 08 '17 at 14:52

I upvoted the last answer, but I'd thought I show you an example that keeps everything in the DataFrame

data = """RefNumber, Reviewer 1, Reviewer 2,Reviewer 3,Reviewer 4,Reviewer 5
9676/2,6,3,2,,
0526/4,6,3,1,5,1
1842/1,5,3,1,5,
2693/3,5,5,1,2,
2515/1,6,3,1,5,3
2987/1,4,1,3,,
3841/1,4,3,1,,
3402/1,4,3,1,5,
"""

import pandas
import itertools
import StringIO
import numpy

buffer = StringIO.StringIO(data)
df = pandas.read_csv(buffer, index_col=0)

# EVERYTHING ABOVE IS MOSTLY SETUP CODE FOR THE EXAMPLE
def get_combos(items, lower_bound=3):
    """
    Return all combinations of values of size lower_bound
    for items
    """
    usable = items.dropna()
    combos = list()
    n_combos = range(lower_bound, len(usable) + 1)
    for r in n_combos:
        combos += list(itertools.combinations(usable, r))
    return combos

df['combos'] = df.apply(get_combos, axis=1)
df['means'] = df['combos'].map(lambda items: [numpy.mean(x) for x in items])

score 1 · Answer 3 · edited May 23 '17 at 11:47

1

Related Q&A regarding subset generation:

edited May 23 '17 at 11:47

Community

1
1

answered May 08 '17 at 15:04

handle

5,859
3
54
82

Thank you very much for linking :) – Nicholas May 08 '17 at 17:09

Python - CSV - All the permutations of each row of numbers into tuples

3 Answers3