Finding the averages from columns

Question

I'm using this txt file named Gradedata.txt and it looks like this:

Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88

I'm looking for the average of each column. Each column has its own title (Homework #1, Homework #2, etc., in that order). The output should look exactly like this:

Homework #1        8.67
Homework #2        8.17
Homework #3        8.33
Homework #4        9.33
Homework #5        8.83
Quiz #1           19.17
Quiz #2           18.83
Quiz #3           18.67
Midterm #1        44.17
Final #1          92.50

Here is my attempt at accomplishing this task:

with open("GradeData.txt", "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)

    for line in f:

        if not line.strip():
            continue

        values = line.split(" ")
        for i in xrange(len(values)):
            sums[i] += int(values[i])
        numRows += 1

    for index, summedRowValue in enumerate(sums):
        print columns[index], 1.0 * summedRowValue / numRows

I'm getting errors and also I realize I have to name each assignment average. Need some help here. I appreciate it.

Post the errors so people can guide you to the right path or you'll just get a lot of "we won't do your homework for you" :) — Jmills, Jan 28 '16 at 23:50
You are splitting on spaces, not commas. This file is in CSV format, so use a CSV parser unless you're told to do otherwise. If you have trouble with that, edit the question. — TNW, Jan 28 '16 at 23:52
Are you open to using an external library like [`pandas`](http://pandas.pydata.org/)? — OneCricketeer, Jan 28 '16 at 23:52
It's pretty simple once you get the datafile loaded into a matrix. http://stackoverflow.com/questions/31037298/pandas-get-column-average — OneCricketeer, Jan 28 '16 at 23:54

Padraic Cunningham · Answer 1 · 2016-01-29T00:04:05.947

Just transpose the rows with zip and use statistics.mean to average each column, skipping the first column (the names):

import csv
from itertools import islice
from statistics import mean

with open("in.txt") as f:
    for col in islice(zip(*csv.reader(f)), 1, None):
        print(mean(map(float,col)))

Which will give you:

8.666666666666666
8.166666666666666
8.333333333333334
9.333333333333334
8.833333333333334
19.166666666666668
18.833333333333332
18.666666666666668
44.166666666666664
92.5

If the file actually has a header row of column names and you want to pair each name with its average:

import csv
from itertools import islice
from statistics import mean

with open("in.txt") as f:
    # get column names
    cols = next(f).strip().split(",")
    # average each data column, skipping the first (the student names)
    means = [mean(map(float, col)) for col in islice(zip(*csv.reader(f)), 1, None)]
    # keys are column names, values are averages
    data = dict(zip(cols[1:], means))
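The sample file in the question has no header row, so the labels have to come from somewhere else. Here is a minimal sketch, assuming the ten titles are hard-coded in the order the question lists them. `LABELS`, `SAMPLE`, `column_averages` and `format_report` are illustrative names, not part of the code above. It also formats each average to two decimals in a 23-character line, which appears to match the layout the question asks for:

```python
import csv
import io
from statistics import mean

# Assumed labels, in the column order given in the question.
LABELS = (
    ["Homework #%d" % n for n in range(1, 6)]
    + ["Quiz #%d" % n for n in range(1, 4)]
    + ["Midterm #1", "Final #1"]
)

# Inline copy of the question's data so the sketch runs without a file.
SAMPLE = """\
Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88
"""


def column_averages(f):
    """Return the mean of every column except the first (student names)."""
    rows = [row for row in csv.reader(f) if row]  # skip blank lines
    columns = list(zip(*rows))[1:]                # transpose, drop the names
    return [mean(map(float, col)) for col in columns]


def format_report(labels, averages):
    """Left-align each label in 14 characters, right-align the average in 9."""
    return ["{:<14}{:>9.2f}".format(label, avg)
            for label, avg in zip(labels, averages)]


# With the real file, pass open("Gradedata.txt", newline="") instead.
for line in format_report(LABELS, column_averages(io.StringIO(SAMPLE))):
    print(line)
```

The column widths (14 and 9) are a guess taken from the spacing of the desired output, so adjust them if the required layout differs.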

Or using pandas.read_csv:

import pandas as pd

df = pd.read_csv("in.txt",index_col=0,header=None)

print(df)
print(df.mean(axis=0))

Which will give you:

          1   2   3   4   5   6   7   8   9   10
0                                               
Sarah K.  10   9   7   9  10  20  19  19  45  92
John M.    9   9   8   9   8  20  20  18  43  95
David R.   8   7   7   9   6  18  17  17  40  83
Joan A.    9  10  10  10  10  20  19  20  47  99
Nick J.    9   7  10  10  10  20  20  19  46  98
Vicki T.   7   7   8   9   9  17  18  19  44  88
1      8.666667
2      8.166667
3      8.333333
4      9.333333
5      8.833333
6     19.166667
7     18.833333
8     18.666667
9     44.166667
10    92.500000
dtype: float64
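To get the assignment titles instead of the numeric column labels, one option is to pass them to `read_csv` through its `names` parameter. This is a sketch under the assumption that the titles are hard-coded in the order given in the question. `LABELS`, `SAMPLE` and the `"Student"` column name are made up for illustration:

```python
import io

import pandas as pd

# Assumed labels, in the column order given in the question.
LABELS = (
    ["Homework #%d" % n for n in range(1, 6)]
    + ["Quiz #%d" % n for n in range(1, 4)]
    + ["Midterm #1", "Final #1"]
)

# Inline copy of the question's data so the sketch runs without a file.
SAMPLE = """\
Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88
"""

# names= supplies the header the file lacks; the name column becomes the index.
# With the real file, pass its path instead of the StringIO object.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    header=None,
    names=["Student"] + LABELS,
    index_col="Student",
)

# The resulting Series is indexed by the assignment titles.
averages = df.mean(axis=0).round(2)
print(averages.to_string())
```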

score 1 · Answer 2 · answered Jan 28 '16 at 23:57

numpy can chew this up in one line:

>>> import numpy as np
>>> np.loadtxt('Gradedata.txt', delimiter=',', usecols=range(1,11)).mean(axis=0)
array([  8.66666667,   8.16666667,   8.33333333,   9.33333333,
         8.83333333,  19.16666667,  18.83333333,  18.66666667,
        44.16666667,  92.5       ])

wim

Just curious: how would I label each one of those (Homework #1, Homework #2, etc.)? – Tomas Jan 30 '16 at 22:26
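One possible way to label the numpy result is to zip the array of means with a list of titles. This is only a sketch: the `LABELS` list is assumed from the order given in the question, and `SAMPLE` is an inline copy of the data used in place of the file.

```python
import io

import numpy as np

# Assumed labels, in the column order given in the question.
LABELS = (
    ["Homework #%d" % n for n in range(1, 6)]
    + ["Quiz #%d" % n for n in range(1, 4)]
    + ["Midterm #1", "Final #1"]
)

# Inline copy of the question's data so the sketch runs without a file.
SAMPLE = """\
Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88
"""

# usecols skips column 0 (the names); with the real file, pass 'Gradedata.txt'.
means = np.loadtxt(
    io.StringIO(SAMPLE), delimiter=",", usecols=range(1, 11)
).mean(axis=0)

# Pair each title with its column mean and print it to two decimals.
for label, avg in zip(LABELS, means):
    print("{:<14}{:>9.2f}".format(label, avg))
```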
