
I have a text file named Gradedata.txt that looks like this:

Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88

I'm looking for the average of each column. Each column has its own title (Homework #1, Homework #2, etc., in that order). The output should look exactly like this:

Homework #1        8.67
Homework #2        8.17
Homework #3        8.33
Homework #4        9.33
Homework #5        8.83
Quiz #1           19.17
Quiz #2           18.83
Quiz #3           18.67
Midterm #1        44.17
Final #1          92.50

Here is my attempt at accomplishing this task:

with open("GradeData.txt", "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)

    for line in f:

        if not line.strip():
            continue

        values = line.split(" ")
        for i in xrange(len(values)):
            sums[i] += int(values[i])
        numRows += 1

    for index, summedRowValue in enumerate(sums):
        print columns[index], 1.0 * summedRowValue / numRows

I'm getting errors, and I also realize I have to label each average with its assignment name. I need some help here and would appreciate it.

mechanical_meat
Tomas
  • Post the errors so people can guide you to the right path or you'll just get a lot of "we won't do your homework for you" :) – Jmills Jan 28 '16 at 23:50
  • You are splitting on space, not commas. This file is in CSV format, so use CSV parser unless you're told to do otherwise. If you have trouble with that, edit question. – TNW Jan 28 '16 at 23:52
  • Are you open to using an external library like [`pandas`](http://pandas.pydata.org/)? – OneCricketeer Jan 28 '16 at 23:52
  • No I am not. I'm guessing I should be? – Tomas Jan 28 '16 at 23:53
  • It's pretty simple once you get the datafile loaded into a matrix. http://stackoverflow.com/questions/31037298/pandas-get-column-average – OneCricketeer Jan 28 '16 at 23:54

2 Answers


Transpose the rows with zip and use statistics.mean (Python 3.4+) to average each column, skipping the first column, which holds the names:

import csv
from itertools import islice
from statistics import mean

with open("in.txt") as f:
    for col in islice(zip(*csv.reader(f)), 1, None):
        print(mean(map(float,col)))

Which will give you:

8.666666666666666
8.166666666666666
8.333333333333334
9.333333333333334
8.833333333333334
19.166666666666668
18.833333333333332
18.666666666666668
44.166666666666664
92.5

If the file has a header row of column names and you want to pair each name with its average:

import csv
from itertools import islice
from statistics import mean

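with open("in.txt") as f:
    # get column names from the header row (strip the trailing newline)
    cols = next(f).strip().split(",")
    # average every column except the first (student names)
    means = (mean(map(float, col)) for col in islice(zip(*csv.reader(f)), 1, None))
    # keys are column names, values are averages
    data = dict(zip(cols[1:], means))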
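The sample file in the question has no header row, so one way to get the labelled output the asker describes is to supply the titles by hand. The following is a minimal sketch under that assumption; the names `TITLES`, `column_averages`, `format_report`, and `SAMPLE` are illustrative, and the column widths are a guess at the layout shown in the question.

```python
import csv
import io
from statistics import mean

# Assumed titles: the sample file has no header row, so they are listed by hand.
TITLES = (
    ["Homework #%d" % n for n in range(1, 6)]
    + ["Quiz #%d" % n for n in range(1, 4)]
    + ["Midterm #1", "Final #1"]
)

def column_averages(lines):
    """Return the mean of each score column, skipping the name column and empty lines."""
    rows = [row for row in csv.reader(lines) if row]
    # zip(*rows) transposes rows into columns; [1:] drops the student names
    columns = list(zip(*rows))[1:]
    return [mean(map(float, col)) for col in columns]

def format_report(titles, averages):
    """Left-justify each title and right-align its average to two decimals."""
    return ["{:<18}{:>5.2f}".format(t, a) for t, a in zip(titles, averages)]

# Inline copy of the data from the question, so the sketch runs without a file.
SAMPLE = """\
Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88
"""

report = format_report(TITLES, column_averages(io.StringIO(SAMPLE)))
print("\n".join(report))
```

To read the real file instead, pass an open file object, for example `with open("Gradedata.txt", newline="") as f: column_averages(f)`.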

Or using pandas.read_csv:

import pandas as pd

df = pd.read_csv("in.txt",index_col=0,header=None)

print(df)
print(df.mean(axis=0))

Output:

          1   2   3   4   5   6   7   8   9   10
0                                               
Sarah K.  10   9   7   9  10  20  19  19  45  92
John M.    9   9   8   9   8  20  20  18  43  95
David R.   8   7   7   9   6  18  17  17  40  83
Joan A.    9  10  10  10  10  20  19  20  47  99
Nick J.    9   7  10  10  10  20  20  19  46  98
Vicki T.   7   7   8   9   9  17  18  19  44  88
1      8.666667
2      8.166667
3      8.333333
4      9.333333
5      8.833333
6     19.166667
7     18.833333
8     18.666667
9     44.166667
10    92.500000
dtype: float64
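If you want the pandas result labelled with the assignment titles rather than 1 to 10, one option is to pass the titles through the `names` parameter of `read_csv`. This is a sketch only: the `titles` list and the `"Student"` label are assumptions, since the file itself carries no header, and the data is inlined so the example runs without a file.

```python
import io
import pandas as pd

# Assumed titles; the file has no header row, so they are supplied by hand.
titles = (
    ["Homework #%d" % n for n in range(1, 6)]
    + ["Quiz #%d" % n for n in range(1, 4)]
    + ["Midterm #1", "Final #1"]
)

# Inline copy of the data from the question.
SAMPLE = """\
Sarah K.,10,9,7,9,10,20,19,19,45,92
John M.,9,9,8,9,8,20,20,18,43,95
David R.,8,7,7,9,6,18,17,17,40,83
Joan A.,9,10,10,10,10,20,19,20,47,99
Nick J.,9,7,10,10,10,20,20,19,46,98
Vicki T.,7,7,8,9,9,17,18,19,44,88
"""

# "Student" is a hypothetical label for the first column, used as the index.
df = pd.read_csv(io.StringIO(SAMPLE), header=None,
                 names=["Student"] + titles, index_col=0)

# Column-wise means, indexed by title and rounded for display.
averages = df.mean(axis=0)
print(averages.round(2))
```

With the real file, replace `io.StringIO(SAMPLE)` with the path, for example `"in.txt"`.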
Padraic Cunningham

numpy can chew this up in one line:

>>> import numpy as np
>>> np.loadtxt('Gradedata.txt', delimiter=',', usecols=range(1,11)).mean(axis=0)
array([  8.66666667,   8.16666667,   8.33333333,   9.33333333,
         8.83333333,  19.16666667,  18.83333333,  18.66666667,
        44.16666667,  92.5       ])
wim