How do I find the mean of data that has been entered in a csv file from python?

Question

I want to find the average of three columns in a csv file created from Python. My data is laid out like so:

[User Name, Score1, Score2, Score3]

for example

['James', 5, 8, 9]

I would like to find the average of the scores, 5, 8, 9.

import csv

# userName and quiz_scores are set earlier in the program
newrecord = "{user_name},{score_1},{score_2},{score_3}\n".format(
    user_name=userName, score_1=quiz_scores[0],
    score_2=quiz_scores[1], score_3=quiz_scores[2])
# Append the new record to the file
with open('classroom1.csv', 'a') as file:
    file.write(newrecord)
# Read the records back
with open('classroom1.csv') as csvfile:
    readCSV = csv.reader(csvfile)

I don't know what to do after this. Thanks in advance for any feedback.

Can you please be a little more specific? Do you mean the average of the values in each row, or the average of each of the columns, or the average for all the columns combined? — timgeb, Dec 26 '15 at 21:37
You could do it easily with `pandas`. Do you have that package installed? — Anton Protopopov, Dec 26 '15 at 21:38
@timgeb: given the one example, I'd say it is the average for the 3 columns per name. — Martijn Pieters, Dec 26 '15 at 21:38
@AntonProtopopov: this is homework (the UK GCSE exam). Pandas is way overkill. — Martijn Pieters, Dec 26 '15 at 21:38
@timgeb The average of all the numerical columns combined, so [James,5,8,9] 5+8+9=22 22/3=7.33333 — Bob Stanley, Dec 26 '15 at 21:38

Martijn Pieters · Answer 1 · 2015-12-26T21:58:07.913

2

Your readCSV object, when iterated over, will give you lists of strings, 4 values per row. Convert all but the first column to integers, then do your calculations on those integers (loop inside the with block, while the file is still open):

for row in readCSV:
    name = row[0]
    scores = [int(c) for c in row[1:]]
    average = sum(scores) / len(scores)
    print('{}: {:.1f}'.format(name, average))

If you are using Python 2, then the / operator can cause problems as it'll use integer division when both the sum and the length are integers (which is the case here). Convert your numbers to float instead, or use 0.0 as the starting value for the sum (which makes sure the sum is a floating point number instead):

average = sum(scores, 0.0) / len(scores)
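
Putting the pieces together, a minimal self-contained sketch might look like the following. It assumes Python 3 and a file laid out exactly as in the question (a name followed by integer scores, no header row). The helper name `row_averages` and the sample file written to a temporary directory are illustrative only and not part of the original code.

```python
import csv
import os
import tempfile


def row_averages(path):
    """Return a list of (name, average) pairs, one per non-empty CSV row."""
    results = []
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile):
            if not row:  # skip blank lines
                continue
            name = row[0]
            scores = [int(c) for c in row[1:]]
            results.append((name, sum(scores) / len(scores)))
    return results


# Hypothetical sample data in the question's layout, written to a temp file.
sample_path = os.path.join(tempfile.mkdtemp(), 'classroom1.csv')
with open(sample_path, 'w', newline='') as f:
    f.write('James,5,8,9\nJeff,10,7,3\n')

for name, average in row_averages(sample_path):
    print('{}: {:.1f}'.format(name, average))
```

With the two sample rows this prints `James: 7.3` and `Jeff: 6.7`.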

edited Dec 26 '15 at 21:58

answered Dec 26 '15 at 21:39

Martijn Pieters

@BobStanley: You can use `format()` or `str.format()` to format your result, including specifying the number of decimal places. – Martijn Pieters Dec 26 '15 at 21:57
Thanks so much for this. However, how would I sort the data by average in ascending order? – Bob Stanley Dec 27 '15 at 19:43
@BobStanley: you do keep moving the goal posts! This post does answer your question. You can put the name and the average in a list, or add the average to the existing row. Build a new list with these rows and then sort that list. – Martijn Pieters Dec 27 '15 at 19:53
srt=sorted(row[0],total,reverse=True) #Like this ? – Bob Stanley Dec 27 '15 at 20:01
@BobStanley: See [How to sort multidimensional array by column?](http://stackoverflow.com/q/20183069) – Martijn Pieters Dec 27 '15 at 20:19
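
As a rough sketch of the suggestion in these comments: collect (name, average) pairs in a list, then sort that list with a key function. The rows below are made-up data standing in for what `csv.reader` would yield, and `by_average` is just a name chosen for this example.

```python
from operator import itemgetter

# Hypothetical rows as csv.reader would yield them: every value is a string.
rows = [
    ['James', '5', '8', '9'],
    ['Jeff', '10', '7', '3'],
    ['Alice', '6', '7', '1'],
]

averages = []
for row in rows:
    scores = [int(c) for c in row[1:]]
    averages.append((row[0], sum(scores) / len(scores)))

# Sort on the second element of each pair (the average), lowest first.
by_average = sorted(averages, key=itemgetter(1))

for name, average in by_average:
    print('{}: {:.1f}'.format(name, average))
```

This prints Alice (4.7), Jeff (6.7), then James (7.3). Passing `reverse=True` to `sorted()` would give descending order instead.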

score 1 · Answer 2 · answered Dec 26 '15 at 21:42

1

I propose a numpy solution.

Consider the mockup file testfile.txt with the content

James, 5, 8, 9
Jeff, 10, 7, 3
Alice, 6, 7, 1

We can use numpy.loadtxt to load your file, then simply take the mean of each row.

>>> import numpy as np
>>> np.loadtxt('testfile.txt', usecols=[1,2,3], delimiter=',').mean(axis=1).tolist()
[7.333333333333333, 6.666666666666667, 4.666666666666667]

timgeb

Like `pandas`, numpy is way overkill for this problem — especially for anyone new to Python. – martineau Dec 27 '15 at 04:20
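
For comparison, a standard-library-only sketch of the same per-row calculation is shown below. It assumes Python 3.4 or later (for the `statistics` module) and uses an in-memory copy of the mock data in place of testfile.txt. The `averages` dictionary is a choice made for this example, not something from the answers above.

```python
import csv
import io
from statistics import mean

# Same mock data as testfile.txt, held in memory for the sake of the example.
data = io.StringIO('James, 5, 8, 9\nJeff, 10, 7, 3\nAlice, 6, 7, 1\n')

# skipinitialspace=True drops the space that follows each comma.
averages = {
    row[0]: mean(int(c) for c in row[1:])
    for row in csv.reader(data, skipinitialspace=True)
}

for name, average in averages.items():
    print('{}: {:.2f}'.format(name, average))
```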
