OK, I have a working program. It opens a file of column data that is too large for Excel and finds the average value of each column.
Sample data is:
Joe Sam Bob
1 2 3
2 1 3
And it returns
Joe Sam Bob
1.5 1.5 3
This is good. The problem is that some columns have NA as a value. I want to skip the NA entries and calculate the average of the remaining values. So:
Bobby
1
NA
2
Should output as
Bobby
1.5
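To make the rule concrete, here is a minimal sketch of what I mean for a single column. It assumes Python 3, that missing values appear as the literal string NA, and the helper name `mean_skipping_na` is made up for illustration:

```python
def mean_skipping_na(values, na_token="NA"):
    """Average the numeric strings in `values`, ignoring entries equal to `na_token`."""
    numbers = [float(v) for v in values if v != na_token]
    # Return None when the column holds no usable numbers at all.
    return sum(numbers) / len(numbers) if numbers else None


print(mean_skipping_na(["1", "NA", "2"]))  # prints 1.5
```

The divisor is the count of non-NA values in that column, not the total number of rows.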
Here is my existing program built with help from here. Any help is appreciated!
with open('C://avy.txt', "rtU") as f:
    columns = f.readline().strip().split(" ")
    numRows = 0
    sums = [0] * len(columns)
    for line in f:
        # Skip empty lines
        if not line.strip():
            continue
        values = line.split(" ")
        for i in xrange(len(values)):
            sums[i] += int(values[i])
        numRows += 1
    with open('c://finished.txt', 'w') as ouf:
        for index, summedRowValue in enumerate(sums):
            print>>ouf, columns[index], 1.0 * summedRowValue / numRows
Now I have this:
with open('C://avy.txt', "rtU") as f:
    def get_averages(f):
        headers = f.readline().split()
        ncols = len(headers)
        sumx0 = [0] * ncols
        sumx1 = [0.0] * ncols
        lino = 1
    for line in f:
        lino += 1
        values = line.split()
        for colindex, x in enumerate(values):
            if colindex >= ncols:
                print >> sys.stderr, "Extra data %r in row %d, column %d" %(x, lino, colindex+1)
                continue
            try:
                value = float(x)
            except ValueError:
                continue
            sumx0[colindex] += 1
            sumx1[colindex] += value
    print headers
    print sumx1
    print sumx0
    averages = [
        total / count if count else None
        for total, count in zip(sumx1, sumx0)
    ]
    print averages
and it says:
Traceback (most recent call last):
  File "C:/avy10.py", line 11, in <module>
    lino += 1
NameError: name 'lino' is not defined
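The traceback suggests that `lino += 1` runs at module level, while `lino = 1` is only assigned inside `get_averages`, which is never called. Here is a sketch of how the pieces might fit together, under these assumptions: Python 3 (so `print` is a function), the whole loop belongs inside the function, and the function returns its results instead of printing them. The `io.StringIO` sample is a stand-in for the real file, not my actual data.

```python
import io
import sys


def get_averages(f):
    """Return (headers, averages) for whitespace-separated columns, skipping non-numeric cells."""
    headers = f.readline().split()
    ncols = len(headers)
    sumx0 = [0] * ncols    # count of numeric values seen per column
    sumx1 = [0.0] * ncols  # running sum of numeric values per column
    # The header is line 1, so data rows are numbered from 2.
    for lino, line in enumerate(f, start=2):
        for colindex, x in enumerate(line.split()):
            if colindex >= ncols:
                print("Extra data %r in row %d, column %d" % (x, lino, colindex + 1),
                      file=sys.stderr)
                continue
            try:
                value = float(x)
            except ValueError:
                continue  # "NA" (or any other non-number) is skipped
            sumx0[colindex] += 1
            sumx1[colindex] += value
    averages = [total / count if count else None
                for total, count in zip(sumx1, sumx0)]
    return headers, averages


# Stand-in for the real file; with a file on disk this would be
# `with open(path) as f: headers, averages = get_averages(f)`.
sample = io.StringIO("Joe Sam Bobby\n1 2 1\n2 1 NA\n3 3 2\n")
headers, averages = get_averages(sample)
print(headers)   # ['Joe', 'Sam', 'Bobby']
print(averages)  # [2.0, 2.0, 1.5]
```

Because each column keeps its own count, a column with NA entries is divided by the number of values it actually has, and a column with no numeric values at all yields None rather than a division error.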