You can measure the performance of the different proposals. I am assuming that averaging "along the columns" means averaging element-wise across all the lists (axis 0). For instance, if you have 1000 lists of 100 elements each, you end up with one list of 100 averages.
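To make that interpretation concrete, here is a toy illustration (made-up data, not part of the benchmark): three lists of two elements collapse into two column averages.

toy = [[1, 2], [3, 4], [5, 6]]
# zip(*toy) transposes the rows, so each col holds the i-th element of every list
col_avgs = [sum(col) / len(col) for col in zip(*toy)]
print(col_avgs)  # [3.0, 4.0]

The benchmark itself: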
import random
import numpy as np
import statistics
import timeit
data = [[random.random() for _ in range(100)] for _ in range(1000)]
def average(data):
    # NumPy converts the nested list to an array and averages over axis 0
    return np.average(data, axis=0)

def sum_len(data):
    # Pure Python: zip(*data) transposes, then sum/len per column
    return [sum(l) / len(l) for l in zip(*data)]

def mean(data):
    # statistics.mean applied to each transposed column
    return [statistics.mean(l) for l in zip(*data)]

if __name__ == "__main__":
    print(timeit.timeit('average(data)', 'from __main__ import data,average', number=10))
    print(timeit.timeit('sum_len(data)', 'from __main__ import data,sum_len', number=10))
    print(timeit.timeit('mean(data)', 'from __main__ import data,mean', number=10))
Output
0.025441123012569733
0.029354612997849472
1.0484535950090503
It appears that statistics.mean is considerably slower (roughly 35-40 times) than both np.average and the sum_len method, and that np.average is marginally faster than sum_len.
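One caveat worth noting: in the benchmark np.average receives a nested Python list, so part of its measured time is spent converting that list to an array. As a sketch (a variation on the benchmark above, not one of the original proposals), converting once up front should make the NumPy call itself faster still:

import random
import timeit
import numpy as np

data = [[random.random() for _ in range(100)] for _ in range(1000)]
arr = np.array(data)  # one-time conversion, kept outside the timed statement

# Same timing style as above; only the input type changes.
print(timeit.timeit('np.average(arr, axis=0)',
                    'from __main__ import arr; import numpy as np',
                    number=10))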