24

I want to write this code in a more Pythonic way. My real array is much bigger than this example.

( 5+10+20+3+2 ) / 5

print(np.mean(array,key=lambda x:x[1]))
TypeError: mean() got an unexpected keyword argument 'key'

array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

sum = 0
for i in range(len(array)):
    sum = sum + array[i][1]

average = sum / len(array)
print(average)

import numpy as np
print(np.mean(array,key=lambda x:x[1]))

How can I avoid this error? I want to use the second example.

I'm using Python 3.7

ruohola
Sevval Kahraman

8 Answers

28

If you are using Python 3.4 or above, you could use the statistics module:

from statistics import mean

average = mean(value[1] for value in array)

Or if you're using a version of Python older than 3.4:

average = sum(value[1] for value in array) / len(array)

These solutions both use a nice feature of Python called a generator expression. The expression

value[1] for value in array

yields the values one at a time, without building an intermediate list, so it is efficient in both time and memory. See PEP 289 -- Generator Expressions.
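As a rough illustration of that laziness (the variable names here are only for this sketch), compare a list comprehension, which builds every value up front, with a generator expression, which produces values only when something asks for them:

```python
from statistics import mean

array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# List comprehension: every value is materialised before it is used.
values_list = [value[1] for value in array]

# Generator expression: nothing is computed until something iterates over it.
values_gen = (value[1] for value in array)

first = next(values_gen)   # pulls only the first value
rest = list(values_gen)    # consumes whatever is left

# Passed directly to a function, the generator needs no extra parentheses.
average = mean(value[1] for value in array)
print(values_list, first, rest, average)
```

Note that a generator can only be consumed once, which is why `rest` no longer contains the first value.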

If you're using Python 2 and summing integers, the division is integer division, which truncates the result, e.g.:

>>> 25 / 4
6

>>> 25 / float(4)
6.25

To avoid integer division, we could set the starting value of sum to the float 0.0. However, this means the generator expression must be wrapped in explicit parentheses (otherwise it's a syntax error), which is less pretty, as noted in the comments:

average = sum((value[1] for value in array), 0.0) / len(array)

It's probably best to use fsum from the math module, which always returns a float:

from math import fsum

average = fsum(value[1] for value in array) / len(array)
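Beyond returning a float, fsum also keeps track of partial sums, which matters when the inputs are themselves floats. A small sketch comparing it with a plain running total (the list of tenths is just an illustrative input, not data from the question):

```python
from math import fsum

tenths = [0.1] * 10

# Plain running total: a little rounding error creeps in at every addition.
naive_total = 0.0
for x in tenths:
    naive_total += x

# fsum tracks the partial sums and compensates for the lost precision.
exact_total = fsum(tenths)

print(naive_total == 1.0)   # False
print(exact_total == 1.0)   # True
```

For the integer data in the question both approaches give the same answer; the difference only shows up with floating-point inputs.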
Peter Wood
  • I realised there are better ways to do the Python 2 code. `sum` takes an argument for the starting value. If you pass `0.0` to it, then the numerator will always be floating point, nothing to worry about. Also, there is a function in the [**`math`**](https://docs.python.org/2/library/math.html) module, [**`fsum`**](https://docs.python.org/2/library/math.html#math.fsum). – Peter Wood Apr 25 '19 at 08:00
  • I would say the `float` casting way is a little bit more self-explanatory than passing a weird `0.0` value argument for the `sum`. – ruohola Apr 25 '19 at 08:55
  • @ruohola I think using `fsum` is probably best for Python 2. – Peter Wood Apr 25 '19 at 09:09
  • Can't you `from __future__ import division`? – DanielSank Apr 25 '19 at 20:55
  • @DanielSank yes, that's another option. Another advantage of using [**`fsum`**](https://docs.python.org/2/library/math.html#math.fsum), if you're summing floats, is that it keeps track of partial sums, which compensates for lack of precision in the floating point representation. So if we stick with `fsum`, we don't need to think about integer division at all, and it is generally the better solution too. See my answer about [Kahan Summation](https://stackoverflow.com/questions/10330002/sum-of-small-double-numbers-c/10330857#10330857) in [tag:c++]. – Peter Wood Apr 25 '19 at 21:41
  • I don't understand "However, this also means we have to make the loop over the values in the array into a comprehension expression"... do you mean adding the explicit parentheses that converts the expression into a genexp? Because the previous variants of your solution all use genexps, just without the explicit parens because that isn't needed if there are no other arguments to the function. – cs95 Apr 26 '19 at 04:46
  • @cs95 yes, that's what I meant. I've tried to improve the answer, thanks. – Peter Wood Apr 26 '19 at 06:43
3

If you do want to use numpy, convert the list to a numpy.array and select the column you want using numpy indexing:

import numpy as np

array = np.array([('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)])
print(array[:,1].astype(float).mean())
# 8.0

The cast to a numeric type is needed because the original list mixes strings and numbers, so numpy stores every element as a string. In this case you could use float or int, it makes no difference.
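To see why the cast is needed, one can inspect the dtype that numpy picked for the mixed data. This is only a quick check, and the exact dtype string may vary between numpy versions:

```python
import numpy as np

array = np.array([('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)])

# dtype.kind is a one-letter code; 'U' would indicate a unicode string dtype.
print(array.dtype, array.dtype.kind)

# The second column therefore holds strings such as '5' and '10',
# which have to be converted before taking the mean.
second_column = array[:, 1]
print(second_column.astype(float).mean())
```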

Graipher
3

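If you're open to more golf-like solutions, you can transpose your array with vanilla Python, get a tuple of just the numbers, and calculate the mean with the line below (in Python 3, zip returns an iterator, so it has to be converted to a list before indexing):

sum(list(zip(*array))[1]) / len(array)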
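For readers unfamiliar with the transposition trick, here is a slightly less golfed sketch of the same idea, unpacking the transposed result into two names (`letters` and `numbers` are illustrative):

```python
array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]

# zip(*array) groups all first items together, then all second items.
letters, numbers = zip(*array)

average = sum(numbers) / len(numbers)
print(letters, numbers, average)
```

Unpacking works in both Python 2 and Python 3, since it does not index into the result of zip. In Python 2 the division would still need a float operand.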
Nick Amin
2

You can simply use:

print(sum(tup[1] for tup in array) / len(array))

Or for Python 2:

print(sum(tup[1] for tup in array) / float(len(array)))

Or, a little more concisely, for Python 2:

from math import fsum

print(fsum(tup[1] for tup in array) / len(array))
ruohola
  • As it's python 3, just use [**`statistics.mean`**](https://docs.python.org/3/library/statistics.html#statistics.mean). – Peter Wood Apr 25 '19 at 11:09
2

With pure Python:

from operator import itemgetter

acc = 0
count = 0

for value in map(itemgetter(1), array):
    acc += value
    count += 1

mean = acc / count

An iterative approach can be preferable if your data cannot fit in memory as a list (since you said it was big). If it can, prefer a declarative approach:

data = [sub[1] for sub in array]
mean = sum(data) / len(data)
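Returning to the single-pass loop: one way to reuse it is to wrap it in a function that accepts any iterable, including a generator that never holds all the data at once. The names `mean_of_second` and `generate_pairs` are hypothetical helpers for this sketch:

```python
from operator import itemgetter


def mean_of_second(pairs):
    """Average the second item of each pair in a single pass."""
    total = 0
    count = 0
    for value in map(itemgetter(1), pairs):
        total += value
        count += 1
    if count == 0:
        raise ValueError("mean of an empty iterable is undefined")
    return total / count


def generate_pairs():
    # Stand-in for a large data source that is produced lazily.
    for i in range(1, 6):
        yield ("item", i)


print(mean_of_second(generate_pairs()))
```

Because the function counts the items itself, it does not need `len()`, which generators do not support.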

If you are open to using numpy, I find this cleaner:

import numpy as np

a = np.array(array)

mean = a[:, 1].astype(int).mean()
gmds
2

You can use map instead of a list comprehension:

sum(map(lambda x:int(x[1]), array)) / len(array)

Or functools.reduce (in Python 2.x, reduce is a built-in, so no import is needed):

import functools
functools.reduce(lambda acc, y: acc + y[1], array, 0) / len(array)
minji
  • first one gives this error : 'int' object is not callable – Sevval Kahraman Apr 25 '19 at 07:46
  • @ŞevvalKahraman if array is defined as shown in your question, the first one gives 8.0 (tested and verified on the same version). So either the array you're using has a different value somewhere or you made a typo – LinkBerest Apr 25 '19 at 12:06
  • `x[1]` is already an integer, why do you need to call `int()`? – Barmar Apr 25 '19 at 16:53
  • Using a lambda is 30% slower than a generator comprehension. But if you prefer `map`, I recommend using `operator.itemgetter(1)` instead of the lambda. – Mateen Ulhaq Apr 25 '19 at 22:52
  • Similarly, `functools.reduce` is 72% slower than a generator comprehension and `sum`. – Mateen Ulhaq Apr 25 '19 at 22:54
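One possible cause of the `'int' object is not callable` error reported above, assuming the question's snippet (which assigns `sum = 0`) had been run earlier in the same session, is that the name `sum` no longer refers to the built-in function. A minimal sketch reproducing this inside a local scope, with a hypothetical function name:

```python
array = [('a', 5), ('b', 10), ('c', 20), ('d', 3), ('e', 2)]


def shadowed_mean(pairs):
    sum = 0  # this local name now hides the built-in sum()
    try:
        return sum(value for _, value in pairs) / len(pairs)
    except TypeError as exc:
        # Calling an int fails; return the message for inspection.
        return str(exc)


print(shadowed_mean(array))
```

If that is what happened, renaming the accumulator (for example to `total`) or running `del sum` in the interactive session would restore the built-in.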
0

You could use map:

np.mean(list(map(lambda x: x[1], array)))

pdpino
0

Just find the average using sum and number of elements of the list.

array = [('a', 5) , ('b', 10), ('c', 20), ('d', 3), ('e', 2)]
avg = float(sum(value[1] for value in array)) / float(len(array))
print(avg)
#8.0
Devesh Kumar Singh