2

I want to calculate the average value of several lists in Python. These lists contain numbers as strings. An empty string isn't zero; it means a missing value.

The best I could come up with is this. Is there a more elegant, succinct & efficient way of writing this?

num    = ['1', '2', '', '6']
total  = sum([int(n) if n else 0 for n in num])
length = sum([1 if n else 0 for n in num])
ave    = float(total)/length if length > 0 else '-'

P.S. I'm using Python 2.7.x but recipes for Python 3.x are welcome

user

5 Answers

6
num = ['1', '2', '', '6']
L = [int(n) for n in num if n]
ave = sum(L)/float(len(L)) if L else '-'

or

num = ['1', '2', '', '6']
L = [float(n) for n in num if n]
avg = sum(L)/len(L) if L else '-'
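
Since the question mentions several lists, the second variant can be wrapped in a small helper. This is only a sketch: the function name `average`, the `missing` parameter, and the sample data are made up for illustration and are not part of the answer above.

```python
def average(num, missing='-'):
    """Return the mean of the non-empty strings in num, or `missing` if there are none."""
    values = [float(n) for n in num if n]  # skip '' (missing values)
    return sum(values) / len(values) if values else missing

# Hypothetical sample data: three lists, the last one has no usable values
lists = [['1', '2', '', '6'], ['4', ''], ['', '']]
averages = [average(lst) for lst in lists]
print(averages)  # [3.0, 4.0, '-']
```

Converting with float up front means the division is a true division on both Python 2.7 and Python 3.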
John La Rooy
4

In Python 3.4 and later, use the statistics module:

from statistics import mean
num = ['1', '2', '', '6']
ave = mean(int(n) for n in num if n)
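
One thing to watch for: `mean` raises `statistics.StatisticsError` when it receives no data, which happens here if every string is empty. A possible way to keep the '-' fallback from the question is sketched below; the helper name `average_or_dash` is an assumption for illustration.

```python
from statistics import StatisticsError, mean

def average_or_dash(num):
    """Mean of the non-empty numeric strings in num, or '-' if all are missing."""
    try:
        return mean(int(n) for n in num if n)
    except StatisticsError:
        # Raised by mean() when there are no values to average
        return '-'

print(average_or_dash(['1', '2', '', '6']))
print(average_or_dash(['', '']))
```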
Jan B. Kjeldsen
1

You can discard the square brackets. sum accepts generator expressions, too:

total  = sum(int(n) if n else 0 for n in num)
length = sum(1 if n else 0 for n in num)

And since a generator yields each value only when it is needed, you avoid the cost of storing a whole list in memory. This matters most when you're dealing with large amounts of data.
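
A rough way to see the memory difference is to compare the size of the two objects with `sys.getsizeof`. This is only a sketch with made-up sample data, and `getsizeof` reports the size of the container itself (the list's array of references), not of the integers it holds.

```python
import sys

num = ['1', '2', '', '6'] * 25000  # 100,000 strings, made-up sample data

as_list = [int(n) if n else 0 for n in num]  # materializes every element
as_gen = (int(n) if n else 0 for n in num)   # produces elements on demand

list_size = sys.getsizeof(as_list)  # grows with the number of elements
gen_size = sys.getsizeof(as_gen)    # small, does not depend on len(num)
print(list_size, gen_size)

# Both forms produce the same total
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)
```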

aIKid
  • Is that more efficient? – user Jan 20 '14 at 09:13
  • *Far* more efficient when you're dealing with huge lists. – aIKid Jan 20 '14 at 09:14
  • So it's less memory intensive but not really faster, right? I feel since the list comprehension is being done twice, it's better to use them since list comprehension will stay in memory. – user Jan 20 '14 at 09:15
  • @buffer Why? That means you're storing two different lists in memory. – aIKid Jan 20 '14 at 09:19
  • See this http://stackoverflow.com/questions/47789/generator-expressions-vs-list-comprehension & answer from gnibbler + senshin – user Jan 20 '14 at 09:23
1

Here's some timing on OP's solution vs. aIKid's solution vs. gnibbler's solutions, using a list of 100,000 numbers in 1..9 (plus the empty string) and 10 trials:

import timeit

setup = '''
from __main__ import f1, f2, f3, f4
import random


random.seed(0)
choices = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '']
num = [random.choice(choices) for _ in range(10**5)]
'''

def f1(num): # OP
    total  = sum([int(n) if n else 0 for n in num])
    length = sum([1 if n else 0 for n in num])
    ave    = float(total)/length if length > 0 else '-'
    return ave

def f2(num): # aIKid
    total = sum(int(n) if n else 0 for n in num)
    length = sum(1 if n else 0 for n in num)
    ave = float(total)/length if length > 0 else '-'
    return ave

def f3(num): # gnibbler 1
    L = [int(n) for n in num if n]
    ave = sum(L)/float(len(L)) if L else '-'
    return ave

def f4(num): # gnibbler 2
    L = [float(n) for n in num if n]
    ave = sum(L)/float(len(L)) if L else '-'
    return ave

number = 10
things = ['f1(num)', 'f2(num)', 'f3(num)', 'f4(num)']
for thing in things:
    print(thing, timeit.timeit(thing, setup=setup, number=number))

Result:

f1(num) 1.8177659461490339 # OP
f2(num) 2.0769015213241513 # aIKid
f3(num) 1.6350571199344595 # gnibbler 1
f4(num) 0.807052779158564  # gnibbler 2

It looks like gnibbler's solution using float is the fastest here.

senshin
  • Speed-wise, I think so. Creating a generator is generally more expensive than creating a list. Generator expressions are more useful when you're handling bigger lists, as I mentioned in my answer. – aIKid Jan 20 '14 at 09:22
  • @alKid What counts as big here (order-of-magnitude-wise)? – senshin Jan 20 '14 at 09:23
  • Great comparison. If you add 'f4' with summation as float, it's surprisingly the fastest – user Jan 20 '14 at 09:31
0

A slightly different approach, using reduce:

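from functools import reduce  # reduce is a builtin on Python 2; Python 3 needs this import

num = ['1', '2', '', '6']
total = reduce(lambda acc, x: acc + (float(x) if x else 0), num, 0.0)
length = reduce(lambda acc, x: acc + (1 if x else 0), num, 0)
# A conditional expression avoids evaluating total/length when length is 0
average = total / length if length > 0 else '-'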
brthornbury