I have the following list of tuples:

list = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]

I would like to calculate the mean of the first component of all tuples. I did the following:

s = []
for i in range(len(list)):
    a = list[i][0]
    if a != 0:
        s.append(a)
    else:
        pass
mean = sum(s) / len(s)

This works, but is there any way to avoid the for loop? I have a very large list of tuples, and because of the computation time I need to find another way, if that is possible.

Building on the for-loop method above, how could I find the mean while taking the weights into account? For example, the last element in the list is (2435, 'x'), and 2435 is very large compared with the 1 in (1, 'x'). Any ideas would be much appreciated. Thanks in advance.

Ananay Mital
Adam
  • If this is really Python 2, not using `range` would help some. I don't think that you will be able to do things much faster than this (short of writing a C function that you can call from Python). You are going to need to extract the numbers from the tuples, so some sort of loop (even a loop hidden inside of a comprehension) is unavoidable. – John Coleman Mar 04 '21 at 13:19
  • No need to apologize. The last paragraph in your question suggests that your question is about more than calculating the mean. If so, my answer below might be worth the cost of translating your data into a pandas dataframe. Pandas makes it easy to e.g. calculate the mean of just some observations (e.g. those with an `'x'` in a certain column). – John Coleman Mar 04 '21 at 13:45

3 Answers

3

The loop is unavoidable: you need to iterate over all the elements at least once, as John describes.

However, you can use a generator expression to avoid building an intermediate list, which saves memory:

mean = sum(elt[0] for elt in lst)/len(lst)

Update: I realize you only need the mean of the non-zero elements. You can modify your approach so that it does not store the elements in a list:

total = 0
counts = 0
for elt in lst:
    if elt[0]:
        total += elt[0]
        counts += 1

mean = total/counts
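
One caveat with the loop above is that it raises ZeroDivisionError when every first component is zero (or the list is empty). A minimal sketch of the same single-pass idea wrapped in a function follows; the name `nonzero_mean` and the choice to return None in that case are assumptions made for illustration, not part of the original answer, and the code assumes Python 3:

```python
def nonzero_mean(pairs):
    """Return the mean of the non-zero first components, or None if there are none."""
    total = 0
    count = 0
    for value, _label in pairs:  # unpack each (number, letter) tuple
        if value != 0:
            total += value
            count += 1
    # Guard against dividing by zero when nothing qualified.
    return total / count if count else None


lst = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]
print(nonzero_mean(lst))         # 1001.2
print(nonzero_mean([(0, 'a')]))  # None
```

Like the loop above, this keeps only two running numbers in memory, regardless of how long the list is.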
Krishna Chaurasia
  • `mean0 = sum(map(lambda x: x[0], lst)) / len(lst)` - another approach without the explicit loop in code but a loop is present internally. – Krishna Chaurasia Mar 04 '21 at 13:27
  • Many thanks for your answer. It works, but it includes the zeros. I would like to exclude them, any ideas? The complete length of the list is 7 and 2 of the components are zero, so the new length should be 5. – Adam Mar 04 '21 at 13:39
  • Also, don't use `range()` based loops and use value based looping to directly iterate over values and you don't need an `else`. – Krishna Chaurasia Mar 04 '21 at 13:43
0

You do need a for loop, but you can use a list comprehension to make it cleaner. The Python standard library also has a very nice statistics module that you can use to calculate the mean. As an extra note, please do not use list as a variable name: it shadows the built-in type list.

from statistics import mean

mylist = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0,'d'), (2435, 'x')]

m = mean([item[0] for item in mylist if item[0] != 0])

print(m)
1001.2

In Python 2.7

items = [item[0] for item in mylist if item[0] != 0]
mean = sum(items) / float(len(items))  # float() avoids Python 2 integer division
print(mean)
1001.2

Finish up by refactoring the list comprehension to show more meaningful variable names, for example items = [number for number, letter in mylist if number != 0]
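
As a sketch of that refactoring, the version below unpacks each tuple into `number` and `letter`. It also swaps `mean` for `statistics.fmean`, which is my substitution rather than part of the answer above: `fmean` requires Python 3.8 or later, always returns a float, and is documented as running faster than `mean`.

```python
from statistics import fmean  # fmean was added in Python 3.8

mylist = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]

# Tuple unpacking gives each component a meaningful name.
numbers = [number for number, letter in mylist if number != 0]

m = fmean(numbers)  # converts the values to float before averaging
print(m)  # 1001.2
```

Note that `fmean`, like `mean`, raises `StatisticsError` if `numbers` is empty.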

alec_djinn
  • @alec_djinn, thank you for your answer. It works, but it includes the zeros. I would like to exclude them, any ideas? – Adam Mar 04 '21 at 13:37
  • @Amir I have updated the list comprehension to include a `number != 0` check. Try again with the updated code. – alec_djinn Mar 04 '21 at 13:38
0

A pandas approach:

import pandas as pd

tuples = [(120, 'x'), (1120, 'y'), (1330, 'x'), (0, 't'), (1, 'x'), (0, 'd'), (2435, 'x')]
df = pd.DataFrame(tuples)
df.loc[df[0] != 0, 0].mean()  # 1001.2

Careful timing would be needed to see if this is any better than what you are currently doing. The actual mean calculation should be faster, but the gain could well be negated by the cost of conversion.
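
One way to do that timing is with the standard-library `timeit` module. The sketch below is only illustrative: the synthetic data, its size, and the function names are assumptions, and the pandas variant deliberately includes the DataFrame conversion so that its cost is counted. The pandas case is skipped if pandas is not installed.

```python
import timeit

# Synthetic stand-in for the real data (assumed size and shape).
data = [(i % 2500, "x") for i in range(100_000)]


def mean_loop(pairs):
    # Single pass, no intermediate list.
    total = count = 0
    for value, _label in pairs:
        if value:
            total += value
            count += 1
    return total / count


def mean_comprehension(pairs):
    # Builds a list of the non-zero values first.
    values = [value for value, _label in pairs if value != 0]
    return sum(values) / len(values)


candidates = {"loop": mean_loop, "comprehension": mean_comprehension}

try:
    import pandas as pd

    def mean_pandas(pairs):
        # Conversion to a DataFrame is part of what gets timed.
        df = pd.DataFrame(pairs)
        return df.loc[df[0] != 0, 0].mean()

    candidates["pandas"] = mean_pandas
except ImportError:
    pass  # pandas not available; time only the pure-Python versions

repeats = 5
for name, func in candidates.items():
    seconds = timeit.timeit(lambda: func(data), number=repeats)
    print(f"{name}: {seconds / repeats:.4f} s per call")
```

The numbers depend heavily on the machine and on the size of the list, so it is worth running this on data shaped like your own. If the data can be kept in a DataFrame from the start, the conversion cost is paid only once.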

John Coleman