My list looks like my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

I want to find the averages of the other two columns of the inner lists, grouped by the first column of each inner list. The expected result is:

[['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]]

['A', 5, 7.5] comes from ['A', (6+4)/2, (7+8)/2]

I don't mind if I end up getting a dictionary or something, but I would prefer it remain a list.

I've tried the following:


  1. Building a dictionary from the two halves of each row:

    my_list1 = [i[0] for i in my_list]
    my_list2 = [i[1:] for i in my_list]
    new_dict = {k: v for k, v in zip(my_list1, my_list2)}

Splitting the original list so that the first column becomes the key and the second and third columns become the value, then converting it to a dictionary, gives me the aggregate. The problem is that I want to preserve the decimal places: it rounds up and gives me whole numbers instead of float values.

my_list1 = ['A', 'A', 'B', 'C', 'B']

my_list2 = [[6, 7], [4, 8], [9, 3], [1, 1], [10, 7]]

new_dict= {'A': [5, 8], 'B': [10, 5], 'C': [1, 1]}

when what I would ideally want is [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]] (I don't mind if it's a dictionary).


  2. Converted the second and third columns to float with a for loop, thinking it would then give me floats when I convert it to a dictionary. But there was no difference: it rounds up and gives a whole number.

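    # Convert the numeric columns to float in place (plain ints have no .astype)
    for i in range(len(my_list)):
      for j in range(1, len(my_list[i])):
        my_list[i][j] = float(my_list[i][j])
    
    new_dict = {}
    
    # A later row with the same key replaces the earlier one here
    for l2 in my_list:
      new_dict[l2[0]] = l2[1:]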
    

The reason I need to preserve the decimal places is that the second and third columns are x and y coordinates.

So, all in all, the objective is to find the averages of the other two columns of the inner lists, grouped by the first column of each inner list, with as many decimal places as possible.

2 Answers

Assuming you meant to use the following list:

In [4]: my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

Then simply use a defaultdict to group by the first element, and take the mean of each column:

In [6]: from collections import defaultdict

In [7]: grouper = defaultdict(list)

In [8]: for k, *tail in my_list:
    ...:     grouper[k].append(tail)
    ...:

In [9]: grouper
Out[9]:
defaultdict(list,
            {'A': [[6, 7], [4, 8]], 'B': [[9, 3], [10, 7]], 'C': [[1, 1]]})

In [10]: import statistics

In [11]: {k: list(map(statistics.mean, zip(*v))) for k,v in grouper.items()}
Out[11]: {'A': [5, 7.5], 'B': [9.5, 5], 'C': [1, 1]}
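
Since the question prefers a list of lists over a dictionary, the same grouping can be turned into rows directly. This is only a sketch on Python 3; the names `coords`, `rows` and `averaged` are illustrative and not from the session above:

```python
from collections import defaultdict
import statistics

my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

# Group the numeric columns by the first element of each row.
grouper = defaultdict(list)
for key, *coords in my_list:
    grouper[key].append(coords)

# One output row per key: the key followed by the mean of each column.
averaged = [
    [key] + [statistics.mean(column) for column in zip(*rows)]
    for key, rows in grouper.items()
]
print(averaged)
```

On Python 3.7+ the keys come out in first-seen order, so the rows follow the order in which each label first appears in my_list.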

Note, if you are on Python 2, no need to call list after map. Also, you should use iteritems instead of items.

Also, because the starred unpacking in the for loop is Python 3 only, you will have to do something like:

for sub in my_list:
    grouper[sub[0]].append(sub[1:])

instead of the cleaner Python 3 version.

Finally, there is no statistics module in Python 2. So just do:

def mean(seq):
    return float(sum(seq))/len(seq)

and use that mean instead of statistics.mean.
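
Putting those Python 2 notes together, a rough end-to-end sketch might look like the following. It avoids starred unpacking and the statistics module, so it should behave the same on Python 2 and 3; the `result` name is just for illustration:

```python
from collections import defaultdict

def mean(seq):
    # float() forces true division, so Python 2 does not truncate the result
    return float(sum(seq)) / len(seq)

my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

# Group the numeric columns by the first element of each row.
grouper = defaultdict(list)
for sub in my_list:
    grouper[sub[0]].append(sub[1:])

# zip(*v) transposes the grouped rows into columns before averaging.
result = {}
for k, v in grouper.items():
    result[k] = [mean(col) for col in zip(*v)]

print(result)
```

Note that this mean always returns floats, so whole-number averages show up as 5.0 rather than 5.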

juanpa.arrivillaga
  • Yes I edited it now so I've changed the bracket discrepancies; I'll try yours out and let you know. –  Aug 24 '17 at 00:05
  • `File "", line 5 for k, *tail in my_list: ^ SyntaxError: invalid syntax` –  Aug 24 '17 at 00:09
  • @Abhishek added a Python 2 compatible version. – juanpa.arrivillaga Aug 24 '17 at 00:10
  • Now it says `No module named statistics`. I better update Python.. I think that module has been what people say 'depricated' –  Aug 24 '17 at 00:15
  • @Abhishek No, `statistics` is brand, spanking new. It is Python 2 that is *deprecated*. If you have to be on Python 2 for some good reason (i.e. your boss is making you, you have Python 2 code base to maintain, etc) then fine, otherwise, you should use Python 3. – juanpa.arrivillaga Aug 24 '17 at 00:19
  • I hope you see great success in life. Don't know what I would do without you. I'm a student trying to implement K-means clustering for the first time –  Aug 24 '17 at 00:34

Similarly, using itertools.groupby on the sorted list:

import operator as op 
import itertools as it
import statistics as stats


iterables = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
groups = it.groupby(sorted(iterables), op.itemgetter(0))
{k: list(map(stats.mean, zip(*[i[1:] for i in g]))) for k, g in groups}
# {'A': [5, 7.5], 'B': [9.5, 5], 'C': [1, 1]}
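
The sorted call matters here: groupby only merges consecutive rows that share a key, so on the unsorted list the two 'B' rows would end up in separate groups. A small sketch of the difference (the `rows`, `unsorted_keys` and `sorted_keys` names are illustrative):

```python
import itertools as it
import operator as op

rows = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
first = op.itemgetter(0)

# Without sorting, 'B' starts a second group after 'C' interrupts the run.
unsorted_keys = [k for k, _ in it.groupby(rows, first)]

# Sorting by the key first makes equal keys adjacent, giving one group each.
sorted_keys = [k for k, _ in it.groupby(sorted(rows, key=first), first)]

print(unsorted_keys)
print(sorted_keys)
```

In a dict comprehension the second 'B' group would silently replace the first, which is why the input needs sorting by the grouping key.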
pylang