51

I have two lists as follows.

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

I want to get the mean of the common elements in the two lists as follows.

myoutput = [["chocolate", 0.5], ["egg", 0.45]]

My current code is as follows:

import numpy as np

for item1 in mylist1:
    for item2 in mylist2:
        if item1[0] == item2[0]:
            print(np.mean([item1[1], item2[1]]))

However, since there are two nested for loops (O(n^2) complexity), this is very inefficient for very long lists. I am wondering if there is a more standard/efficient way of doing this in Python.

EmJ
  • Converting the items to a dictionary would make for a readable and Pythonic solution – moo May 10 '20 at 13:18
  • Perhaps you can get the means of each list and then just do something like `mean1/len(mylist1) + mean2/len(mylist2)`, as this would get you the mean of the lists combined – dodekja May 11 '20 at 08:37
  • Is it a given that each key is always *unique* to each list? I think many of the answers make this assumption. It's probably a valid assumption, but just wanted to be sure in case this was just dummy data, and the "real" data might have duplicates (it's not entirely clear to me what these key,value pairs represent). – Roberto May 11 '20 at 12:53
  • `(mydict1[i] + mydict2[i])/2 for i in set(mydict1) & set(mydict2)` # after dict() – benxyzzy May 11 '20 at 15:07
  • Thank you for the suggestions :) – EmJ May 13 '20 at 23:28

9 Answers

37

You can do it in O(n) (a single pass over each list) by converting mylist1 to a dict, then looking each item of the 2nd list up in that dict (an O(1) operation), like this:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

l1_as_dict = dict(mylist1)  # one O(n) pass over mylist1

myoutput = []
for item, price2 in mylist2:        # one pass over mylist2
    if item in l1_as_dict:          # O(1) average-case lookup
        price1 = l1_as_dict[item]
        myoutput.append([item, (price1 + price2) / 2])

print(myoutput)

Output:

[['chocolate', 0.5], ['egg', 0.45]]
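
If you prefer a one-liner, the same lookup idea fits in a list comprehension (a minimal sketch, equivalent to the loop above):

l1_as_dict = dict(mylist1)
myoutput = [[item, (l1_as_dict[item] + price2) / 2]
            for item, price2 in mylist2
            if item in l1_as_dict]
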
Adam.Er8
17

An O(n) solution that will average all items.
Construct a dictionary mapping each key to a list of its values, then average those lists afterwards:

In []:
d = {}
for lst in (mylist1, mylist2):
    for i, v in lst:
        d.setdefault(i, []).append(v)   # alternatively, use collections.defaultdict

[(k, sum(v)/len(v)) for k, v in d.items()]

Out[]:
[('lemon', 0.1), ('egg', 0.45), ('muffin', 0.3), ('chocolate', 0.5), ('milk', 0.2), ('carrot', 0.8)]

Then, if you just want the common ones, you can add a guard:

In []:
[(k, sum(v)/len(v)) for k, v in d.items() if len(v) > 1]

Out[]:
[('egg', 0.45), ('chocolate', 0.5)]

This extends to any number of lists and makes no assumptions about the number of common elements.
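
For example, continuing the session above with a hypothetical third list, the same setdefault loop works unchanged; tighten the guard to len(v) == 3 to keep only items present in all three lists:

In []:
mylist3 = [["egg", 0.6], ["milk", 0.4]]   # hypothetical third list
for i, v in mylist3:
    d.setdefault(i, []).append(v)

[(k, sum(v)/len(v)) for k, v in d.items() if len(v) == 3]

Out[]:
[('egg', 0.5)]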

AChampion
9

Here is one solution that uses collections.defaultdict to group the items and calculates the averages with statistics.mean:

from collections import defaultdict
from statistics import mean

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d = defaultdict(list)
for lst in (mylist1, mylist2):
    for k, v in lst:
        d[k].append(v)

result = [[k, mean(v)] for k, v in d.items()]

print(result)
# [['lemon', 0.1], ['egg', 0.45], ['muffin', 0.3], ['chocolate', 0.5], ['milk', 0.2], ['carrot', 0.8]]

If we only want common keys, just check whether more than one value was collected:

result = [[k, mean(v)] for k, v in d.items() if len(v) > 1]

print(result)
# [['egg', 0.45], ['chocolate', 0.5]]

We could also just build the result from set intersection:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

d1, d2 = dict(mylist1), dict(mylist2)

result = [[k, (d1[k] + d2[k]) / 2] for k in d1.keys() & d2.keys()]

print(result)
# [['egg', 0.45], ['chocolate', 0.5]]
RoadRunner
8

You can use the Pandas library to avoid writing any loops yourself; the code stays really concise and clean.

Install Pandas with pip install pandas, then try this:

In [132]: import pandas as pd

In [109]: df1 = pd.DataFrame(mylist1)

In [110]: df2 = pd.DataFrame(mylist2)

In [117]: res = pd.merge(df1, df2, on=0)

In [121]: res['mean'] = res.mean(axis=1, numeric_only=True)

In [125]: res.drop(['1_x', '1_y'], axis=1, inplace=True)

In [131]: res.values.tolist()
Out[131]: [['egg', 0.45], ['chocolate', 0.5]]

Edit

Pandas is crazy fast because it uses NumPy under the hood, and NumPy implements highly efficient array operations.

Check the post Why is Pandas so madly fast? for more details on computing a mean in pure Python vs. Pandas.
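
If you'd rather stay in Pandas end to end, a groupby-based sketch also works and generalizes to duplicate keys (assuming the same mylist1 and mylist2):

import pandas as pd

# Stack both lists into one frame.
df = pd.DataFrame(mylist1 + mylist2, columns=["item", "value"])

# Keep only items that occur in both lists (each appears exactly twice here),
# then average per item.
common = df.groupby("item").filter(lambda g: len(g) == 2)
result = common.groupby("item", as_index=False)["value"].mean()

print(result.values.tolist())
# [['chocolate', 0.5], ['egg', 0.45]]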

Mayank Porwal
7

To easily manipulate your values, I'd suggest converting to dicts, finding the common keys, and computing the mean:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

recipe_1 = dict(mylist1)  # {'lemon': 0.1, 'egg': 0.1, 'muffin': 0.3, 'chocolate': 0.5}
recipe_2 = dict(mylist2)  # {'chocolate': 0.5, 'milk': 0.2, 'carrot': 0.8, 'egg': 0.8}

common_keys = recipe_1.keys() & recipe_2.keys()  # {'chocolate', 'egg'}

import numpy as np  # only needed for the np.mean variant

myoutput = [[item, np.mean((recipe_1[item], recipe_2[item]))] for item in common_keys]
# or, without NumPy:
myoutput = [[item, (recipe_1[item] + recipe_2[item]) / 2] for item in common_keys]
azro
6

Convert the lists to dicts, then iterate one dict and look each key up in the other:

d_list1 = dict(mylist1)
d_list2 = dict(mylist2)

[[k, (v+d_list2[k])/2] for k, v in d_list1.items() if k in d_list2]
#[['egg', 0.45], ['chocolate', 0.5]]
Transhuman
6

You can get the common keys of the two lists using the set intersection method, and then calculate the means in a list comprehension:

mylist1 = [["lemon", 0.1], ["egg", 0.1], ["muffin", 0.3], ["chocolate", 0.5]]
mylist2 = [["chocolate", 0.5], ["milk", 0.2], ["carrot", 0.8], ["egg", 0.8]]

dict1 = dict(mylist1)
dict2 = dict(mylist2)
res = [[key, (dict1[key] + dict2[key]) / 2]
       for key in set(dict1).intersection(dict2)]
print(res)

Output:

>> [['chocolate', 0.5], ['egg', 0.45]]
Fullstack Guy
5

You can do it in the time required for computing the set intersection, which is O(min(N1, N2)), where N1 and N2 are the list lengths (building the sets and dicts adds an O(N1 + N2) pass; see the comments below):

intersect = {a[0] for a in mylist1}.intersection(a[0] for a in mylist2)
d1 = dict(mylist1)
d2 = dict(mylist2)
result = {i: (d1[i] + d2[i]) / 2 for i in intersect}
print(result)  # {'chocolate': 0.5, 'egg': 0.45} (order may vary)
jeremy_rutman
  • 1
    I would be careful in how you describe the complexity. You have 2 list comprehension and 3 dictionary constructions. The set intersection is only one of your operations. Having said all that it is still only `O(n)`. – AChampion May 11 '20 at 04:32
  • *"Comuting"* is not an English word. Do you mean *[computing](https://en.wiktionary.org/wiki/computing#Verb)* or *[commuting](https://en.wiktionary.org/wiki/commute#Verb)* (*"2. (intransitive, mathematics) Of an operation, to be commutative, i.e. to have the property that changing the order of the operands does not change the result."*)? – Peter Mortensen May 11 '20 at 13:38
  • I don't think commuting the set intersections will make any difference, as the intersection operation already takes into account the shorter-length list and does no more operations than required, so it wouldn't matter if you did a.intersection(b) or b.intersection(a). However, this answer did commute [reduce to lesser length] the "time served" from O(n1*n2) to O(min(n1,n2)), so there's that to consider on the long drive to work. – jeremy_rutman May 11 '20 at 16:18
2

Here is a simple, very Pythonic solution:

result = [[x[0], (x[1] + y[1])/2] for x in mylist1 for y in mylist2 if x[0] == y[0]]

It probably is not the fastest solution (it is still O(n^2)), but it is faster than the OP's version by virtue of using a list comprehension to iterate the lists, and, since neither this solution nor the OP's handles multiple instances of a key, it replaces np.mean with a simple average of the two values.
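
To see the difference in practice, here is a rough timing sketch (the list contents and sizes are arbitrary; linear() mirrors the dict-based accepted answer):

import random
import timeit

n = 1000
lst1 = [[f"item{i}", random.random()] for i in random.sample(range(2 * n), n)]
lst2 = [[f"item{i}", random.random()] for i in random.sample(range(2 * n), n)]

def quadratic():
    return [[x[0], (x[1] + y[1]) / 2] for x in lst1 for y in lst2 if x[0] == y[0]]

def linear():
    d1 = dict(lst1)
    return [[k, (d1[k] + v) / 2] for k, v in lst2 if k in d1]

print(timeit.timeit(quadratic, number=10))  # time grows with n * n
print(timeit.timeit(linear, number=10))     # time grows with n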

Mark