
I am computing an average from the differences between up to 3 values and want to store the result in a list.

A sample of the list I want to average looks like this:

...
[6.0, 270.0, -55.845848680633168],
[6.0, 315.0, -47.572000492889323],
[6.5, 0.0, -47.806802767243724],
[6.5, 45.0, -48.511643275159528],
[6.5, 90.0, -45.002053150122123],
[6.5, 135.0, -51.034656702050455],
[6.5, 180.0, -53.266356523649002],
[6.5, 225.0, -47.872632929518339],
[6.5, 270.0, -52.09662072002746],
[6.5, 315.0, -48.563996448937075]]

There will be up to 3 rows where the first 2 columns match (these 2 columns are polar coordinates). When this is the case, I want to take the differences between the 3rd elements, average them, and append the polar coordinates of the point together with the averaged result to a new list:

avg_variation = []  # assumed to start empty; results are appended below
for a in avg_data:
    comparison = []
    for b in avg_data:
        if a[0] == b[0] and a[1] == b[1]:
            comparison.append(b[2])

    print comparison    
    z = 0   # reset z to 0, so no separate case is needed when len(comparison) == 1

    if len(comparison) == 2: # if there are only 2 elements, compare them
        z += -(comparison[0]) + comparison[1]
    if len(comparison) == 3: # if all 3 elements are there, compare all 3
        z += -(comparison[0]) + comparison[1]
        z += -(comparison[0]) + comparison[2]
        z += -(comparison[1]) + comparison[2]
        z = z/3 #average the variation

    avg_variation.append([a[0], a[1], z]) #append the polar coordinates and the averaged variation to a list

This code appends the correct data to the list, except that it does so every time it comes across matching polar coordinates, so I end up with duplicate rows.

To stop this, I have tried adding an if statement that looks for matching polar coordinates in the avg_variation list before performing the averaging again:

if a[0] not in avg_variation and a[1] not in avg_variation:

This does not work, and I get the error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I don't think any or all is what I am looking for, as I only want to check the first two columns, and not the third, against the values already appended. Does anyone have an idea how I can improve my if statement?

To clear up a bit more what my actual question is:

My code searches through nested lists for lists where the first 2 elements match, performs a calculation on the 3rd elements and then appends the result to a new list. My problem is that if there are 2 or 3 rows where the first 2 elements match, it appends the result to the new list 2 or 3 times. I want it to do so only once.

Edit: Sorry, my original question was misleading as to the purpose of my code.

mark mcmurray
  • So, just to get it straight, your resulting lists should all have the same value in terms of their third number (which should be the average of your third column above)? – arshajii Apr 20 '13 at 17:36
  • The resulting list should have nested lists whose 1st and 2nd elements are the polar coordinates and whose 3rd element is the average of the values at these polar coordinates. In the resulting list, each row should have unique polar coordinates. – mark mcmurray Apr 20 '13 at 17:49
  • @markmcmurray: your code doesn't compute the average of the values, though, it computes the average (signed) difference among the elements, which is different. – DSM Apr 20 '13 at 17:52
  • Yes, you're right, my bad. I forgot I am doing that too, as I am also averaging the values, but I will amend my question to mention this. – mark mcmurray Apr 20 '13 at 17:56
  • Edited to reflect the code more accurately. – mark mcmurray Apr 20 '13 at 17:58

3 Answers


IIUC, I think a simpler approach would be something like

import numpy as np
from itertools import combinations
from collections import defaultdict

def average_difference(seq):
    return np.mean([j-i for i,j in combinations(seq, 2)]) if len(seq) > 1 else 0

def average_over_xy(seq, fn_to_apply):
    d = defaultdict(list)
    for x,y,z in seq:
        d[x,y].append(z)

    outlist = [[x,y,fn_to_apply(z)] for (x,y),z in sorted(d.items())]
    return outlist

which loops over all the rows, builds a dictionary whose keys are the x,y coordinates and whose values are lists of the corresponding z elements, and then turns that dictionary into a sorted list of lists, applying the specified function to each list of z values. For example, using the average signed and ordered difference (average_difference above), like in your code, produces

>>> seq = [[1, 2, 30], [1, 2, 40], [1, 2, 50], [1, 3, 4], [1, 3, 6], [2, 10, 5]] 
>>> average_over_xy(seq, average_difference)
[[1, 2, 13.333333333333334], [1, 3, 2.0], [2, 10, 0]]

Note that the way you've defined it, which I've matched above, the answer depends upon the order that the elements are given in, i.e.

>>> average_over_xy([[1,2,3],[1,2,4]], average_difference)
[[1, 2, 1.0]]
>>> average_over_xy([[1,2,4],[1,2,3]], average_difference)
[[1, 2, -1.0]]

If you wanted to, you could use

def average_difference_sorted(seq):
    return average_difference(sorted(seq))

instead, or use a standard deviation or whatever you like. (You didn't mention your use case, so I'll assume that you've got the list in the order you want, you're aware of the pitfalls, and you really need average_difference.)

There are some faster numpy-based tricks we could do, and ways to generalize it, but using a defaultdict to accumulate values is a handy pattern, and it's often fast enough.
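As a rough sketch of that accumulate-then-apply pattern without NumPy, assuming Python 3 (for the statistics module and true division): the names group_by_xy, mean_pairwise_difference and summarise are made up for this illustration, and pstdev is just one possible stand-in for "a standard deviation or whatever you like".

```python
from collections import defaultdict
from itertools import combinations
from statistics import pstdev


def group_by_xy(rows):
    """Collect the third column of each row under its (x, y) key."""
    groups = defaultdict(list)
    for x, y, z in rows:
        groups[x, y].append(z)
    return groups


def mean_pairwise_difference(values):
    """Mean of (later - earlier) over all pairs; 0 when there is one value."""
    diffs = [j - i for i, j in combinations(values, 2)]
    return sum(diffs) / len(diffs) if diffs else 0


def summarise(rows, fn):
    """One [x, y, fn(z values)] row per distinct (x, y), sorted by key."""
    return [[x, y, fn(zs)] for (x, y), zs in sorted(group_by_xy(rows).items())]


rows = [[1, 2, 30], [1, 2, 40], [1, 2, 50], [1, 3, 4], [1, 3, 6], [2, 10, 5]]
by_difference = summarise(rows, mean_pairwise_difference)
by_spread = summarise(rows, pstdev)  # population standard deviation per key
```

Because the function is passed in, changing the statistic only means changing the second argument; the grouping step stays the same.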

DSM

Here is a possible solution:

l=[[6.0, 270.0, -55.845848680633168],
[6.0, 315.0, -47.572000492889323],
[6.5, 0.0, -47.806802767243724],
[6.0, 180.0, -53.266356523649002],
[6.0, 225.0, -47.872632929518339],
[6.0, 270.0, -52.09662072002746],
[6.0, 315.0, -48.563996448937075]]

# First, we change the structure so that the pair of coordinates
# becomes a tuple which can be used as dictionary key
l=[[(c1, c2), val] for c1, c2, val in l]

# We build a dictionary coord:[...list of values...]
d={}
for coord, val in l:
    d.setdefault(coord,[]).append(val)

# Here, I compute the mean of each list of values.
# Apply your own function !

means = [[coord[0], coord[1], sum(vals)/len(vals)] for coord, vals in d.items()]

print means

Thierry Lathuille

You haven't given all of the information necessary to be sure of this, but I believe your error is caused by performing logical operations on numpy arrays. See this answer to a question with a similar error.

Without more information, it's difficult to duplicate the context of your question to try it, but perhaps being more specific in the boolean operations in the if statement will help.
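One way to make the check more specific, sketched under the assumption that each row is a sequence of three numbers (possibly NumPy values): a[0] not in avg_variation compares a single coordinate against whole rows, so with plain floats it never matches, and with NumPy values the == inside "in" can return an array, which would explain the ValueError. Remembering the coordinate pairs already handled as plain tuples in a set avoids both problems. The function name average_variation and the set name seen are hypothetical, and the averaging simply mirrors the code in the question.

```python
def average_variation(rows):
    """Return one [r, theta, z] row per distinct (r, theta), in first-seen order."""
    seen = set()  # (r, theta) pairs that have already been processed
    result = []
    for a in rows:
        key = (float(a[0]), float(a[1]))
        if key in seen:
            continue  # this coordinate pair was already averaged
        seen.add(key)

        # Gather the third column of every row sharing these coordinates.
        comparison = [float(b[2]) for b in rows
                      if (float(b[0]), float(b[1])) == key]

        z = 0.0
        if len(comparison) == 2:
            z = comparison[1] - comparison[0]
        elif len(comparison) == 3:
            z = ((comparison[1] - comparison[0])
                 + (comparison[2] - comparison[0])
                 + (comparison[2] - comparison[1])) / 3
        result.append([key[0], key[1], z])
    return result


sample = [[6.0, 270.0, -55.0], [6.0, 315.0, -47.0],
          [6.0, 270.0, -52.0], [6.0, 315.0, -48.0],
          [6.5, 0.0, -47.5]]
deduplicated = average_variation(sample)
```

Converting the coordinates with float() before building the tuple keeps the keys hashable and comparable whether the input holds Python floats or NumPy scalars.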

marr75