2

I have a set of data that has a circular scale (angles from 0 to 360°). I know most of the values in the dataset are close to each other, but some are outliers. I want to determine which of them have to be eliminated.

The problem with a circular scale is the following (using an example): data = [350, 0, 10] is an array containing angles in degrees. The arithmetic mean of this array is 120. But considering their units, the mean value of 350°, 0° and 10° is 0°.

So the ordinary mean already fails on circular data, and the same problem arises when computing the standard deviation.
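To make the problem concrete, here is a minimal sketch that applies the ordinary (linear) statistics to the example array. The names `naive_mean` and `naive_std` are only illustrative.

```python
from statistics import mean, pstdev

data = [350, 0, 10]  # angles in degrees, all within 10° of 0° on the circle

# Linear statistics ignore the wrap-around at 360°
naive_mean = mean(data)   # lands far away from 0°
naive_std = pstdev(data)  # far larger than the real 10° spread

print(naive_mean, naive_std)
```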

How do I do it?

jeandemeusy
  • What are *outliars*? – Daweo Oct 18 '21 at 09:51
  • Does this answer your question? [Easy way to keeping angles between -179 and 180 degrees](https://stackoverflow.com/questions/2320986/easy-way-to-keeping-angles-between-179-and-180-degrees) – scandav Oct 18 '21 at 09:53
  • Take the sine or the cosine of the angle, and you'll have a value whose range is between -1 and 1 - but crucially, because it's periodic, angles of 355 will have a value close to angles of 5. Using sin or cos should also work for those cases where you want to use negative angles. – Thomas Kimber Oct 18 '21 at 09:59
  • Tricky question, if I remember correctly. How do you define the mean? I.e., is the mean of 0°, 0° and 90° equal to 30° or to about 26.6° (arctan(1/2))? How do you define the standard deviation? – hpchavaz Oct 18 '21 at 10:12
  • That's my question. What is and how to define standard deviation when data is circular – jeandemeusy Oct 18 '21 at 11:20
  • 1
    Why not just use the (corrected) sample standard deviation, using the absolute difference between the angles (see the function `absDiff_angle` in my answer below)? – coproc Oct 18 '21 at 13:03

3 Answers

1

So you are given a list of angles and want to find the "mean" (average) angle and the outliers. One simple possibility is to average the 2D vectors (cos(a), sin(a)) corresponding to the angles, take the angle of the averaged vector as the mean, and then compute the standard deviation from the angular distances to that mean:

from math import degrees, radians, sin, cos, atan2, sqrt

def absDiff_angle(a1, a2, fullAngle=360):
    a1,a2 = a1%fullAngle,a2%fullAngle
    if a1 >= a2: a1,a2 = a2,a1
    return min(a2-a1, a1+fullAngle-a2)

# sample input of angles 350,351,...359,0,...,10, 90
angles_deg = list(range(350,360)) + list(range(11)) + [90]

# compute corresponding 2D vectors
angles_rad = [radians(a) for a in angles_deg]
xVals = [cos(a) for a in angles_rad]
yVals = [sin(a) for a in angles_rad]

# average of 2D vectors
N = len(angles_rad)
xMean = sum(xVals)/N
yMean = sum(yVals)/N

# go back to angle
angleMean_rad = atan2(yMean,xMean)
angleMean_deg = degrees(angleMean_rad)

# filter outliers
square = lambda v: v*v
stddev = sqrt(sum([square(absDiff_angle(a, angleMean_deg)) for a in angles_deg])/(N-1))
MIN_DIST_OUTLIER = 3*stddev
isOutlier = lambda a: absDiff_angle(a, angleMean_deg) >= MIN_DIST_OUTLIER
outliers = [a for a in angles_deg if isOutlier(a)]

print(angleMean_deg)
print(outliers)

Note that outliers can distort both the mean value and the standard deviation. To be less sensitive to outliers, one can compute a histogram of the angles (e.g. with the bins [0°, 10°), [10°, 20°), ..., [350°, 360°)), select the angles from the bin with the most members together with its neighbouring bins, and compute the mean angle (and standard deviation) from those angles only.
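The histogram idea could be sketched as follows. This is only one possible reading of it: the function name `robust_circular_mean` is hypothetical, it assumes `bin_width` divides 360 evenly, and it keeps just the fullest bin plus its two direct neighbours (wrapping around at 0°/360°).

```python
from collections import Counter
from math import atan2, cos, degrees, radians, sin

def robust_circular_mean(angles_deg, bin_width=10):
    """Circular mean of the angles in the fullest bin and its two neighbours."""
    n_bins = 360 // bin_width
    # bin index of every angle, after wrapping the angle into [0, 360)
    bin_of = [int((a % 360) // bin_width) % n_bins for a in angles_deg]
    counts = Counter(bin_of)
    best = max(counts, key=counts.get)
    # neighbours wrap around, so bin 0 is adjacent to the last bin
    keep = {(best - 1) % n_bins, best, (best + 1) % n_bins}
    selected = [a for a, b in zip(angles_deg, bin_of) if b in keep]
    # vector average of the selected angles only
    x = sum(cos(radians(a)) for a in selected)
    y = sum(sin(radians(a)) for a in selected)
    return degrees(atan2(y, x)) % 360, selected

angles_deg = list(range(350, 360)) + list(range(11)) + [90]
mean_deg, used = robust_circular_mean(angles_deg)
print(mean_deg, 90 in used)
```

With this input the stray 90° value falls in a bin far from the dense cluster, so it does not take part in the mean. The standard deviation and the outlier threshold can then be computed from `used` in the same way as above.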

coproc
0

Circular mean

You can replace each angle with the corresponding point (vector) on the unit circle, then define the mean as the angle of the sum of those vectors.

But beware: this gives a mean of about 26.6° for [0°, 0°, 90°], since arctan(1/2) ≈ 26.57°, and there is no mean for [0°, 180°] because the two vectors cancel out.
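Both caveats can be checked with a few lines. This is a minimal sketch; the helper name `resultant` is hypothetical, and it returns the length of the averaged vector alongside the angle so that the degenerate case is visible.

```python
from math import atan2, cos, degrees, hypot, radians, sin

def resultant(angles_deg):
    """Return (angle in degrees, length) of the averaged unit vectors."""
    n = len(angles_deg)
    x = sum(cos(radians(a)) for a in angles_deg) / n
    y = sum(sin(radians(a)) for a in angles_deg) / n
    return degrees(atan2(y, x)), hypot(x, y)

angle, length = resultant([0, 0, 90])
print(angle)    # angle of the vector (2, 1), i.e. arctan(1/2) in degrees

angle, length = resultant([0, 180])
print(length)   # practically zero, so the angle carries no meaning here
```

A resultant length close to zero is the signal that the mean direction is undefined. Because of floating-point rounding it is rarely exactly zero, so comparing against a small tolerance is safer than testing for equality.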

Outliers

Outliers are the angles farthest from the mean, i.e. those with the greatest absolute angular difference from it.

Standard deviation

The standard deviation can be used to define outliers.

@coproc gives the corresponding code in their answer.
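As an alternative to the angular-difference approach, directional statistics offers a definition based on the mean resultant length R (the length of the averaged unit vectors): the circular standard deviation is sqrt(-2 ln R), in radians. The sketch below assumes that definition; the function name `circular_std_deg` is hypothetical and not taken from either answer.

```python
from math import cos, degrees, hypot, inf, log, radians, sin, sqrt

def circular_std_deg(angles_deg):
    """Circular standard deviation sqrt(-2 ln R), converted to degrees."""
    n = len(angles_deg)
    x = sum(cos(radians(a)) for a in angles_deg) / n
    y = sum(sin(radians(a)) for a in angles_deg) / n
    r = min(hypot(x, y), 1.0)  # clamp: rounding can push R slightly above 1
    if r == 0.0:
        return inf             # vectors cancel out, dispersion is unbounded
    return degrees(sqrt(-2.0 * log(r)))

print(circular_std_deg([350, 0, 10]))
```

For tightly clustered data this stays close to the ordinary standard deviation of the unwrapped values, and it grows without bound as the angles spread around the whole circle.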

Interquartile range

The interquartile range can also be used. It is less sensitive to outliers than the standard deviation, but in the circular case it could be irrelevant.

Anyway:

from functools import reduce
from math import degrees, radians, sin, cos, atan2, pi


def norm_angle(angle, degree_unit = True):
    """ Normalize an angle to a value in ]-180, 180] or ]-pi, pi]."""
    mpi = 180 if degree_unit else pi
    angle = angle % (2 * mpi)
    return angle if abs(angle) <= mpi else angle - (1 if angle >= 0 else -1) * 2 * mpi


def circular_mean(angles, degree_unit = True):
    """ Returns the circular mean from a collection of angles. """
    angles = [radians(a) for a in angles] if degree_unit else angles
    x_sum, y_sum = reduce(lambda tup, ang: (tup[0]+cos(ang), tup[1]+sin(ang)), angles, (0,0))
    if abs(x_sum) < 1e-12 and abs(y_sum) < 1e-12: return None  # vectors cancel out (tolerance absorbs float rounding)
    return (degrees if degree_unit else lambda x:x)(atan2(y_sum, x_sum)) 


def circular_interquartiles_value(angles, degree_unit = True):
    """ Returns the circular interquartiles value from a collection of angles."""
    mean = circular_mean(angles, degree_unit=degree_unit)
    deltas = tuple(sorted([norm_angle(a - mean, degree_unit=degree_unit) for a in angles]))

    nb = len(deltas)
    nq1, nq3, direct = nb // 4, nb - nb // 4, (nb % 4) // 2

    q1 = deltas[nq1] if direct else (deltas[nq1-1] + deltas[nq1]) / 2
    q3 = deltas[nq3-1] if direct else (deltas[nq3-1] + deltas[nq3]) / 2

    return q3-q1


def circular_outliers(angles, coef = 1.5, values=True, degree_unit=True):
    """ Returns outliers from a collection of angles. """
    mean = circular_mean(angles, degree_unit=degree_unit)
    maxdelta = coef * circular_interquartiles_value(angles, degree_unit=degree_unit)
    deltas = [norm_angle(a - mean, degree_unit=degree_unit) for a in angles]

    return [z[0] if values else i for i, z in enumerate(zip(angles, deltas)) if abs(z[1]) > maxdelta]

Let's give it a try:

angles = [-179, -20, 350, 720, 10, 20, 179] # identical to [-179, -20, -10, 0, 10, 20, 179]
circular_mean(angles), circular_interquartiles_value(angles), circular_outliers(angles)

output:

(-1.1650923760388311e-14, 40.000000000000014, [-179, 179])

As we might expect:

  • the circular_mean is near 0, as the list is symmetric about the 0° axis;
  • the circular_interquartiles_value is 40°, as the first quartile is -20° and the third quartile is 20°;
  • the outliers are correctly detected, 350 and 720 being taken at their normalized values (-10 and 0).
hpchavaz
  • Computing the mean as the angle of the sum of the vectors is great. The problem with the sum equaling 0 is easy to handle. But the standard deviation is the real problem to solve. Maybe using mean and std to define outliers is not the right approach.. maybe – jeandemeusy Oct 18 '21 at 11:02
  • @jeandemeusy, I added outlier detection based on the interquartile range to the answer. – hpchavaz Oct 18 '21 at 18:31
  • Great answer thank you ! – jeandemeusy Oct 18 '21 at 20:24
  • @jeandemeusy, warning : the code is not tested at all. There is room for optimization as the mean and deltas are calculated twice. You can 'like' it. – hpchavaz Oct 18 '21 at 22:02
0

If you immediately convert the angle data (0..360) using either the sine or the cosine function, you transform the data into the range [-1.0, 1.0].

In doing so you lose the information about which quadrant the angle was in, so you need to extract that information separately.

quadrant = [n // 90 for n in data] # values: 0, 1, 2, 3

You can fold the quadrants into one and the Sine or Cosine transform of the result will be in the range 0.0, 1.0.

single_quadrant = [n % 90 for n in data] # values: 0, 1, ..., 89

Using both of these two ideas it's possible to map data to the range 0.0 - 4.0 using either of the Sine or Cosine functions like so:

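import math

data = [350, 0, 10]  # sample angles in degrees, taken from the question

# quadrant index (0..3) plus the sine of the angle folded into the first quadrant
using_sine = [(n//90 + math.sin(math.radians(n % 90))) for n in data]

# same mapping, using the cosine of the folded angle instead
using_cosine = [(n//90 + math.cos(math.radians(n % 90))) for n in data]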
Dan Nagle