
I'm using NumPy to find intersections on a graph, but isclose returns multiple values per intersection

So I'm going to try to average them. But first, I want to isolate the groups of similar values, which I feel is a useful skill in its own right.

I have an array of the x values of the intersections, called idx, that looks like this:

[-8.67735471 -8.63727455 -8.59719439 -5.5511022  -5.51102204 -5.47094188
 -5.43086172 -2.4248497  -2.38476954 -2.34468938 -2.30460922  0.74148297
  0.78156313  0.82164329  3.86773547  3.90781563  3.94789579  3.98797595
  7.03406814  7.0741483   7.11422846]

I want to separate it into lists, each made up of the similar numbers.

This is what I have so far:

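sdx = []  # current group of close values
n = 0
for i in range(len(idx)):
    try:
        # compare each value with the one before it
        if (idx[n]-idx[n-1])<0.5:
            sdx.append(idx[n-1])
        else:
            # gap found: print the group collected so far and start a new one
            print(sdx)
            sdx = []
    except:
        sdx.append(idx[n-1])
    n = n+1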

It works for the most part, but it forgets some numbers:

[-8.6773547094188377, -8.6372745490981959]
[-5.5511022044088181, -5.5110220440881763, -5.4709418837675354]
[-2.4248496993987976, -2.3847695390781567, -2.3446893787575149]
[0.7414829659318638, 0.78156312625250379]
[3.8677354709418825, 3.9078156312625243, 3.9478957915831661]

There's probably a more efficient way to do this. Does anyone know of one?
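For reference, the loop above loses numbers because it only appends idx[n-1] when the gap to idx[n] is small: when a gap is found, the last value of the group is never appended before sdx is reset, and the final group is never printed at all. (Also, at n = 0, idx[n-1] is idx[-1], i.e. the last element.) A minimal pure-Python fix, assuming idx is sorted and non-empty, is to append the current value instead and flush the last group after the loop. The helper name `group_close` is hypothetical, and it keeps the question's convention that a difference below 0.5 means "same group":

```python
import numpy as np

# The x values from the question
idx = np.array([-8.67735471, -8.63727455, -8.59719439, -5.5511022, -5.51102204,
                -5.47094188, -5.43086172, -2.4248497, -2.38476954, -2.34468938,
                -2.30460922, 0.74148297, 0.78156313, 0.82164329, 3.86773547,
                3.90781563, 3.94789579, 3.98797595, 7.03406814, 7.0741483,
                7.11422846])

def group_close(values, gap=0.5):
    """Hypothetical helper: split sorted values wherever consecutive values differ by >= gap."""
    groups = []
    current = [values[0]]           # the first value starts the first group
    for prev, cur in zip(values[:-1], values[1:]):
        if cur - prev < gap:
            current.append(cur)     # same group: keep the current value
        else:
            groups.append(current)  # gap found: close the group...
            current = [cur]         # ...and start a new one with this value
    groups.append(current)          # don't forget the final group
    return groups

groups = group_close(idx.tolist())
means = [sum(g) / len(g) for g in groups]  # one averaged x value per intersection

for g in groups:
    print(g)
print(means)
```

Every value ends up in exactly one group, so the group sizes add up to len(idx).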

Steven Rumbalski
user3151828

2 Answers


Since you have a NumPy array, you can use np.split, splitting wherever the difference between consecutive elements is > .5:

import numpy as np
x = np.array([-8.67735471, -8.63727455, -8.59719439, -5.5511022, -5.51102204, -5.47094188,
     -5.43086172, -2.4248497, -2.38476954, -2.34468938, -2.30460922, 0.74148297,
     0.78156313, 0.82164329, 3.86773547, 3.90781563, 3.94789579, 3.98797595,
     7.03406814, 7.0741483])


print(np.split(x, np.where(np.diff(x) > .5)[0] + 1))

[array([-8.67735471, -8.63727455, -8.59719439]), array([-5.5511022 , -5.51102204, -5.47094188, -5.43086172]), array([-2.4248497 , -2.38476954, -2.34468938, -2.30460922]), array([ 0.74148297,  0.78156313,  0.82164329]), array([ 3.86773547,  3.90781563,  3.94789579,  3.98797595]), array([ 7.03406814,  7.0741483 ])]

np.where(np.diff(x) > .5)[0] returns the indices i where x[i+1] - x[i] > .5, i.e. the index of the last element before each gap:

In [6]: np.where(np.diff(x) > .5)[0]
Out[6]: array([ 2,  6, 10, 13, 17])

Adding 1 to each index gives the position of the first element after each gap:

In [12]: np.where(np.diff(x) > .5)[0] + 1
Out[12]: array([ 3,  7, 11, 14, 18])

Then passing [3, 7, 11, 14, 18] to np.split splits the array into the subarrays x[:3], x[3:7], x[7:11] ...
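Putting those steps together on all 21 values from the question (the x array above leaves out the last one, 7.11422846) and then averaging each subarray, which is what the question is ultimately after, gives a sketch like this:

```python
import numpy as np

# All 21 x values from the question
x = np.array([-8.67735471, -8.63727455, -8.59719439, -5.5511022, -5.51102204,
              -5.47094188, -5.43086172, -2.4248497, -2.38476954, -2.34468938,
              -2.30460922, 0.74148297, 0.78156313, 0.82164329, 3.86773547,
              3.90781563, 3.94789579, 3.98797595, 7.03406814, 7.0741483,
              7.11422846])

# Index of the first element after each gap larger than .5
split_points = np.where(np.diff(x) > .5)[0] + 1

# One subarray per cluster of close values
groups = np.split(x, split_points)

# Average each cluster to get one x value per intersection
averages = np.array([g.mean() for g in groups])

print(split_points)
print(averages)
```

The split points stay [3, 7, 11, 14, 18]: the extra value is only 0.04 away from its neighbour, so it just extends the last subarray.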

Padraic Cunningham

If your final goal is to find the average value of each cluster/group, where a cluster is a run of values whose consecutive differences stay within a certain threshold, you can use the following approach.

Basically, we convert the input list to a NumPy array, sort it, and then find the consecutive differences. By comparing those differences against a threshold, we create an ID array that assigns the same ID to all elements of the same group. Finally, we use those IDs as bins and average within each bin with np.bincount, which gives the average of each group.

Here's the implementation -

import numpy as np

# Input list
AList = [-8.67735471, -8.63727455, -8.59719439, -5.5511022,  -5.51102204,
         -5.47094188, -5.43086172, -2.4248497,  -2.38476954, -2.34468938,
         -2.30460922,  0.74148297,  0.78156313,  0.82164329,  3.86773547,
          3.90781563,  3.94789579,  3.98797595,  7.03406814,  7.0741483,
          7.11422846]

# Tolerance as thresholding parameter to distinguish between two "groups"
tolerance = 1

# Convert to a numpy array and sort if not already sorted
A = np.sort(np.asarray(AList))

# ID array that has the same IDs for elements of the same group
ID_array = (np.append([False],np.diff(A)>tolerance)).cumsum()

# Finally get the average values for each group    
average_values = np.bincount(ID_array,A)/np.bincount(ID_array)

Sample run -

In [301]: A
Out[301]: 
array([-8.67735471, -8.63727455, -8.59719439, -5.5511022 , -5.51102204,
       -5.47094188, -5.43086172, -2.4248497 , -2.38476954, -2.34468938,
       -2.30460922,  0.74148297,  0.78156313,  0.82164329,  3.86773547,
        3.90781563,  3.94789579,  3.98797595,  7.03406814,  7.0741483 ,
        7.11422846])

In [302]: average_values
Out[302]: 
array([-8.63727455, -5.49098196, -2.36472946,  0.78156313,  3.92785571,
        7.0741483 ])
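To make the intermediate ID array visible, and to check that this grouping agrees with the np.split approach from the other answer, the same steps can be wrapped in a small function. The name `group_means` is hypothetical; like the code above, it sorts the input first and uses a tolerance of 1 by default:

```python
import numpy as np

def group_means(values, tolerance=1.0):
    """Hypothetical helper: average each run of sorted values whose consecutive gaps are <= tolerance."""
    A = np.sort(np.asarray(values, dtype=float))
    # True marks the first element after a gap; cumsum turns that into group IDs 0, 1, 2, ...
    ids = np.append([False], np.diff(A) > tolerance).cumsum()
    # Sum of values per ID divided by the count per ID = mean per group
    return ids, np.bincount(ids, weights=A) / np.bincount(ids)

AList = [-8.67735471, -8.63727455, -8.59719439, -5.5511022, -5.51102204,
         -5.47094188, -5.43086172, -2.4248497, -2.38476954, -2.34468938,
         -2.30460922, 0.74148297, 0.78156313, 0.82164329, 3.86773547,
         3.90781563, 3.94789579, 3.98797595, 7.03406814, 7.0741483,
         7.11422846]

ids, means = group_means(AList)
print(ids)
print(means)

# Cross-check against the np.split approach (threshold .5)
A = np.sort(np.asarray(AList))
split_means = [g.mean() for g in np.split(A, np.where(np.diff(A) > .5)[0] + 1)]
print(np.allclose(means, split_means))  # True
```

The two thresholds (1 here, .5 in the other answer) give the same six groups for this data because the gaps between clusters are all about 3, while values within a cluster are about 0.04 apart.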
Divakar