Extracting significant values from an array

Question

I'm looking for an efficient way to extract only the significant values from an array in Python, for instance, only those that are 10 times bigger than the rest. The logic (as pseudocode, not real code) for a very simple case is something like this:

array = [5000, 400, 40, 10, 1, 35]  # here the significant value would be 5000

from i = 0 to len(array)  # run the procedure over all the array elements
    delta = array[i] / array[i+1]  # check whether array[i] is significant or not
    if delta >= 10:  # assuming a 10x rule, i.e. significant = 10 times bigger than the rest of the elements
        new_array = array[i]  # insert the significant value into new_array
    elif delta <= 0.1:  # in this case the second element is the significant one
        new_array = array[i+1]  # insert the significant value into new_array

At the end, new_array will be composed of the significant values (in this case new_array = [5000]), but this must work for any kind of array.

Thanks for your help!

UPDATE

Thanks to all for your answers, in particular to Copperfield, who gave me a good idea about how to do it. Here is the code that works for my purpose:

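array_o = [5000, 4500, 400, 4, 1, 30, 2000]

array = sorted(array_o)   # sorted copy; the original list is left untouched
new_array = []

max_array = max(array)    # the largest value is always kept
new_array.append(max_array)
array.remove(max_array)   # drop it so it is not compared with itself

# keep every remaining value that is within 10x of the maximum
for value in array:
    delta = max_array / value
    if delta <= 10:
        new_array.append(value)

# new_array is now [5000, 2000, 4500]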

Does that mean `new_array` will always contain only one value? In your case `400` should also be added to `new_array` (from what I understand). — ettanany, Dec 26 '16 at 16:39
Initialize `new_array` as a blank array first and then `append()` any new item that satisfies the `delta` you are looking for. — , Dec 26 '16 at 16:39
what would you extract from `[5001, 5000, 400, 40, 10, 1, 35]`? — hiro protagonist, Dec 26 '16 at 16:43
So you want to extract the [maximum](https://docs.python.org/3/library/functions.html#max) element as long as it is 10x bigger than the second biggest element? Well, [sort](https://docs.python.org/3/library/functions.html#sorted) the array and check the last 2 elements — Copperfield, Dec 26 '16 at 16:55
If you are looking for outliers, then [this](http://stackoverflow.com/questions/22354094/pythonic-way-of-detecting-outliers-in-one-dimensional-observation-data) might answer your question. — trincot, Dec 26 '16 at 17:01
Here is the thing: the code must apply to any kind of array, and that's exactly where I get stuck. My example has one significant value and hiro's example has two. Thanks to all! — FMEZA, Dec 26 '16 at 17:29
It looks to me like the problem here is that you don't have a clear idea of what you want, because your pseudocode says one thing while you describe something different — Copperfield, Dec 26 '16 at 17:57
The pseudocode is a particular example of what I'm looking for, but Copperfield, your previous answer gave me a very good idea of what can be done. Thanks for that. — FMEZA, Dec 26 '16 at 19:35

score 0 · Answer 1 · answered Dec 26 '16 at 17:10

Does this answer your question?

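array = [5000, 400, 40, 10, 1, 35]

maxNum = max(array)
array.remove(maxNum)    # note: this removes the maximum from the original list
SecMaxNum = max(array)  # the second largest value

if maxNum / SecMaxNum >= 10:
    pass  # take action accordingly (maxNum is significant)
else:
    pass  # take action accordingly (no single dominant value)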

Amjad

Something like that, but this must apply to all kinds of arrays; for that reason I thought working with indexes would be much better. In your example, what would happen if there were 5 significant values? Thanks for your help! – FMEZA Dec 26 '16 at 17:25
In this case I would sort the list (or array) with `arr.sort(reverse=True)` and then select a sub-list of interest by comparing each element to the first (largest) element for further analysis. – Amjad Dec 26 '16 at 17:53
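Amjad's suggestion (sort in descending order, then compare every element with the first one) can be sketched roughly as follows. The function name `keep_near_max` and the default factor of 10 are assumptions for illustration. It keeps the largest value plus every value within a factor of 10 of it, which is the same rule the asker's update applies. It uses `sorted(..., reverse=True)` rather than `arr.sort(reverse=True)` so the caller's list is not modified, and it assumes positive numbers.

```python
from itertools import takewhile


def keep_near_max(values, factor=10):
    """Hypothetical helper: return the largest value and every value within
    `factor` times of it, in descending order (assumes positive numbers)."""
    arr = sorted(values, reverse=True)  # sorted copy, largest first
    if not arr:
        return []
    top = arr[0]
    # x * factor >= top is the same test as top / x <= factor for positive x,
    # but it cannot raise ZeroDivisionError when x == 0.
    # Because arr is descending, once one value fails, all later ones fail too.
    return list(takewhile(lambda x: x * factor >= top, arr))


print(keep_near_max([5000, 4500, 400, 4, 1, 30, 2000]))  # [5000, 4500, 2000]
```

Using `takewhile` instead of a full loop relies on the descending order: the first value that is more than 10x smaller than the maximum ends the scan.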

Copperfield · Answer 2 · 2016-12-26T17:51:58.037

Your pseudocode can be translated into this function:

def function(array):
    new_array = []
    for i in range(1,len(array)):
        delta = array[i-1] / array[i]
        if delta >= 10:
            new_array.append( array[i-1] )
        elif delta <= 0.1:
            new_array.append(  array[i] )
    return new_array

which gives this result:

>>> function([5000, 400, 40, 10, 1, 35])
[5000, 400, 10, 35]
>>>

Now, what you describe can be done like this in Python 3 (extended iterable unpacking):

*rest, secondMax, maxNum = sorted(array)  # unpack the two largest values
if maxNum / secondMax >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly

or, in any version:

sortedArray = sorted(array)
if sortedArray[-1] / sortedArray[-2] >= 10:  # largest vs. second largest
    pass  # take action accordingly
else:
    pass  # take action accordingly

(A negative index accesses elements from the end, so -1 is the last one, -2 the second to last, etc.)
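Checking only the last two elements tells you whether the maximum alone stands out. One possible way to stretch that idea toward the asker's "any kind of array" requirement (an assumption about what is wanted, not part of Copperfield's answer) is to sort in descending order and cut at the first place where a value is at least 10 times the next one, returning everything above the cut. `split_at_first_jump` is a hypothetical name, and the sketch assumes non-negative values.

```python
def split_at_first_jump(values, factor=10):
    """Hypothetical helper: sort descending and return every value above the
    first gap where one value is at least `factor` times the next one.
    Returns [] when no such gap exists (assumes non-negative values)."""
    ordered = sorted(values, reverse=True)
    for i in range(1, len(ordered)):
        # same as ordered[i - 1] / ordered[i] >= factor for positive values,
        # written as a product to avoid dividing by zero
        if ordered[i - 1] >= factor * ordered[i]:
            return ordered[:i]
    return []


print(split_at_first_jump([5000, 400, 40, 10, 1, 35]))        # [5000]
print(split_at_first_jump([5001, 5000, 400, 40, 10, 1, 35]))  # [5001, 5000]
```

Like any rule that only compares neighbouring values, this misses gradual chains: for jez's list `[531441, 59049, 6561, 729, 81, ...]`, where each step is only 9x, it returns `[]`.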

jez · Answer 3 · 2016-12-26T18:23:39.897

I would not adopt the approach of only comparing each value to the one next to it. If the array is unsorted then obviously that's a disaster, but even if it is sorted:

a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]

In that example, the "rest" (i.e. majority) of the values are <10, but I've managed to get up into the 6-digit range very quickly with each number only being 9 times the one next to it (so, your rule would not be triggered).

One approach to outlier detection is to subtract the median from your values and divide by a non-parametric statistic that reflects the spread of the distribution (below, I've chosen a denominator that would be equivalent to the standard deviation if the numbers were normally distributed). That gives you an "atypicality" score on a standardized scale. Find the large scores, and you have found your outliers (any score larger than, say, 3, but you may need to play around a bit to find the cutoff that works nicely for your problem).

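import numpy

a = numpy.asarray(a, dtype=float)  # convert the list so the arithmetic below is element-wise
npstd = numpy.diff(numpy.percentile(a, [16, 84])) / 2.0  # non-parametric "standard deviation" equivalent
score = (a - numpy.median(a)) / npstd  # median-centred, spread-scaled score
outlier_locations, = numpy.where(score > 3)  # 3, 4 or 5 might work well as cut-offs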
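If NumPy is not available, the same score can be sketched with the standard library. This is an adaptation rather than part of jez's answer: it assumes Python 3.8+ for `statistics.quantiles`, whose `method="inclusive"` option interpolates the same way as the default of `numpy.percentile`. `robust_outliers` is a hypothetical name, and the cut-off of 3 is the one suggested above.

```python
import statistics


def robust_outliers(values, cutoff=3.0):
    """Hypothetical helper: return the indices of values whose
    median-centred score exceeds `cutoff`.

    The spread is half the distance between the 16th and 84th percentiles,
    which equals the standard deviation for normally distributed data.
    """
    cuts = statistics.quantiles(values, n=100, method="inclusive")  # 99 cut points
    p16, p84 = cuts[15], cuts[83]  # cuts[k - 1] is the k-th percentile
    spread = (p84 - p16) / 2.0
    if spread == 0:
        return []  # central values are all equal, so the score is undefined
    med = statistics.median(values)
    return [i for i, x in enumerate(values) if (x - med) / spread > cutoff]


a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]
print(robust_outliers(a))                           # [0, 1, 2]
print(robust_outliers([5000, 400, 40, 10, 1, 35]))  # [0]
```

For jez's list the median is 5.5 and the spread is 351, so only values above roughly 1058 score more than 3; that is why 729 is not flagged.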
