
I'm looking for an efficient way to extract only the significant values from an array in Python, for instance, only those 10 times bigger than the rest. The logic (pseudocode, not real code) for a very simple case is something like this:

array =  [5000, 400, 40, 10, 1, 35] # here the significant value will be 5000. 

from i=0 to len.array # run the procedure over all the array components
    delta = array[i] / array[i+1] # check whether array[i] is significant or not
    if delta >= 10 : # assuming a rule of 10X significance, i.e. significant = 10 times bigger than the rest of the elements in the array
        new_array = array[i] # insert the significant value into new_array
    elif delta <= 0.1 : # in this case the second element is the significant one
        new_array = array[i+1] # insert the significant value into new_array

At the end, new_array will contain the significant values, in this case new_array = [5000], but the procedure must apply to any kind of array.

Thanks for your help!

UPDATE!!!

Thanks to all for your answers! In particular to Copperfield, who gave me a good idea about how to do it. Here is the code that works for this purpose:

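array_o = [5000, 4500, 400, 4, 1, 30, 2000]
array = sorted(array_o)
new_array = []

# The maximum is always kept; everything else is compared against it.
max_array = max(array)
new_array.append(max_array)
array.remove(max_array)

# Keep every remaining value that is within a factor of 10 of the maximum.
for value in array:
    delta = max_array / value
    if delta <= 10:
        new_array.append(value)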
FMEZA
  • Does it mean that `new_array` will always contain only one value? Because in your case `400` should also be added to `new_array` (from what I understand). – ettanany Dec 26 '16 at 16:39
  • Initialize `new_array` as a blank array first and then `append()` any new item that satisfies the `delta` you are looking for. –  Dec 26 '16 at 16:39
  • What would you extract from `[5001, 5000, 400, 40, 10, 1, 35]`? – hiro protagonist Dec 26 '16 at 16:43
  • So you want to extract the [maximum](https://docs.python.org/3/library/functions.html#max) element as long as it is 10x bigger than the second biggest element? Well, sort the list with [sorted](https://docs.python.org/3/library/functions.html#sorted) and check the last 2 elements. – Copperfield Dec 26 '16 at 16:55
  • If you are looking for outliers, then [this](http://stackoverflow.com/questions/22354094/pythonic-way-of-detecting-outliers-in-one-dimensional-observation-data) might answer your question. – trincot Dec 26 '16 at 17:01
  • Here is the thing: the code must apply to any kind of array, and that's exactly the point where I get stuck. In my example or in hiro's example there are one or two significant values. Thanks to all! – FMEZA Dec 26 '16 at 17:29
  • It looks to me that the problem here is that you don't have a clear idea of what you want, because your pseudocode says one thing while you describe something different. – Copperfield Dec 26 '16 at 17:57
  • The pseudocode is a particular example of what I'm looking for, but Copperfield, your previous comment gave me a very good idea of what can be done. Thanks for that. – FMEZA Dec 26 '16 at 19:35

3 Answers

0

Does this answer your question?

maxNum = max(array)
array.remove(maxNum)  # note: this modifies the original list
SecMaxNum = max(array)

if maxNum / SecMaxNum >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly
Amjad
  • Something like that, but the thing is that this must apply to all kinds of arrays; for that reason I thought that working with indexes would be much better. In your example, what would happen if there are 5 significant values? Thanks for your help! – FMEZA Dec 26 '16 at 17:25
  • In this case I would sort the list (or array) with `arr.sort(reverse=True)`, then pick out a sub-list of interest by comparing the elements to the first element for further analysis. – Amjad Dec 26 '16 at 17:53
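
The sort-then-compare idea from the comment above could be sketched roughly as follows. The function name `values_near_max` and the `factor` parameter are made up for illustration, and the sketch assumes a non-empty list of strictly positive numbers.

```python
def values_near_max(values, factor=10):
    """Return the values within `factor` times of the maximum, largest first.

    Hypothetical helper: assumes `values` is non-empty and strictly positive.
    """
    ordered = sorted(values, reverse=True)  # a sorted copy, input is untouched
    largest = ordered[0]
    # Keep every value whose ratio to the maximum does not exceed the factor.
    return [v for v in ordered if largest / v <= factor]


print(values_near_max([5000, 4500, 400, 4, 1, 30, 2000]))
print(values_near_max([5000, 400, 40, 10, 1, 35]))
```

Because everything is compared to the maximum rather than to its neighbour, this handles any number of significant values, at the cost of treating the maximum as significant by definition.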
0

Your pseudocode can be translated into this function:

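def function(array):
    new_array = []
    # compare each element with the one right before it
    for i in range(1, len(array)):
        delta = array[i-1] / array[i]
        if delta >= 10:
            # the earlier element is at least 10x its neighbour
            new_array.append(array[i-1])
        elif delta <= 0.1:
            # the later element is at least 10x its neighbour
            new_array.append(array[i])
    return new_array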

This gives the following result:

>>> function([5000, 400, 40, 10, 1, 35])
[5000, 400, 10, 35]
>>> 

Now, what you describe can be done like this in Python 3 (using extended unpacking):

*rest, secondMax, maxNum = sorted(array)
if maxNum / secondMax >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly

or, in Python 2, where extended unpacking is not available:

sortedArray = sorted(array)
if sortedArray[-1] / sortedArray[-2] >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly

(A negative index accesses the elements from last to first, so -1 is the last one, -2 the second to last, etc.)
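
As a rough sketch, the check on the two largest values could be wrapped in a function. The name `dominant_maximum` and the `factor` parameter are invented here for illustration, and the sketch assumes positive numbers.

```python
def dominant_maximum(values, factor=10):
    """Return the maximum if it is at least `factor` times the runner-up, else None.

    Hypothetical helper: assumes at least two strictly positive values.
    """
    if len(values) < 2:
        raise ValueError("need at least two values to compare")
    *_, second_max, max_num = sorted(values)  # the two largest values
    if second_max <= 0:
        raise ValueError("this sketch only handles positive values")
    return max_num if max_num / second_max >= factor else None


print(dominant_maximum([5000, 400, 40, 10, 1, 35]))        # 5000 is 12.5x the runner-up
print(dominant_maximum([5001, 5000, 400, 40, 10, 1, 35]))  # no single dominant value
```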

Copperfield
0

I would not adopt the approach of only comparing each value to the one next to it. If the array is unsorted then obviously that's a disaster, but even if it is sorted:

a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]

In that example, the "rest" (i.e. majority) of the values are <10, but I've managed to get up into the 6-digit range very quickly with each number only being 9 times the one next to it (so, your rule would not be triggered).
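
A quick way to confirm this is to compute every neighbour-to-neighbour ratio for the list above (the variable names below are just for illustration):

```python
a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]

# Ratio of each value to the one that follows it.
ratios = [left / right for left, right in zip(a, a[1:])]
largest_ratio = max(ratios)

# The 10x rule fires only for ratios >= 10 or <= 0.1.
triggered = [r for r in ratios if r >= 10 or r <= 0.1]

print(largest_ratio)  # 9.0
print(triggered)      # []
```

No pair of neighbours ever reaches the factor of 10, even though the largest value is several hundred thousand times the smallest.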

One approach to outlier detection is to subtract the median from your distribution and divide by a non-parametric statistic that reflects the spread of the distribution (below, I've chosen a denominator that would be equivalent to the standard deviation if the numbers were normally distributed). That gives you an "atypicality" score on a standardized scale. Find the large values, and you have found your outliers (any score larger than, say, 3—but you may need to play around a bit to find the cutoff that works nicely for your problem).

import numpy
a = numpy.asarray(a)  # make sure element-wise arithmetic works on the values
npstd = numpy.diff(numpy.percentile(a, [16, 84])) / 2.0   # non-parametric "standard deviation" equivalent
score = (a - numpy.median(a)) / npstd
outlier_locations, = numpy.where(score > 3)  # 3, 4 or 5 might work well as cut-offs
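
For reuse, the same scoring could be packaged roughly like this. The function name `outlier_values`, the `cutoff` parameter, and the handling of a zero spread are assumptions of this sketch, not part of the recipe above.

```python
import numpy


def outlier_values(values, cutoff=3.0):
    """Return the values whose robust score exceeds `cutoff` (hypothetical helper)."""
    data = numpy.asarray(values, dtype=float)
    low, high = numpy.percentile(data, [16, 84])
    spread = (high - low) / 2.0  # non-parametric "standard deviation" equivalent
    if spread == 0:
        # Assumption: with no spread in the central 68%, report no outliers.
        return []
    score = (data - numpy.median(data)) / spread
    return data[score > cutoff].tolist()


a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]
print(outlier_values(a))
```

With the default cutoff of 3 this flags the three largest values of the example list; 729 scores only about 2, so whether it counts as significant depends on the cutoff you settle on.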
jez