1

I have a function which produces an array as such:

[ 14  48  81 111 112 113 114 148 179 213 247 279 311 313 314 344 345 346]

which corresponds to data values where a curve crosses the x-axis. Because the data is imperfect, it generates false positives: the output array contains runs of elements that are all very close to each other, e.g. [111 112 113 114]. I need to remove the false positives from this array but still retain a single positive from each place where the false positives show up. Basically, I need my function to produce an array more like

[ 14  48  81 112 148 179 213 247 279 313 345]

where the false positives from imperfect data have been removed.

Sociopath
  • 9
    What have you tried so far? Also, what dictates that a value is a false positive? Given your example sub-list `111, 112, 113, 114`, how would we know that `112` is correct and everything around it is a false positive? – fixatd Mar 13 '19 at 12:22
  • 1
    Welcome to Stack Overflow. Please read this, because it will help you format questions properly -- How to create a Minimal, Complete, and Verifiable example -- https://stackoverflow.com/help/mcve – Life is complex Mar 13 '19 at 12:25
  • Should be easy as long as you have a more precise definition of "very close" or "false positive". – Stop harming Monica Mar 13 '19 at 12:27
  • If the false positives are due to noise in the data a possible approach is to apply a low pass filter to remove higher frequency noise (using the FFT). The resulting data with less noise will have fewer (or no) false positives. – user2699 Mar 13 '19 at 13:52
  • A similar approach is to apply a moving average, https://stackoverflow.com/questions/13728392/moving-average-or-running-mean – user2699 Mar 13 '19 at 13:57
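
The smoothing idea from the last two comments can be sketched as follows. This is a minimal illustration in plain Python rather than the asker's actual pipeline: the helper names `moving_average` and `zero_crossings`, the sample signal, and the window size of 3 are all assumptions made up for the example.

```python
def moving_average(values, window):
    """Return the running mean of `values` over a sliding window."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def zero_crossings(values):
    """Return each index i where values[i] and values[i + 1] differ in sign."""
    return [
        i for i in range(len(values) - 1)
        if values[i] * values[i + 1] < 0
    ]


# Made-up signal: it falls through zero once, but noise flips the sign twice more.
noisy = [3, 2, 1, -0.5, 0.5, -1, -2, -3]
smoothed = moving_average(noisy, window=3)

print(zero_crossings(noisy))     # [2, 3, 4]
print(zero_crossings(smoothed))  # [2]
```

Note that the smoothed list is shorter than the input and its indices are shifted by about half a window, so crossing positions found this way would need to be mapped back to the original data.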

3 Answers

3

Here is a possible approach:

arr = [14, 48, 81, 111, 112, 113, 114, 148, 179, 213, 247, 279, 311, 313, 314, 344, 345, 346]

def filter_arr(arr, offset):
    filtered_nums = set()
    for num in sorted(arr):
        # Check if there are any "similar" numbers already found
        if any(num+x in filtered_nums for x in range(-offset, offset+1)):
            continue
        else:
            filtered_nums.add(num)
    return list(sorted(filtered_nums))

Then you can apply the filtering with any offset that you think makes the most sense.

filter_arr(arr, offset=5)  
Output:  [14, 48, 81, 111, 148, 179, 213, 247, 279, 311, 344]
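
Note that this keeps the first value of each cluster (111, 311, 344), while the expected output in the question keeps a value from the middle (112, 313, 345). If the middle value matters, one possible variation is to group the sorted values into clusters and pick the middle element of each. The sketch below assumes that "close" means a gap of at most `max_gap` between neighbouring values; `cluster_middles` and `max_gap` are names invented for this example.

```python
def cluster_middles(values, max_gap):
    """Group sorted values into runs whose neighbours differ by at most
    max_gap, then return the middle element of each run."""
    clusters = []
    for value in sorted(values):
        if clusters and value - clusters[-1][-1] <= max_gap:
            # Close enough to the previous value: same cluster.
            clusters[-1].append(value)
        else:
            # Gap too large (or first value): start a new cluster.
            clusters.append([value])
    # For even-sized clusters this picks the lower of the two middle values.
    return [cluster[(len(cluster) - 1) // 2] for cluster in clusters]


arr = [14, 48, 81, 111, 112, 113, 114, 148, 179, 213, 247, 279, 311, 313, 314, 344, 345, 346]
print(cluster_middles(arr, max_gap=5))
# [14, 48, 81, 112, 148, 179, 213, 247, 279, 313, 345]
```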
Cihan
0

This should do it:

# arr is the input list; num is the largest difference at which two values still count as duplicates

def check(arr, num):
    for r in arr:
        for c in arr:
            if abs(r-c) < num + 1:
                arr.remove(c)
    return arr
yourarray = [14, 48, 81, 111, 112, 113, 114, 148, 179, 213, 247, 279, 311, 313, 314, 344, 345, 346]
print(check(yourarray, 1))
White Phantom
  • The numbers can remove themselves in your nested loops. Also, you should (in general) not modify the list you are iterating on. – Cihan Mar 13 '19 at 16:19
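
To illustrate the point in the comment above: in the nested loops every value is compared with itself (difference 0), so it gets removed, and removing items during iteration makes the loop skip elements. A possible rewrite that avoids both problems builds a new list instead. This is only a sketch: the name `check_fixed` is made up here, and it assumes that a value counts as a duplicate when it lies within `num` of the value just before it in sorted order.

```python
def check_fixed(arr, num):
    """Return a new list with every value dropped that lies within `num`
    of the preceding value in sorted order. The input is not modified."""
    kept = []
    previous = None
    for value in sorted(arr):
        if previous is None or value - previous > num:
            kept.append(value)
        # Compare against the immediate predecessor, kept or not, so a
        # whole run of close values collapses to its first element.
        previous = value
    return kept


yourarray = [14, 48, 81, 111, 112, 113, 114, 148, 179, 213, 247, 279, 311, 313, 314, 344, 345, 346]
print(check_fixed(yourarray, 2))
# [14, 48, 81, 111, 148, 179, 213, 247, 279, 311, 344]
```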
0

I would do it the following way:

Conceptually: let the "tens" of a number be how many whole 10s fit into it. For example, the tens of 111 is 11, the tens of 247 is 24, the tens of 250 is 25, and so on. For our data, if a number with the same tens has already been seen, discard it.

Code:

data = [14,48,81,111,112,113,114,148,179,213,247,279,311,313,314,344,345,346]
cleaned = [i for inx, i in enumerate(data) if i // 10 not in [j // 10 for j in data[:inx]]]
print(cleaned) #[14, 48, 81, 111, 148, 179, 213, 247, 279, 311, 344]

Note that 10 is only an example value, which you can replace with another one: a bigger value means that more elements will potentially be removed. Keep in mind that a specific trait of this solution is that certain pairs of neighbouring values (with 10, for example, 109 and 110) are treated as different and both stay in the output list, so you need to check whether that is a problem in your use case.

Daweo
  • Surely this wouldn't work if the group of similar values crosses a multiple-of-10 boundary? For example, if the array were [34 53 68 69 70 71 95], it would return [34 53 68 70 95] – AzurePineapple Mar 13 '19 at 16:10
  • For `[34 53 68 69 70 71 95]` it will give `[34 53 68 70 95]`, as you have written, but note that, for example, for `[49 50 79 80 109 110]` it will produce `[49 50 79 80 109 110]`. To put it simply: in certain cases my solution produces different results than the other methods, which are based on an offset. – Daweo Mar 13 '19 at 16:37
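
The boundary behaviour discussed in these two comments can be checked directly. The sketch below re-implements the same bucketing idea with a set of already-seen buckets (so earlier elements are not rescanned for every item) and runs it on both sample arrays from the comments. The name `dedupe_by_bucket` and the `size` parameter are made up for this illustration.

```python
def dedupe_by_bucket(values, size=10):
    """Keep the first value seen in each bucket of width `size`."""
    seen = set()
    kept = []
    for value in values:
        bucket = value // size
        if bucket not in seen:
            seen.add(bucket)
            kept.append(value)
    return kept


# A cluster that straddles a multiple of 10 leaves two survivors (68 and 70).
print(dedupe_by_bucket([34, 53, 68, 69, 70, 71, 95]))
# [34, 53, 68, 70, 95]

# Neighbouring pairs that sit on either side of a boundary are all kept.
print(dedupe_by_bucket([49, 50, 79, 80, 109, 110]))
# [49, 50, 79, 80, 109, 110]
```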