
I have a problem where I need to determine where a value lands between other values. This is an awfully long question, but it's a convoluted problem (at least to me).

The simplest presentation of the problem can be seen with the following data:

I have a value of 24.0 and need to determine where it lands within six 'ranges': 10, 20, 30, 40, 50, 60. I can see that it lands between 20 and 30, and a simple if statement can find that for me.

My if statement for checking if the value is between 20 and 30 would be:

if value >=20 and value <=30:

Pretty simple stuff.

What I'm having trouble with is when I try to rank the output.

As an example, let's say that each range value is given an integer representation: 10=1, 20=2, 30=3, 40=4, 50=5, 60=6, 70=7. Additionally, let's say that if the value is less than the midpoint between two values, it is assigned the rank of the lower value. For example, my value of 24 is between 20 and 30, so it should be ranked as a "2".

This in and of itself is fairly straightforward with this example, but using real world data, I have ranges and values like the following:

  • Value = -13 with Ranges = 5,35,30,25,-25,-30,-35
  • Value = 50 with Ranges = 5,70,65,60,40,35,30
  • Value = 6 with Ranges = 1,40,35,30,5,3,0

Another wrinkle: the order of the ranges matters. In the above, the first range number equates to a ranking of 1, the second to a ranking of 2, etc., as I mentioned a few paragraphs above.

The negative numbers in the range values were causing trouble until I decided to use a percentile ranking, which gets rid of the negative values altogether. To do this, I am using an answer from "Map each list value to its corresponding percentile", like this:

y=[stats.percentileofscore(x, a, 'rank') for a in x]

where x is the ranges AND the value I'm checking. Running the value=6 values above through this results in y being:

from scipy import stats

x = [1, 40, 35, 30, 5, 3, 0, 6]

y = [stats.percentileofscore(x, a, 'rank') for a in x]

Looking at "y", we see it as:

[25.0, 100.0, 87.5, 75.0, 50.0, 37.5, 12.5, 62.5]
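For anyone without scipy at hand, the list above can be reproduced with a small pure-Python sketch. The helper name `percentile_rank` is made up for illustration, and the formula (mean rank of the matching entries, as a percentage of the list length) is my reading of the 'rank' option, not scipy's actual code.

```python
def percentile_rank(values, score):
    """Percentage rank of score within values, averaging ranks over ties."""
    n = len(values)
    below = sum(1 for v in values if v < score)        # strictly smaller entries
    at_or_below = sum(1 for v in values if v <= score)  # smaller or equal entries
    # Mean of the lowest and highest rank the score could occupy,
    # expressed as a percentage of the list length.
    bonus = 1 if at_or_below > below else 0
    return (below + at_or_below + bonus) * 50.0 / n


x = [1, 40, 35, 30, 5, 3, 0, 6]
y = [percentile_rank(x, a) for a in x]
print(y)  # [25.0, 100.0, 87.5, 75.0, 50.0, 37.5, 12.5, 62.5]
```

Since a percentile is a monotonic transform, it keeps the entries in the same relative order, so any "is it between these two neighbours" test should give the same answer on the raw numbers, negatives included.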

What I need to do now is compare that last value (62.5) with the other values to see what the final ranking will be (rankings of 1 through 7) according to the following ranking map:

1=25.0
2=100.0
3=87.5
4=75.0
5=50.0
6=37.5
7=12.5

If the value lies between two of the values, it should be assigned the lower rank. In this example, the 62.5 value would have a final ranking value of 4 because it sits between 75.0 (rank=4) and 50.0 (rank=5).
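The lookup described above could be sketched roughly as follows. This is only an illustration: `rank_from_percentiles` is a made-up name, the choice to scan adjacent pairs from the end of the list is an assumption (the paragraph above doesn't say which pair wins when several contain the value), and the fallback to the last rank is borrowed from the 2.26 sample further down.

```python
def rank_from_percentiles(percentiles):
    """Last entry is the value's percentile; the rest are the range marks."""
    marks, target = percentiles[:-1], percentiles[-1]
    # Walk adjacent pairs from the end of the list toward the start (assumed order).
    for i in range(len(marks) - 1, 0, -1):
        lo, hi = sorted((marks[i - 1], marks[i]))
        if lo <= target <= hi:
            return i  # 1-based rank of the earlier mark in the pair
    return len(marks)  # assumed fallback when no pair contains the target


y = [25.0, 100.0, 87.5, 75.0, 50.0, 37.5, 12.5, 62.5]
print(rank_from_percentiles(y))  # 4
```

Here 62.5 is found between 75.0 (rank 4) and 50.0 (rank 5), so the earlier of the two ranks, 4, comes back.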

If I take 'y' and break it out and use those values in multiple if/else statements it works for some but not all (the -13 example does not work correctly).

My question is this:

How can I programmatically analyze any value/range set to find the final ranking without building an enormous if/elif structure? Here are a few sample sets. Rankings follow the order of presentation (the first value in Ranges = 1, the second = 2, etc.).

  • Value = -13 with Ranges = 5, 35, 30, 25, -25, -30, -35 --> Rank = 4
  • Value = 50 with Ranges = 5, 70, 65, 60, 40, 35, 30 --> Rank = 4
  • Value = 6 with Ranges = 1, 40, 35, 30, 5, 3, 0 --> Rank = 4
  • Value = 24 with Ranges = 10, 20, 30, 40, 50, 60, 70 --> Rank = 2
  • Value = 2.26 with Ranges = 0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95 --> Rank = 7
  • Value = 31 with Ranges = 10, 20, 30, 40, 60, 70, 80 --> Rank = 3
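To check any candidate against these sets, a small harness helps. The sketch below is only an illustration: `first_bracket_rank` and `failures` are made-up names, the fallback to the last rank is an assumption taken from the 2.26 sample, and the baseline is a deliberately naive "first adjacent pair that contains the value" rule, not a proposed answer.

```python
CASES = [
    ([5, 35, 30, 25, -25, -30, -35], -13, 4),
    ([5, 70, 65, 60, 40, 35, 30], 50, 4),
    ([1, 40, 35, 30, 5, 3, 0], 6, 4),
    ([10, 20, 30, 40, 50, 60, 70], 24, 2),
    ([0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95], 2.26, 7),
    ([10, 20, 30, 40, 60, 70, 80], 31, 3),
]


def first_bracket_rank(ranges, value):
    """Naive baseline: rank of the first adjacent pair that contains value."""
    for i in range(len(ranges) - 1):
        lo, hi = sorted((ranges[i], ranges[i + 1]))
        if lo <= value <= hi:
            return i + 1
    return len(ranges)  # assumed fallback, matching the 2.26 sample


def failures(rank_fn):
    """Return (ranges, value, expected, actual) for every mismatching case."""
    return [
        (ranges, value, expected, rank_fn(ranges, value))
        for ranges, value, expected in CASES
        if rank_fn(ranges, value) != expected
    ]


for ranges, value, expected, actual in failures(first_bracket_rank):
    print(value, "expected", expected, "got", actual)
# 50 expected 4 got 1
# 6 expected 4 got 1
```

The two misses are the sets where the value also fits the first pair (5 to 70, and 1 to 40), which is one way a chain of if/elif checks written in list order goes wrong.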

I may be missing something very easy within Python to do this, but I've been banging my head against this wall for a few days with no progress.

Any help/pointers are appreciated.

Eric D. Brown D.Sc.
  • 1
    Have you looked into numerical/scientific libraries like `numpy`, `scipy`, or `pandas`? This sounds like the sort of thing they would do. – TigerhawkT3 Jun 26 '15 at 18:41
  • Yep. I use all three often but I've not found anything that does what I need. scipy.stats is being used for the percentileofscore piece of this. – Eric D. Brown D.Sc. Jun 26 '15 at 18:42
  • 1
    Maybe take a look at this question as well: http://stackoverflow.com/questions/6053974/python-efficiently-check-if-integer-is-within-many-ranges – Leif Hedstrom Jun 26 '15 at 18:44
  • If each range is a `list`, maybe you could do something like `sorted(rangelist + [testvalue]).index(testvalue)`? – TigerhawkT3 Jun 26 '15 at 18:45
  • @Tigerhawk That's what I thought of at first but what he wants is just a little bit different than that. – Blair Jun 26 '15 at 18:46
  • 1
    You could put each of the range values as a key in a dict. Their corresponding value would be the rank to be assigned. That way, it wouldn't be based on order, but rather whatever value you assigned. You can use an ordered dict to maintain order as well. – Rob Foley Jun 26 '15 at 18:47
  • @TigerhawkT3 - I tried something like your recommendation but it doesn't work in all cases. The value=-13 is one example where it doesn't work...it gives a rank of 3 when it should be a rank of 4 – Eric D. Brown D.Sc. Jun 26 '15 at 18:48
  • @LeifHedstrom - thanks for that link. I've read through that page a few times. I tried a few options there but nothing worked the way I needed, but I will revisit. – Eric D. Brown D.Sc. Jun 26 '15 at 18:49
  • Would that snippet not work because of the negative values? You can use a function as the sort key to create any logical order you want (looks like numerical ascending first and then positive second). – TigerhawkT3 Jun 26 '15 at 18:53
  • @TigerhawkT3 it's because the original ordering of the list is important in the answer. – Blair Jun 26 '15 at 19:04
  • 1
    Your inputs and desired outputs don't seem to have the relation you state. Can you clarify? – TigerhawkT3 Jun 26 '15 at 19:08
  • That's the output from a program that's been running for 15 years. That said, the old program may have a bug in it for those edge cases where it is giving values that don't make a lot of sense. For example:[0.025 0.25 0.23 0.2 -0.2 -0.23 -0.25] and 0.15 gives a ranking of 4 when, by logic, it should be a 1. – Eric D. Brown D.Sc. Jun 26 '15 at 19:14
  • Thanks for the help everyone...@vk1011's solution gives me the matching output. – Eric D. Brown D.Sc. Jun 26 '15 at 19:29

3 Answers

2
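def checker(term):
    # Sort key: non-negative values keep their natural order; negative values
    # are pushed past all of them and ordered by magnitude.
    return term if term >= 0 else abs(term) + 1e10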

l1, v1 = [5, 35, 30, 25, -25, -30, -35], -13 # Desired: 4
l2, v2 = [5, 70, 65, 60, 40, 35, 30], 50 # Desired: 4
l3, v3 = [1, 40, 35, 30, 5, 3, 0], 6 # Desired: 4
l4, v4 = [10, 20, 30, 40, 50, 60, 70], 24 # Desired: 2
l5, v5 = [0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95], 2.26 # Desired: 7
l6, v6 = [10, 20, 30, 40, 60, 70, 80], 31 # Desired: 3

Result:

>>> print(*(sorted(l_+[val], key=checker).index(val) for
... l_, val in zip((l1,l2,l3,l4,l5,l6),(v1,v2,v3,v4,v5,v6))), sep='\n')
4
4
4
2
7
3
TigerhawkT3
  • I think this is just exploiting a pattern in the specified tests rather than solving the problem as stated. For example, what do you get for [5, 70, 65, 60, 40, 35, 30] and 31? – Peter Brittain Jun 26 '15 at 22:30
  • Yes; this won't apply to arbitrarily-ordered `list`s, but I originally thought there was some kind of logical sort order. I don't think I could've guessed the surprise canonical COBOL implementation's algorithm regardless. – TigerhawkT3 Jun 26 '15 at 22:35
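Peter Brittain's counter-example can be tried directly. The sketch below wraps the sort-key approach in a function and, for comparison, runs an end-to-start scan of adjacent pairs, which is the rule that seemed to match the legacy output according to the comments elsewhere on this page. The names `sort_key_rank` and `last_bracket_rank` are made up here, and the thread never states what the expected rank for this input is.

```python
def checker(term):
    # Same sort key as in the answer above.
    return term if term >= 0 else abs(term) + 1e10


def sort_key_rank(ranges, value):
    """Position of value once ranges + [value] is sorted with checker."""
    return sorted(ranges + [value], key=checker).index(value)


def last_bracket_rank(ranges, value):
    """Rank of the first adjacent pair, scanning from the end, that contains value."""
    for i in range(len(ranges) - 1, 0, -1):
        lo, hi = sorted((ranges[i - 1], ranges[i]))
        if lo <= value <= hi:
            return i
    return len(ranges)  # assumed fallback when no pair contains value


ranges = [5, 70, 65, 60, 40, 35, 30]
print(sort_key_rank(ranges, 31))      # 2
print(last_bracket_rank(ranges, 31))  # 6
```

The two rules agree on the six sample sets from the question but part ways here, which fits the point that the sort-key trick depends on how those samples happen to be ordered.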
1

Taking the first example of -13.

y = [5, 35, 30, 25, -25, -30, -35]
value_to_check = -13

max_rank = len(y) # Default value in case no range found (as per 2.26 value example)

for ii in range(len(y) - 1, 0, -1):  # scan adjacent pairs from the end; range works on Python 2 and 3
    if (y[ii] <= value_to_check <= y[ii-1]) or (y[ii] >= value_to_check >= y[ii-1]):
        max_rank = ii
        break

>>> max_rank
4

In function form:

def get_rank(y, value_to_check):

    max_rank = len(y) # Default value in case no range found (as per 2.26 value example)

    for ii in range(len(y) - 1, 0, -1):  # scan adjacent pairs from the end; range works on Python 2 and 3
        if (y[ii] <= value_to_check <= y[ii-1]) or (y[ii] >= value_to_check >= y[ii-1]):
            max_rank = ii
            break

    return max_rank

When you call:

>>> get_rank(y, value_to_check)
4
vk1011
  • This is giving the wrong answers for many different options. For example, Value = 6 with Ranges = 1, 40, 35, 30, 5, 3, 0 should give rank = 4, but this gives rank = 1 – Eric D. Brown D.Sc. Jun 26 '15 at 19:04
  • 1
    But doesn't 6 fall between 1 and 40, so the rank should indeed be 1? – vk1011 Jun 26 '15 at 19:05
  • it does...but it also falls between 30 and 5 (and the ranking = 4) so I know it must output a 4 here. This is my dilemma! :) – Eric D. Brown D.Sc. Jun 26 '15 at 19:09
  • 2
    So, you basically want it to return the first match _starting from the end of the range_? – TigerhawkT3 Jun 26 '15 at 19:10
  • TigerHawkT3 - based on the few edge cases that I am seeing where this isn't giving the correct result, I think it might just be starting from the end of the range – Eric D. Brown D.Sc. Jun 26 '15 at 19:20
  • 1
    @EricD.Brown: This method is now getting all the required ranks correctly that you state in your question. – vk1011 Jun 26 '15 at 19:23
  • I was just about to post a comment about a change to your code that would bring the correct rankings (based on original rankings). This does appear to be ranking things correctly. – Eric D. Brown D.Sc. Jun 26 '15 at 19:27
1

This correctly finds the answer for all your data:

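def get_rank(l, n):
    # Among the adjacent pairs that contain n, pick the narrowest one.
    mindiff = float('inf')
    minindex = -1
    for i in range(len(l) - 1):
        if l[i] <= n <= l[i + 1] or l[i + 1] <= n <= l[i]:
            diff = abs(l[i + 1] - l[i])
            if diff < mindiff:
                mindiff = diff
                minindex = i
    if minindex != -1:
        return minindex + 1  # ranks are 1-based
    # No pair contains n: above every range gets the last rank, otherwise the first.
    if n > max(l):
        return len(l)
    return 1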

>>> test()
[5, 35, 30, 25, -25, -30, -35] -13 Desired: 4 Actual: 4
[5, 70, 65, 60, 40, 35, 30] 50 Desired: 4 Actual: 4
[1, 40, 35, 30, 5, 3, 0] 6 Desired: 4 Actual: 4
[10, 20, 30, 40, 50, 60, 70] 24 Desired: 2 Actual: 2
[0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95] 2.26 Desired: 7 Actual: 7
[10, 20, 30, 40, 60, 70, 80] 31 Desired: 3 Actual: 3

For completeness, here is my test() function, but you only need get_rank for what you are doing:

>>> def test():
        lists = [
            [[5, 35, 30, 25, -25, -30, -35], -13, 4], [[5, 70, 65, 60, 40, 35, 30], 50, 4],
            [[1, 40, 35, 30, 5, 3, 0], 6, 4], [[10, 20, 30, 40, 50, 60, 70], 24, 2],
            [[0.1, 0.55, 0.65, 0.75, 1.75, 1.85, 1.95], 2.26, 7], [[10, 20, 30, 40, 60, 70, 80], 31, 3],
        ]
        for l, n, desired in lists:
            print(l, n, 'Desired:', desired, 'Actual:', get_rank(l, n))
Blair
  • This looks like it is working. I've got a number of values/ranges to check but so far it looks good. What does it give you for [0.025 0.25 0.23 0.2 -0.2 -0.23 -0.25] and 0.15? It should be 4 but the routine gives me a '1'. Other values I'm testing all look correct. I'm seeing a few other false "1" rankings but 95% of the output is correct out of ~200 samples so far – Eric D. Brown D.Sc. Jun 26 '15 at 18:57
  • Is that range in any particular order? – TigerhawkT3 Jun 26 '15 at 19:05
  • 2
    @Eric I'm not sure why that should be giving a 4. From my understanding of the problem, that should be returning a 1. Could you explain why that should be giving a 4? – Blair Jun 26 '15 at 19:06
  • 1
    Good point: 0.15 is greater than 0.025, so it should indeed report "1". – TigerhawkT3 Jun 26 '15 at 19:07
  • Brien and Tigerhawk - you are both correct...it SHOULD report a 1 but the previous implementation of this (in COBOL for heaven's sake) is reporting a 4. Now...perhaps this COBOL implementation has been incorrect for a long time, which is a possibility. I'm going to go with this function for now and talk to the original dev to see what might be the issue. – Eric D. Brown D.Sc. Jun 26 '15 at 19:11
  • 1
    I'm surprised the original dev of a COBOL program is still with the same organization, not to mention still expected to remember the program's algorithm... – TigerhawkT3 Jun 26 '15 at 19:15
  • I'm surprised by that as well :) – Eric D. Brown D.Sc. Jun 26 '15 at 19:16
  • Perhaps it's using distance from the end points rather than smallest matching interval? – Peter Brittain Jun 26 '15 at 19:29
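The disputed input from these comments can be run through both rules discussed on this page. The sketch below restates them in Python 3 under made-up names (`smallest_interval_rank` for the narrowest-pair rule in this answer, `last_bracket_rank` for an end-to-start scan of adjacent pairs); it is a side-by-side check, not a statement of what the legacy program actually does.

```python
def smallest_interval_rank(ranges, value):
    """Rank of the narrowest adjacent pair that contains value."""
    best_width, best_index = float("inf"), -1
    for i in range(len(ranges) - 1):
        lo, hi = sorted((ranges[i], ranges[i + 1]))
        if lo <= value <= hi and hi - lo < best_width:
            best_width, best_index = hi - lo, i
    if best_index != -1:
        return best_index + 1
    # No pair contains value: last rank if above everything, else first.
    return len(ranges) if value > max(ranges) else 1


def last_bracket_rank(ranges, value):
    """Rank of the first adjacent pair, scanning from the end, that contains value."""
    for i in range(len(ranges) - 1, 0, -1):
        lo, hi = sorted((ranges[i - 1], ranges[i]))
        if lo <= value <= hi:
            return i
    return len(ranges)  # assumed fallback when no pair contains value


ranges = [0.025, 0.25, 0.23, 0.2, -0.2, -0.23, -0.25]
print(smallest_interval_rank(ranges, 0.15))  # 1
print(last_bracket_rank(ranges, 0.15))       # 4
```

The end-to-start scan returns the 4 that the legacy program reports for this input, which is consistent with (though it doesn't prove) the old code checking pairs from the end of the list.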