28

Is there a shorter, more legible way to write this? I am trying to classify some float values into folders, one folder per interval.

def classify(value):   
    if value < -0.85 and value >= -0.95:
        ts_folder = r'\-0.9'
    elif value < -0.75 and value >= -0.85:
        ts_folder = r'\-0.8'
    elif value < -0.65 and value >= -0.75:
        ts_folder = r'\-0.7'    
    elif value < -0.55 and value >= -0.65:
        ts_folder = r'\-0.6'   
    elif value < -0.45 and value >= -0.55:
        ts_folder = r'\-0.5'  
    elif value < -0.35 and value >= -0.45:
        ts_folder = r'\-0.4'
    elif value < -0.25 and value >= -0.35:
        ts_folder = r'\-0.3'
    elif value < -0.15 and value >= -0.25:
        ts_folder = r'\-0.2'
    elif value < -0.05 and value >= -0.15:
        ts_folder = r'\-0.1'
    elif value < 0.05 and value >= -0.05:
        ts_folder = r'\0.0'
    elif value < 0.15 and value >= 0.05:
        ts_folder = r'\0.1'
    elif value < 0.25 and value >= 0.15:
        ts_folder = r'\0.2'
    elif value < 0.35 and value >= 0.25:
        ts_folder = r'\0.3'
    elif value < 0.45 and value >= 0.35:
        ts_folder = r'\0.4'
    elif value < 0.55 and value >= 0.45:
        ts_folder = r'\0.5'
    elif value < 0.65 and value >= 0.55:
        ts_folder = r'\0.6'
    elif value < 0.75 and value >= 0.65:
        ts_folder = r'\0.7'  
    elif value < 0.85 and value >= 0.75:
        ts_folder = r'\0.8'
    elif value < 0.95 and value >= 0.85:
        ts_folder = r'\0.9'

    return ts_folder
Olivier Melançon
Kuang 鄺世銘
  • That's an example. In my experiment, the difference isn't always 0.5. round() is a good way in this case, but it doesn't always work for me – Kuang 鄺世銘 Mar 15 '19 at 11:03
  • I think this is a case by case though. Think of whenever the pattern is not linear by example. – Olivier Melançon Mar 15 '19 at 11:17
  • At the very least, use chained comparisons: `-0.95 <= value < -0.85` instead of `value < -0.85 and value >= -0.95` – chepner Mar 15 '19 at 12:04
  • @Kuang鄺世銘 You might want to check out [Python's Philosophy](https://en.wikipedia.org/wiki/Python_(programming_language)#Features_and_philosophy) and [Sentdex's Python3 Playlist](https://www.youtube.com/watch?v=oVp1vrfL_w4&list=PLQVvvaa0QuDe8XSftW-RAxdo6OmaeL85M). – Malekai Mar 15 '19 at 13:48
  • It's a great way to hide bugs! `ts_folder` is undefined for values larger than `0.95` or smaller than `-0.85`. Also, the result for `-0.45` and `-0.35` differ by `0.2`. – Eric Duminil Mar 15 '19 at 17:34
  • Do you want your output to be string, or keep it numeric and just generate a formatted string output with 1 decimal place? i.e. do you want to **round**, or **format**? Also, do answers need to care about `np.NaN, Inf` etc. values? (e.g. at least don't die with an exception). Last, what about inputs outside the range [-0.95, +0.95) ? – smci Mar 17 '19 at 08:03

12 Answers

45

Specific solution

There is no all-encompassing solution, but in your case you can use the following expression.

ts_folder = r'\{:.1f}'.format(round(value, 1))

General solution

If you actually need some kind of generalization, notice that any non-linear pattern will cause trouble for a formula-based approach. There is still a way to shorten the code, though.

def classify(key, intervals):
    for lo, hi, value in intervals:
        if lo <= key < hi:
            return value
    else:
        ... # return a default value or None

# A list of (lo, hi, value) tuples; any key with lo <= key < hi maps to value
intervals = [
    (value / 10 - 0.05, value / 10 + 0.05, r'\{:.1f}'.format(value / 10))
    for value in range(-9, 10)
]

value = -0.73

ts_folder = classify(value, intervals) # r'\-0.7'

Notice that the above is still not totally safe from float rounding errors. You can gain precision by writing out the intervals list by hand instead of using a comprehension.
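A minimal sketch of one way to limit that rounding drift without typing every bound, assuming the bounds are whole hundredths as in the question. Each bound is built from an integer with a single division, so it is the same float as the literal a person would type. The integer construction and the `default` parameter are assumptions for illustration, not part of the original snippet.

```python
def classify(key, intervals, default=None):
    """Return the label of the first interval with lo <= key < hi."""
    for lo, hi, label in intervals:
        if lo <= key < hi:
            return label
    return default  # key lies outside every interval

# One division per bound: -95 / 100 is the same float as the literal -0.95,
# whereas a two-step computation such as -9 / 10 - 0.05 may be one bit off.
intervals = [
    ((10 * c - 5) / 100, (10 * c + 5) / 100, r'\{:.1f}'.format(c / 10))
    for c in range(-9, 10)
]

print(classify(-0.73, intervals))  # \-0.7
```

Writing the list out by hand remains the safer option when the bounds do not follow a simple pattern.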

Continuous intervals

If the intervals in your data are continuous, that is, there is no gap between them, as in your example, then we can optimize further. Namely, we can store only the upper bound of each interval in the list. Keeping those sorted then lets us use bisect for efficient lookup.

import bisect

def value_from_hi(hi):
    return r'\{:.1f}'.format(hi - 0.05)

def classify(key, boundaries):
    i = bisect.bisect_right(boundaries, key)
    if i < len(boundaries):
        return value_from_hi(boundaries[i])
    else:
        ... # return some default value

# Sorted upper bounds
boundaries = [-0.85, -0.75, -0.65, -0.55, -0.45, -0.35, -0.25, -0.15, -0.05,
              0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

ts_folder = classify(-0.32, boundaries) # r'\-0.3'

Important note: the choice of using the higher bounds and bisect_right is due to the fact the higher bounds are excluded in your example. If the lower bounds were excluded, then we would have to use those with bisect_left.
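To make the difference concrete, here is a small sketch using a shortened, hypothetical `bounds` list. The two functions only disagree when the key is exactly equal to a boundary.

```python
import bisect

bounds = [-0.85, -0.75, -0.65]

# A key equal to a boundary: bisect_right steps past it, bisect_left stops at it.
print(bisect.bisect_right(bounds, -0.75))  # 2
print(bisect.bisect_left(bounds, -0.75))   # 1

# A key strictly between two boundaries: both agree.
print(bisect.bisect_right(bounds, -0.8))   # 1
print(bisect.bisect_left(bounds, -0.8))    # 1
```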

Also note that you may want to treat numbers outside the range [-0.95, 0.95) in some special way and not just leave those to bisect.
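One possible way to handle out-of-range input is to check the range before bisecting. This is only a sketch: the `BOUNDS` name, the `default` parameter, and the choice to return a default rather than raise are all assumptions.

```python
import bisect

# Lowest edge followed by the (excluded) upper bound of every interval.
BOUNDS = [b / 100 for b in range(-95, 96, 10)]

def classify(key, default=None):
    # NaN fails this comparison too, so it also gets the default.
    if not (BOUNDS[0] <= key < BOUNDS[-1]):
        return default
    i = bisect.bisect_right(BOUNDS, key)  # 1 <= i <= 19 at this point
    # BOUNDS[i] is the upper bound; the folder value sits 0.05 below it.
    return r'\{:.1f}'.format(BOUNDS[i] - 0.05)

print(classify(-0.32))  # \-0.3
print(classify(1.5))    # None
```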

Olivier Melançon
  • Note that the OP had `if lo <= key < hi`. – Martin Bonner supports Monica Mar 15 '19 at 15:26
  • If the intervals are supposed to be contiguous, you can require that they are supplied sorted (low to high), and then just have the intervals be (hi, value), and then the loop becomes `for hi, value in intervals: if key < hi: return value` – Martin Bonner supports Monica Mar 15 '19 at 15:27
  • Float comparisons are tricky. On my computer, your specific solution returns different values for `[-0.75, -0.65, -0.55, -0.45, -0.05, -0.04, -0.03, -0.02, -0.01, 0.15, 0.25, 0.35, 0.85]` compared to OP's code. – Eric Duminil Mar 15 '19 at 17:32
  • Assuming the intervals partition a range, binary search with the `bisect` module would be a good option. – user2357112 Mar 15 '19 at 19:54
25

The bisect module will do exactly the right lookup for finding the right bin from a list of breakpoints. In fact, the example in the documentation is exactly a case like this:

The bisect() function is generally useful for categorizing numeric data. This example uses bisect() to look up a letter grade for an exam total (say) based on a set of ordered numeric breakpoints: 85 and up is an ‘A’, 75..84 is a ‘B’, etc.

>>> grades = "FEDCBA"
>>> breakpoints = [30, 44, 66, 75, 85]
>>> from bisect import bisect
>>> def grade(total):
...     return grades[bisect(breakpoints, total)]
...
>>> grade(66)
'C'
>>> list(map(grade, [33, 99, 77, 44, 12, 88]))
['E', 'A', 'B', 'D', 'F', 'A']

Instead of a string for the value lookups, you'd want a list of strings for the exact folder names you need for each range of values. For example:

breakpoints = [-0.85, -0.75, -0.65]
folders = [r'\-0.9', r'\-0.8', r'\-0.7']
foldername = folders[bisect(breakpoints, -0.72)]

If you can automate even part of this table generation (using round(), or something similar), of course you should.
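As a rough sketch of such automation, assuming the evenly spaced breakpoints from the question: the folder list gets one more entry than the breakpoint list, and the two `None` sentinels at the ends catch out-of-range values. The `folder_for` name and the sentinels are illustrative choices, not part of the documentation example.

```python
from bisect import bisect

# 20 breakpoints: -0.95, -0.85, ..., 0.95
breakpoints = [b / 100 for b in range(-95, 96, 10)]

# 21 entries: None below -0.95, the 19 folder names, None from 0.95 upward
folders = ([None]
           + [r'\{:.1f}'.format(c / 10) for c in range(-9, 10)]
           + [None])

def folder_for(value):
    return folders[bisect(breakpoints, value)]

print(folder_for(-0.72))  # \-0.7
```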

Peter
16

One of the first rules with a block of code like this is to always write the comparisons in the same direction. So instead of

    elif value < -0.75 and value >= -0.85:

write

    elif -0.85 <= value and value < -0.75:

At this point you can observe that Python allows chaining of comparisons, so you can write:

    elif -0.85 <= value < -0.75:

That is an improvement in itself. Alternatively, you can observe that this is an ordered list of comparisons, so if you add an initial comparison, you can just write

    if value < -0.95:        ts_folder = ''
    elif value < -0.85:      ts_folder = r'\-0.9'
    elif value < -0.75:      ts_folder = r'\-0.8'
    elif value < -0.65:      ts_folder = r'\-0.7'    
    elif value < -0.55:      ts_folder = r'\-0.6'   
    elif value < -0.45:      ts_folder = r'\-0.5'  
    elif value < -0.35:      ts_folder = r'\-0.4'
    elif value < -0.25:      ts_folder = r'\-0.3'
    elif value < -0.15:      ts_folder = r'\-0.2'
    elif value < -0.05:      ts_folder = r'\-0.1'
    elif value < 0.05:       ts_folder = r'\0.0'
    elif value < 0.15:       ts_folder = r'\0.1'
    elif value < 0.25:       ts_folder = r'\0.2'
    elif value < 0.35:       ts_folder = r'\0.3'
    elif value < 0.45:       ts_folder = r'\0.4'
    elif value < 0.55:       ts_folder = r'\0.5'
    elif value < 0.65:       ts_folder = r'\0.6'
    elif value < 0.75:       ts_folder = r'\0.7'  
    elif value < 0.85:       ts_folder = r'\0.8'
    elif value < 0.95:       ts_folder = r'\0.9'
    else:                    ts_folder = ''

That's still quite long, but a) it's a lot more readable; b) it has explicit code to handle value < -0.95 or 0.95 <= value.

RonJohn
11

You can use the round() built-in:

ts_folder = "\\" + str(round(value + 1e-16, 1)) # To round values like .05 to .1, not .0
if ts_folder == r"\-0.0": ts_folder = r"\0.0" 

More on round()
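A minimal sketch of normalizing the negative zero numerically instead of comparing strings, relying on the fact that `-0.0 == 0` is true. The nudge of `1e-16` is kept from the snippet above; values lying exactly on an interval boundary may still land in a different folder than the question's if-chain, so this is not a drop-in replacement.

```python
def classify(value):
    rounded = round(value + 1e-16, 1)
    if rounded == 0:   # true for both 0.0 and -0.0
        rounded = 0.0  # drop the sign of a negative zero
    return "\\" + str(rounded)

print(classify(-0.03))  # \0.0
print(classify(0.72))   # \0.7
```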

Fukiyel
  • It doesn't seem to work for `[-0.85, -0.75, -0.65, -0.55, -0.45, -0.35, -0.25, -0.15, -0.05, -0.04, -0.03, -0.02, -0.01, 0.0]`, compared to OP's code. I'm not sure if it's a bug or a feature. – Eric Duminil Mar 15 '19 at 17:41
  • Still have an issue with `[-0.05, -0.04, -0.03, -0.02, -0.01]`, as they get rounded to -0.0 instead of 0.0 as in the OP. – Wlerin Mar 17 '19 at 06:25
  • I suspect you meant to include a `*10` inside the round() call as well, but unfortunately this just changes the problem from negative zero to banker's rounding. Perhaps the simplest solution is to round the value using your original method, then check if it's equal to `0` (because `-0.0 == 0.0`) and if so set it to 0. – Wlerin Mar 17 '19 at 07:46
11

All the answers revolve around rounding, which seems to be fine in this case, but for the sake of argument I'd also like to point out a neat Python use of dictionaries. It is often described as an alternative to the switch statement of other languages, and it allows for arbitrary values.

# Keys are (ceiling, floor) pairs; the floor is included, the ceiling excluded.
ranges = {
    (-0.85, -0.95): r'\-0.9',
    (-0.75, -0.85): r'\-0.8',
    (-0.65, -0.75): r'\-0.7',
    (-0.55, -0.65): r'\-0.6',
    # ...
}

def classify(value):
    for (ceiling, floor), rounded_value in ranges.items():
        if floor <= value < ceiling:
            return rounded_value

Output:

>>> print(classify(-0.78))
\-0.8
chepner
Hirabayashi Taro
  • In this case you're NOT using the "dict dispatch" trick - you're doing a sequential scan, so you'd get the exact same result with a list of `(start, stop, val)` tuples (but with the added overhead of creating a dict and doing a useless `__getitem__` access). – bruno desthuilliers Mar 15 '19 at 12:39
  • @chepner By editing this code you have made it not work; it indexes into `ranges` with `current_value` which is not defined (because you deleted it). – Arthur Tacca Mar 15 '19 at 14:36
  • @brunodesthuilliers: I would argue the dict is not useless; the code is much more readable, and easy to modify. Sure, it's not *efficient* (O(n)), but n is small and it may be the right choice in some cases. – danuker Mar 17 '19 at 10:19
  • @brunodesthuilliers I never said anything about "dict as dispatch table"; I think you confuse me with the author of this answer. I just stumbled across this question and answer and found the answer could not work, and looking into it more found the original answer did work but an editor had broken it while "improving" it. It has since been fixed. Still, if someone considers an answer substantially lacking, I think it makes more sense to post a new one than to totally rewrite the answer in an edit. – Arthur Tacca Mar 18 '19 at 12:22
  • @ArthurTacca oops, sorry, there was some confusion indeed (actually with not only the author but also with danuker). I cannot edit my comment anymore so I'll delete and repost an edited version. – bruno desthuilliers Mar 18 '19 at 13:09
  • @danuker using a dict when you actually use it as a list of triplets certainly doesn't make the code "more readable" IMHO - it just makes the intention unclear and mostly looks like a beginner's WTF (as far as I'm concerned I would immediately refactor this to a list of triplets or a list of (range pair => value) pairs if I was to work on this code). wrt/ the algorithm being O(n), there's not much choice here anyway, and this is a clear indication that a dict is not the right type. – bruno desthuilliers Mar 18 '19 at 13:20
  • @brunodesthuilliers Why refactor as a list? a dict has the meaning of a "mapping", i.e., from X to Y; and in this case X is the range, and Y is the value to map it to. A list would clarify that this is not O(1), but would hide that it is a mapping. – danuker Mar 20 '19 at 23:38
  • @danuker There's a logical flaw in your reasoning - the fact that a dict has "mapping" semantics doesn't mean that all mappings are dicts (this is the canonical "Socrates is mortal, humans are mortal so Socrates is human" logical error). If you go this way, the arithmetic "sum" operation is also a mapping (it maps a set of numbers to the sum of those numbers), so would you say that `sum` should be a dict instead of a function? Actually, a dict maps keys to values. Here, the upper and lower bounds are not keys (they are not used as such), they are operands for a comparison function. – bruno desthuilliers Mar 21 '19 at 09:14
  • @brunodesthuilliers I did not claim all mappings should be dicts. I claimed that, in this specific case, where you need to store items like `(start, stop) -> value` (which is a mapping), a dict would make your brain think "a-ha! from something to something else", as opposed to a list, which is undifferentiated `(start, stop, value)`. – danuker Mar 21 '19 at 15:15
5

Actually, in Python 3, .85 will be rounded to .8. As per the question, .85 should be rounded to .9.

Can you try the following:

round2 = lambda x, y=None: round(x+1e-15, y)
ts_folder = r'\{}'.format(str(round2(value, 1)))

Output:

>>> round2(.85, 1)
0.9
>>> round2(-.85, 1)
-0.8
Jeril
  • As Wlerin said under my post : `-.05`, `-.04`, `-.03` etc will alas be transformed to `\-0.0`, and not `\0.0` – Fukiyel Mar 17 '19 at 07:00
3

from decimal import Decimal

def classify(value):
    number = Decimal(value)
    result = "%.2f" % (number)
    return Decimal(round(float(result), 2))
wizzwizz4
Asif Akhtar
3

How about turning it into a loop?

def classify(value):
    i = -95  # lower bound of the first interval, in hundredths
    while i < 95:
        if value < (i + 10) / 100.0 and value >= i / 100.0:
            return '\\' + repr((i + 5) / 100.0)
        i += 10

It's not efficient by any means, but it's equivalent to what you have, just shorter.
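The same linear scan can also be written with `range()` and a chained comparison. This is a sketch only: the bounds are assumed to run from -0.95 to 0.95 in steps of 0.1 as in the question, and returning `None` for out-of-range values is an illustrative choice.

```python
def classify(value):
    for lo in range(-95, 95, 10):  # lower bounds in hundredths: -95, -85, ..., 85
        if lo / 100 <= value < (lo + 10) / 100:
            # The folder value is the midpoint of the interval.
            return '\\{:.1f}'.format((lo + 5) / 100)
    return None  # value is below -0.95 or at least 0.95

print(classify(-0.73))  # \-0.7
print(classify(0.0))    # \0.0
```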

user541686
2

You don't need the and value >= -.85 in elif value < -0.75 and value >= -0.85:; if the value isn't greater than or equal to -.85, then you won't reach the elif. You can also just turn all the elifs into if by having each one return immediately.

In this case, since you have the boundaries at regular intervals, you can just round (in the general case of regular intervals, you may have to divide and then round, for instance if the intervals are at every three units, then you would divide the number by three and round). In the general case, it's faster to store the boundaries in a tree structure, and then do a binary search for where the item goes.

Doing a binary search explicitly would be something like this:

def classify(value):   
    if value < -.05:
        if value < -.45:
            if value < -.65:
                if value < -.85:
                    if value < -.95:
                        return None
                    return r'\-0.9'
                if value < -.75:
                    return r'\-0.8'
                return r'\-0.7'
    ...

Although this code is harder to read than yours, it runs in time logarithmic rather than linear with respect to the number of boundaries.
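The nested ifs can also be expressed as a loop over a sorted list, which keeps the logarithmic running time without hand-writing the tree. A minimal sketch, assuming the question's evenly spaced bounds; `BOUNDS` and `LABELS` are illustrative names, and the standard bisect module performs the same kind of search.

```python
BOUNDS = [b / 100 for b in range(-95, 96, 10)]               # 20 sorted bounds
LABELS = ['\\{:.1f}'.format(c / 10) for c in range(-9, 10)]  # 19 folder names

def classify(value):
    if not (BOUNDS[0] <= value < BOUNDS[-1]):
        return None  # out of range (NaN ends up here as well)
    lo, hi = 0, len(BOUNDS) - 1
    # Invariant: BOUNDS[lo] <= value < BOUNDS[hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if value < BOUNDS[mid]:
            hi = mid
        else:
            lo = mid
    return LABELS[lo]

print(classify(-0.9))  # \-0.9
print(classify(0.9))   # \0.9
```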

If the number of items is significantly larger than the number of boundaries, it would probably be faster to actually create a tree of the items, and insert the boundaries.

You could also create a list, sort it, and then look at the index. For instance, compare (sorted([(_-9.5)/10 for _ in range(20)]+[x]).index(x)-9)/10 to your function.

Acccumulation
2

Many of these answers suggest some kind of rounding as a solution. Unfortunately, there are three problems with using rounding for this purpose, and at the time of writing all of them fell prey to at least one.

  • Floating point representation of decimal values is inexact. For example, the float 0.85 is in fact 0.8499999999999999777955395....
  • round() uses ties-round-to-even, also known as scientific or banker's rounding, rather than the arithmetic rounding many of us learned in school. This means e.g. 0.85 rounds to 0.8 instead of 0.9, and 0.25 rounds to 0.2 instead of 0.3.
  • Very small negative floats (and Decimals) round up to -0.0 rather than 0.0 as the OP's mapping requires.
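Each of the three effects can be observed directly in an interpreter. A short illustration, with the values picked only as examples:

```python
from decimal import Decimal

# 1. The float nearest to 0.85 is slightly below the decimal value 0.85.
print(Decimal(0.85) < Decimal('0.85'))  # True

# 2. An exact tie is rounded to the even digit.
print(round(0.25, 1))                   # 0.2

# 3. A small negative value keeps its sign when it rounds to zero.
print(round(-0.03, 1))                  # -0.0
```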

These can all be solved using the decimal module, though not as prettily as I'd like:

from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_DOWN

def classify(value):
    number = Decimal('{:.2f}'.format(value))

    if number < 0:
        round_method = ROUND_HALF_DOWN
    else:
        round_method = ROUND_HALF_UP
    rounded_number = number.quantize(Decimal('0.1'), rounding=round_method)

    if rounded_number == 0.0:
        rounded_number = Decimal('0.0')
    return r'\{}'.format(rounded_number)

Both ROUND_HALF_DOWN and ROUND_HALF_UP are required as ROUND_HALF_UP actually rounds away from zero rather than towards Infinity. .quantize rounds a Decimal value to the places given by the first argument, and allows us to specify a rounding method.

Bonus: Bisect Breakpoints using range()

For the bisect solutions, this will generate the breakpoints used by the OP:

from decimal import Decimal
breakpoints = [Decimal('{}e-2'.format(e)) for e in range(-85, 96, 10)]
Wlerin
1

Take a look at the round() function in Python. Maybe you can work it out without the if chain.

With this function you can specify the number of digits you need to keep. For example:

x = round(5.76543, 2)
print(x)

That code will print 5.77

Olivier Melançon
MelKoutch
1

Try something like this, if you don't like loops:

def classify(value): 
    endpts = [-0.95, -0.85,    -0.75,    -0.65,    -0.55,    -0.45,    -0.35,    -0.25,    -0.15,    -0.05,    0.05,    0.15,    0.25,    0.35,    0.45,    0.55,    0.65,    0.75,    0.85,    0.95] 
    ts_folder = [ r'\-0.9', r'\-0.8', r'\-0.7', r'\-0.6', r'\-0.5', r'\-0.4', r'\-0.3', r'\-0.2', r'\-0.1', r'\0.0', r'\0.1', r'\0.2', r'\0.3', r'\0.4', r'\0.5', r'\0.6', r'\0.7', r'\0.8', r'\0.9'] 
    idx = [value >= end for end in endpts].index(False) 
    if not idx:
        raise ValueError('Value outside of range')
    return ts_folder[idx-1] 

Of course, the loop is just "hidden" in the list comprehension. Obviously, in this example, it would be better to generate endpts and ts_fol programmatically rather than writing them all out, but you indicated that in the real situation the endpoints and values aren't so straightforward.

This raises a ValueError if value ≥ 0.95 (because False is not found in the list comprehension) or if value < -0.95 (because then idx is 0); the original version raises an UnboundLocalError in these cases.

You could also save three lines and skip a few comparisons by doing this:

def classify(value):
    endpts = [-0.95,    -0.85,    -0.75,    -0.65,    -0.55,    -0.45,    -0.35,    -0.25,    -0.15,    -0.05,    0.05,    0.15,    0.25,    0.35,    0.45,    0.55,    0.65,    0.75,    0.85,    0.95]
    ts_fol = [ None, r'\-0.9', r'\-0.8', r'\-0.7', r'\-0.6', r'\-0.5', r'\-0.4', r'\-0.3', r'\-0.2', r'\-0.1', r'\0.0', r'\0.1', r'\0.2', r'\0.3', r'\0.4', r'\0.5', r'\0.6', r'\0.7', r'\0.8', r'\0.9']
    return next((ts for ts, end in zip(ts_fol, endpts) if value < end), None)

This version returns None rather than raising exceptions for any value outside the bounds.

Nick Matteo