
I want to replace outliers in a list. To do so, I define an upper and a lower bound. Every value above upper_bound or below lower_bound is replaced with the corresponding bound value. My approach was to do this in two steps using a NumPy array.

Now I wonder if it's possible to do this in one step, as I guess it could improve performance and readability.

Is there a shorter way to do this?

import numpy as np

lowerBound, upperBound = 3, 7

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[arr > upperBound] = upperBound
arr[arr < lowerBound] = lowerBound

# [3 3 3 3 4 5 6 7 7 7]
print(arr)

See "How can I clamp (clip, restrict) a number to some range?" for clamping individual values, including non-NumPy approaches.

Karl Knechtel
ppasler
  • While it is nice that there's a compiled `clip` method, there's nothing un-pythonic about your code. It is a perfectly good use of `numpy`, and just as readable (to an experienced user). Keep that concept in your toolbox; it works in cases that don't quite fit the `clip` model. – hpaulj Dec 26 '16 at 18:00
  • This operation is generally called ***clamping***, ***clipping*** or else ***thresholding*** – smci Dec 26 '16 at 20:49
  • You should use the `clip` method but there is another reason than speed; your code is elegant but creates an intermediate array with `arr > upperBound` which could be an issue if the array gets large. – Thomas Baruchel Dec 26 '16 at 20:58
  • @hpaulj thanks for your comment. By the term "pythonic" I meant short and fast. I am aware my solution is not unpythonic, but the `clip()` method is enough for my special use case. The steps 1) doing it on your own 2) understanding the concept and 3) using a library are a good way to go :) – ppasler Dec 27 '16 at 17:41
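The comments above note that the boolean-mask idiom covers cases that `clip` does not. A minimal sketch of one such case, assuming a hypothetical requirement that out-of-range values be replaced with a sentinel (here `-1`, chosen only for illustration) instead of the nearest bound:

```python
import numpy as np

lower_bound, upper_bound = 3, 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Boolean mask of the outliers. As noted in the comments, this
# allocates a temporary array with the same length as arr.
outliers = (arr < lower_bound) | (arr > upper_bound)

# np.where builds a new array: the sentinel where the mask is True,
# the original value elsewhere. The sentinel -1 is an assumption
# made for this example only.
marked = np.where(outliers, -1, arr)

print(marked.tolist())  # [-1, -1, -1, 3, 4, 5, 6, 7, -1, -1]
```

`clip` cannot express this replacement because it only ever substitutes the bounds themselves.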

2 Answers

36

You can use numpy.clip:

In [1]: import numpy as np

In [2]: arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]: lowerBound, upperBound = 3, 7

In [4]: np.clip(arr, lowerBound, upperBound, out=arr)
Out[4]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])

In [5]: arr
Out[5]: array([3, 3, 3, 3, 4, 5, 6, 7, 7, 7])
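A short sketch of a few variations, assuming the same array and bounds as in the session above. Without `out`, `np.clip` returns a new array and leaves the input untouched; the `ndarray.clip` method does the same; and `np.minimum` / `np.maximum` are one way to apply only a single bound.

```python
import numpy as np

lower_bound, upper_bound = 3, 7
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Without out=, a new array is returned and arr is not modified.
clipped = np.clip(arr, lower_bound, upper_bound)

# The method form gives the same result.
clipped_method = arr.clip(lower_bound, upper_bound)

# One-sided clipping via element-wise minimum / maximum.
upper_only = np.minimum(arr, upper_bound)
lower_only = np.maximum(arr, lower_bound)

print(clipped.tolist())     # [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]
print(upper_only.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
print(lower_only.tolist())  # [3, 3, 3, 3, 4, 5, 6, 7, 8, 9]
print(arr.tolist())         # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Passing `out=arr`, as in the session above, is what makes the operation in-place.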
arthur
14

For an alternative that doesn't rely on numpy, you could always do

arr = [max(lower_bound, min(x, upper_bound)) for x in arr]

If you just wanted to set an upper bound, you could of course write arr = [min(x, upper_bound) for x in arr]. Or similarly if you just wanted a lower bound, you'd use max instead.

Here, I've just applied both operations, written together.
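As a rough illustration, the three list comprehensions can be run side by side. The sample data and the names `values`, `capped`, `floored` and `clamped` are assumptions made for this sketch:

```python
lower_bound, upper_bound = 3, 7
values = list(range(10))  # sample data: 0..9

# Upper bound only: nothing may exceed upper_bound.
capped = [min(x, upper_bound) for x in values]

# Lower bound only: nothing may fall below lower_bound.
floored = [max(x, lower_bound) for x in values]

# Both bounds, combined in a single expression.
clamped = [max(lower_bound, min(x, upper_bound)) for x in values]

print(capped)   # [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
print(floored)  # [3, 3, 3, 3, 4, 5, 6, 7, 8, 9]
print(clamped)  # [3, 3, 3, 3, 4, 5, 6, 7, 7, 7]
```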

Edit: Here's a slightly more in-depth explanation:

Given an element x of the array (and assuming that your upper_bound is at least as big as your lower_bound!), you'll have one of three cases:

  1. x < lower_bound
  2. x > upper_bound
  3. lower_bound <= x <= upper_bound.

In case 1, the max/min expression first evaluates to max(lower_bound, x), which then resolves to lower_bound.

In case 2, the expression first becomes max(lower_bound, upper_bound), which then becomes upper_bound.

In case 3, we get max(lower_bound, x) which resolves to just x.

In all three cases, the output is what we want.
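The three cases can be checked with a small helper. This is only a sketch: `clamp` is a hypothetical name, and the explicit check on the bounds is added here to make the stated assumption (upper_bound at least as big as lower_bound) visible.

```python
def clamp(x, lower_bound, upper_bound):
    """Restrict x to the closed range [lower_bound, upper_bound].

    Hypothetical helper wrapping the max/min expression from above.
    """
    if lower_bound > upper_bound:
        # The case analysis only holds when the bounds are ordered.
        raise ValueError("lower_bound must not exceed upper_bound")
    return max(lower_bound, min(x, upper_bound))


print(clamp(1, 3, 7))  # case 1, x < lower_bound: 3
print(clamp(9, 3, 7))  # case 2, x > upper_bound: 7
print(clamp(5, 3, 7))  # case 3, x within range:  5
```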

wjandrea
mathmandan
  • just my complaint (no vote), I tend to have to think *really* hard when I see max/min combinations and find them not that readable. – djechlin Dec 26 '16 at 15:58
  • @djechlin Sure, I don't disagree with that. On the other hand, the other answer to this point uses `numpy.clip`, which would not be immediately readable to me if I came across it somewhere--I'd probably want to double-check the numpy documentation, or else just guess what it did, and hope that the author got it right. – mathmandan Dec 26 '16 at 16:03
  • What's weird is the nesting. It's a very symmetric operation that consists of "clip once, clip twice." This is "clip once, then clip that again." – djechlin Dec 26 '16 at 16:08
  • @djechlin Well...hmm. I guess to me, "clip once, clip twice" sounds very similar to "clip once, then clip that again", so I'm not sure if I completely understand your objection. But I do agree that using max/min together imposes some cognitive load...or else, requires some explanation. So I tried to give a (brief) explanation as well as the code. (However, I've said a lot more in the comments than I did in my answer, so that suggests that perhaps my answer was a little too brief!) – mathmandan Dec 26 '16 at 16:50
  • Nice one-liner! Always good to have an alternative solution, though I must say that it lacks readability – ppasler Dec 27 '16 at 17:45