Use a dictionary to key a range of values

Question

I have a pandas dataframe and I want to create categories in a new column based on the values of another column. I can solve my basic problem by doing this:

range = {
    range(0, 5) : 'Below 5',
    range(6,10): 'between',
    range(11, 1000) : 'above'
}

df['range'] = df['value'].map(range)

In the final dictionary key I have chosen a large upper value for range to ensure it captures all the values I am trying to map. However, this seems an ugly hack and am wondering how to generalise this without specifying the upper limit. ie. if > 10 : 'above'.

Thanks

Does this answer your question? [Mapping ranges of values in pandas dataframe](https://stackoverflow.com/questions/50098025/mapping-ranges-of-values-in-pandas-dataframe) Also, see [How to map numeric data into categories / bins in Pandas dataframe](https://stackoverflow.com/q/49382207/7851470) — Georgy, Mar 04 '20 at 13:08

score 5 · Answer 1 · answered Mar 03 '20 at 15:44

You could set all values first to 'above', and then use map() for the remaining options (thus with your range dict having only two items):

range = {
    range(0, 5) : 'Below 5',
    range(6,10): 'between',

}
df['range'] = 'above'
df['range'] = df['value'].map(range)

Michael Heidelberg · Accepted Answer · 2020-03-03T15:49:05.583

Assume you have a dataframe like this:

  range value
0   0     0
1   1     1
2   2     2
3   3     3
4   4     4
5   5     5
6   6     6
7   7     7
8   8     8
9   9     9

Then you can apply the following function to the column 'value':

def get_value(range):
    if range < 5:
        return 'Below 5'
    elif range < 10:
        return 'Between 5 and 10'
    else:
        return 'Above 10'

df['value'] = df.apply(lambda col: get_value(col['range']), axis=1)

To get the result you want.

score 4 · Answer 3 · answered Mar 03 '20 at 15:46

4

Thanks for the hints. I see I can achieve the same with:

df['range'] = df['value'].map(range).fillna('above')

answered Mar 03 '20 at 15:46

ojp

973
1
11
26

Chris Adams · Answer 4 · 2020-03-03T15:53:09.480

Here's another approach using numpy.select, where you specify a list of boolean conditions, and a list of choices:

import numpy as np 

# Setup
df = pd.DataFrame({'value': [1, 3, 6, 8, 20, 10000000]})

condlist = [
    df['value'].lt(5),
    df['value'].between(5, 10),
    df['value'].gt(10)]

choicelist = ['Below 5', 'between', 'above']

df['out'] = np.select(condlist, choicelist)
print(df)

[out]

      value      out
0         1  Below 5
1         3  Below 5
2         6  between
3         8  between
4        20    above
5  10000000    above

Another idea would be to use pandas.cut with bins and labels parameters specified:

df['out'] = pd.cut(df['value'], bins=[-np.inf, 5, 10, np.inf],
                   labels=['below', 'between', 'above'])

      value      out
0         1    below
1         3    below
2         6  between
3         8  between
4        20    above
5  10000000    above

score 3 · Answer 5 · answered Mar 03 '20 at 15:49

3

pandas.Series.map accepts also function as first argument, so you could do:

def fun(x):
    if x in range(0, 5):
        return 'Below 5'
    elif x in range(6, 10):
        return 'between'
    elif x >= 11:
        return 'above'

then:

df['range'] = df['value'].map(fun)

answered Mar 03 '20 at 15:49

Daweo

31,313
3
12
25

Oh, nice. Now for replacing `run` with a lambda function with a nested ternary comparison, to one-line everything. I guess `df['range'] = df['value'].map(lambda x: 'Below 5' if in x range(0, 5) else ('between' if x in (range(6, 10) else 'above'))` could do that. – 9769953 Mar 03 '20 at 16:01
@0 0: for me one-line nested `if`s are not very readable. If I would have fit into 1 line in this case AT ANY PRICE then I would rather do: `f = lambda x:('Below 5','between','above')[(x>=5)+(x>=11)]` – Daweo Mar 04 '20 at 08:23
Also nice.I was definitely going for obfuscated Python one-liners. Luckily, in Python,`True + True == 2`. – 9769953 Mar 04 '20 at 09:45

score 0 · Answer 6 · answered Mar 03 '20 at 15:46

0

 df['range'] = pd.cut(df['value'], bins = [0, 5, 10, 1000], labels = ["below 5", "between", "above"])

answered Mar 03 '20 at 15:46

r.user.05apr

5,356
3
22
39

1

That still suffers from setting a rather arbritrary upper limit (one could use `max(df['value'])` of course. – 9769953 Mar 03 '20 at 15:48

Use a dictionary to key a range of values

6 Answers6

Linked