
Apologies if this has been asked before, but I looked extensively without results.

import pandas as pd    
import numpy as np    
df = pd.DataFrame(data = np.random.randint(1,10,10),columns=['a'])    

   a
0  7
1  8
2  8
3  3
4  1
5  1
6  2
7  8
8  6
9  6

I'd like to create a new column b that maps several values of a according to some rule: say, a in [1,2,3] maps to 1, a in [4,5,6,7] maps to 2, and a in [8,9,10] maps to 3. One-to-one mapping is clear to me, but what if I want to map by a list of values or a range?

I thought along these lines...

df['b'] = df['a'].map({[1,2,3]:1,range(4,7):2,[8,9,10]:3})
jpp
E. Sommer
  • It shouldn't be hard to convert that mapping to a one-to-one mapping. How do you store that mapping data currently? – ayhan Apr 30 '18 at 09:59
  • So far, I inserted the dictionary 'by hand' as above because the mapping is relatively straightforward. But I could as well define the dictionary beforehand. I realize one could easily do this one-to-one, but what if I want to map the values [50..150] to some value? – E. Sommer Apr 30 '18 at 10:00
  • That's not a valid dictionary though. If you had something like, say, a list of key-value pairs [([1, 2, 3], 1), (range(4, 7), 2), ([8, 9, 10], 3)], you could iterate over the list and generate a one-to-one mapping, but you need to decide on your data structure first. – ayhan Apr 30 '18 at 10:07
  • If this is specific to ranges, not arbitrary collection of numbers, you may want to look at [`pd.cut`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.cut.html). – ayhan Apr 30 '18 at 10:10
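
The conversion to a one-to-one mapping suggested in the comments could look roughly like the sketch below. The names `groups` and `flat` are hypothetical, the sample values of a are copied from the question, and the second group uses range(4, 8) rather than range(4, 7) on the assumption that 7 should be included, as the question's rule states.

```python
import pandas as pd

# Hypothetical grouped mapping: each entry pairs a collection of keys with one target value.
# range(4, 8) covers 4..7 (assumed intent; range(4, 7) would stop at 6).
groups = [([1, 2, 3], 1), (range(4, 8), 2), ([8, 9, 10], 3)]

# Expand every collection into individual keys to get a flat one-to-one dictionary.
flat = {key: value for keys, value in groups for key in keys}

# Sample data taken from the question.
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# Series.map with a dict performs the one-to-one lookup; unmapped values would become NaN.
df['b'] = df['a'].map(flat)
print(df)
```

This only makes sense for a manageable number of discrete keys; for wide numeric ranges or floats, a bin-based approach is likely a better fit.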

2 Answers


There are a few alternatives.

Pandas via pd.cut / NumPy via np.digitize

You can construct a list of boundaries, then use specialist library functions. This is described in @EdChum's solution, and also in this answer.
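
As a minimal sketch of the np.digitize route (the `edges` name and the sample data are assumptions for illustration, with the values of a copied from the question):

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# Lower edges of the second and third groups: values < 4 fall in bin 0,
# 4 <= value < 8 in bin 1, and value >= 8 in bin 2.
edges = [4, 8]

# np.digitize returns zero-based bin indices here, so add 1 to get labels 1, 2, 3.
df['b'] = np.digitize(df['a'].to_numpy(), edges) + 1
print(df)
```

Note that with only two edges, anything below 4 is labelled 1 and anything from 8 upwards is labelled 3, so out-of-range values are not flagged; add outer edges if that matters.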

NumPy via np.select

df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])

criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
values = [1, 2, 3]

df['b'] = np.select(criteria, values, 0)

The elements of criteria are Boolean series, so for lists of values, you can use df['a'].isin([1, 3]), etc.
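
For example, a sketch of the same np.select call driven by explicit lists of values instead of ranges (sample data copied from the question; the lists follow the rule stated there):

```python
import numpy as np
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# One Boolean Series per group, built with isin for arbitrary collections of values.
criteria = [
    df['a'].isin([1, 2, 3]),
    df['a'].isin([4, 5, 6, 7]),
    df['a'].isin([8, 9, 10]),
]
values = [1, 2, 3]

# Rows matching none of the criteria receive the default, 0.
df['b'] = np.select(criteria, values, default=0)
print(df)
```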

Dictionary mapping via range

d = {range(1, 4): 1, range(4, 8): 2, range(8, 11): 3}

df['c'] = df['a'].apply(lambda x: next((v for k, v in d.items() if x in k), 0))

print(df)

   a  b  c
0  1  1  1
1  7  2  2
2  5  2  2
3  1  1  1
4  3  1  1
5  5  2  2
6  4  2  2
7  4  2  2
8  9  3  3
9  3  1  1
jpp
  • How about float values? I tried to use the dictionary mapping, but it doesn't work for float values in the dataframe; it only classifies the integer ones. – AHR Jun 24 '20 at 14:29
  • @AHR, Use `np.select` in that case, the dictionary method won't work. – jpp Jun 24 '20 at 15:58

IIUC you could use cut to achieve this:

In[33]:
pd.cut(df['a'], bins=[0,3,7,11], right=True, labels=False)+1

Out[33]: 
0    2
1    3
2    3
3    1
4    1
5    1
6    1
7    3
8    2
9    2

Here you pass the cutoff values to cut, which categorises your values. Passing labels=False makes it return zero-based ordinal bin codes instead of interval labels, so you just add 1 to them.

Here you can see how the cuts were calculated:

In[34]:
pd.cut(df['a'], bins=[0,3,7,11], right=True)

Out[34]: 
0     (3, 7]
1    (7, 11]
2    (7, 11]
3     (0, 3]
4     (0, 3]
5     (0, 3]
6     (0, 3]
7    (7, 11]
8     (3, 7]
9     (3, 7]
Name: a, dtype: category
Categories (3, interval[int64]): [(0, 3] < (3, 7] < (7, 11]]
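
As a variation on the above, the target values can also be passed to cut directly through its labels parameter instead of adding 1 to the ordinal codes. This is a sketch on the same sample data; the `.astype(int)` step is an assumption that plain integers are wanted rather than a categorical column, and it would fail if any value fell outside the bins.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({'a': [7, 8, 8, 3, 1, 1, 2, 8, 6, 6]})

# One label per bin: (0, 3] -> 1, (3, 7] -> 2, (7, 11] -> 3.
binned = pd.cut(df['a'], bins=[0, 3, 7, 11], right=True, labels=[1, 2, 3])

# cut returns a categorical Series; convert to integers (valid only if nothing is NaN).
df['b'] = binned.astype(int)
print(df)
```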
EdChum