How to divide a table into different levels based on values in R?

Question

Suppose that i have following data

ID  Category    Price   Month
1   X            2       1
1   X            2       2
1   X            2       3
1   X            2       4
2   X            3       1
2   X            3       2
2   X            3       3
2   X            3       4
3   X            1       1
3   X            1       2
3   X            1       3
3   X            1       4
4   X            10      1
4   X            10      2
4   X            10      3
4   X            10      4
5   Y            5       1
5   Y            5       2
5   Y            5       3
5   Y            5       4
6   Y            2       1
6   Y            2       2
6   Y            2       3
6   Y            2       4
7   Y            1       1
7   Y            1       2
7   Y            1       3
7   Y            1       4
8   Y            10      1
8   Y            10      2
8   Y            10      3
8   Y            10      4

There are different prices for products in a particular category, some are low priced and some high priced. I want to have a new variable "Price Level" which shows whether the product is low priced product, medium priced product or high priced product.

Levels are defined as follows. It takes the prices of all the products in a particular category and divide into 4 percentiles.

1st Percentile = Low
2nd and 3rd percentile = Medium
4th percentile = High

So the table will look like this

ID  Category    Price   Month   Price Level
1   X            4       1        Medium
1   X            4       2        Medium
1   X            4       3        Medium
1   X            4       4        Medium
2   X            3       1        Medium
2   X            3       2        Medium
2   X            3       3        Medium
2   X            3       4        Medium
3   X            1       1        Low
3   X            1       2        Low
3   X            1       3        Low
3   X            1       4        Low
4   X            10      1        High
4   X            10      2        High
4   X            10      3        High
4   X            10      4        High
5   Y            5       1        Medium
5   Y            5       2        Medium
5   Y            5       3        Medium
5   Y            5       4        Medium
6   Y            2       1        Low
6   Y            2       2        Low
6   Y            2       3        Low
6   Y            2       4        Low
7   Y            1       1        Low
7   Y            1       2        Low
7   Y            1       3        Low
7   Y            1       4        Low
8   Y            10      1        Low
8   Y            10      2        Low
8   Y            10      3        Low
8   Y            10      4        Low

alistaire · Answer 1 · 2016-03-20T04:58:02.127

You can lapply across the data.frame split by Category, and call cut and quantile on each group. data.frame and do.call(rbind, reassemble the data back into a single data.frame:

do.call(rbind, lapply(split(df, df$Category), function(x){
    data.frame(x, Price_Level =  cut(x$Price, 
                                     quantile(x$Price, probs = c(0, .25, .75, 1)), 
                                     labels = c('Low', 'Medium', 'High'), 
                                     include.lowest = TRUE))
}))

#    ID Category Price Month Price_Level
# 1   1        X     2     1      Medium
# 2   1        X     2     2      Medium
# 3   1        X     2     3      Medium
# 4   1        X     2     4      Medium
# 5   2        X     3     1      Medium
# 6   2        X     3     2      Medium
# 7   2        X     3     3      Medium
# 8   2        X     3     4      Medium
# 9   3        X     1     1         Low
# 10  3        X     1     2         Low
# 11  3        X     1     3         Low
# 12  3        X     1     4         Low
# 13  4        X    10     1        High
# 14  4        X    10     2        High
# 15  4        X    10     3        High
# 16  4        X    10     4        High
# 17  5        Y     5     1      Medium
# 18  5        Y     5     2      Medium
# 19  5        Y     5     3      Medium
# 20  5        Y     5     4      Medium
# 21  6        Y     2     1      Medium
# 22  6        Y     2     2      Medium
# 23  6        Y     2     3      Medium
# 24  6        Y     2     4      Medium
# 25  7        Y     1     1         Low
# 26  7        Y     1     2         Low
# 27  7        Y     1     3         Low
# 28  7        Y     1     4         Low
# 29  8        Y    10     1        High
# 30  8        Y    10     2        High
# 31  8        Y    10     3        High
# 32  8        Y    10     4        High

If you just want to return a single column, but don't want to worry about grouping messing up your order, you can use the equivalent

factor(ave(df$Price, df$Category, FUN = function(x){
    cut(x, 
        quantile(x, probs = c(0, .25, .75, 1)), 
        include.lowest = TRUE)
}), levels = c(1, 2, 3), labels = c('Low', 'Medium', 'High'))

A slightly less ugly version with dplyr:

library(dplyr)
df %>% group_by(Category) %>% mutate(Price_Level = cut(Price, 
                                                       quantile(Price, c(0, .25, .75, 1)), 
                                                       labels = c('Low', 'Medium', 'High'), 
                                                       include.lowest = TRUE))

I tried to apply the code on original data which does not have equal number of rows in each category and month so i got the error " 'breaks' are not unique" so i tried to put quantile inside unique function and then i got the error "lengths of 'breaks' and 'labels' differ". How to fix it? — Jay khan, Mar 20 '16 at 05:18

score 1 · Answer 2 · answered Mar 20 '16 at 04:59

1

We can use data.table

library(data.table)
setDT(df)[, Price_Level := cut(Price, 
             quantile(Price, c(0, .25, .75, 1)), 
            labels = c('Low', 'Medium', 'High'),
            include.lowest = TRUE), by = Category]

answered Mar 20 '16 at 04:59

akrun

874,273
37
540
662

I tried to apply the code on original data which does not have equal number of rows in each category and month so i got the error " 'breaks' are not unique" so i tried to put quantile inside unique function and then i got the error "lengths of 'breaks' and 'labels' differ". How to fix it? – Jay khan Mar 20 '16 at 05:18
@Jaykhan In the example data, it is working. – akrun Mar 20 '16 at 05:23

How to divide a table into different levels based on values in R?

2 Answers2