
I have a data set of individuals with their socioeconomic scores, ranging from -6.3 to 3.5. Now I want to assign each individual to a quantile based on their socioeconomic score.

I have a dataset named Healthdata with two columns: Healthdata$SSE, and Healthdata$ID.

Eventually, I would like to get a data frame in which each individual is matched to their SSE quantile.

How can I do this in R?

Sampson
mani
  • Search results from create quintiles: http://stackoverflow.com/questions/16724186/calculate-sum-of-a-column-based-on-ranking-of-another-column http://stackoverflow.com/questions/15561976/quantiles-by-factor-levels-in-r http://stackoverflow.com/questions/11728419/using-cut-and-quartile-to-generate-breaks-in-r-function (Similar results with "create deciles".) – IRTFM Dec 23 '13 at 17:39
  • Thanks very much for the useful materials. Very new to R, and learning slowly. – mani Dec 29 '13 at 19:08

2 Answers


Here's one approach:

# an example data set
set.seed(1)
Healthdata <- data.frame(SSE = rnorm(8), ID = gl(2, 4))

transform(Healthdata, quint = ave(SSE, ID, FUN = function(x) {
  quintiles <- quantile(x, seq(0, 1, .2))
  cuts <- cut(x, quintiles, include.lowest = TRUE)
  quintVal <- quintiles[match(cuts, levels(cuts)) + 1]
  return(quintVal)
}))

#          SSE ID      quint
# 1 -0.6264538  1 -0.4644344
# 2  0.1836433  1  0.7482983
# 3 -0.8356286  1 -0.7101237
# 4  1.5952808  1  1.5952808
# 5  0.3295078  2  0.3610920
# 6 -0.8204684  2 -0.1304827
# 7  0.4874291  2  0.5877873
# 8  0.7383247  2  0.7383247

A simple illustration of how it works:

values <- 1:10
# [1]  1  2  3  4  5  6  7  8  9 10

quintiles <- quantile(values, seq(0, 1, .2))
#  0%  20%  40%  60%  80% 100% 
# 1.0  2.8  4.6  6.4  8.2 10.0 

cuts <- cut(values, quintiles, include.lowest = TRUE)
#  [1] [1,2.8]   [1,2.8]   (2.8,4.6] (2.8,4.6]
#  [5] (4.6,6.4] (4.6,6.4] (6.4,8.2] (6.4,8.2]
#  [9] (8.2,10]  (8.2,10] 
# 5 Levels: [1,2.8] (2.8,4.6] ... (8.2,10]

quintVal <- quintiles[match(cuts, levels(cuts)) + 1]
# 20%  20%  40%  40%  60%  60%  80%  80% 100% 100% 
# 2.8  2.8  4.6  4.6  6.4  6.4  8.2  8.2 10.0 10.0 
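
If a group number (1 to 5) is more useful than the upper bound of each bin, the same cut() call can return integer codes directly. This is only a minimal sketch on the toy vector from the illustration above; the name quintGroup is made up for the example.

```r
# Same toy vector as in the illustration above
values <- 1:10
quintiles <- quantile(values, probs = seq(0, 1, 0.2))

# labels = FALSE makes cut() return integer bin codes instead of a factor
quintGroup <- cut(values, breaks = quintiles,
                  include.lowest = TRUE, labels = FALSE)
quintGroup
# [1] 1 1 2 2 3 3 4 4 5 5
```

The codes follow the order of the breaks, so 1 is the lowest fifth and 5 the highest.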
Sven Hohenstein

So let's start with a sample data set based on your description:

set.seed(315)
Healthdata <- data.frame(SSE = sample(-6.3:3.5, 21, replace = TRUE), ID = 1:21)

Which gives something like this:

> Healthdata[1:15,]
    SSE ID
1  -0.3  1
2  -6.3  2
3  -1.3  3
4  -3.3  4
5  -5.3  5
6  -4.3  6
7  -4.3  7
8   0.7  8
9  -4.3  9
10 -4.3 10
11 -3.3 11
12  0.7 12
13 -2.3 13
14 -3.3 14
15  0.7 15

I understand that you want a new variable which identifies the quantile group of the individual's socioeconomic status. I would do something like this:

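# quantile() with default probs returns the 0%, 25%, 50%, 75% and 100% breaks,
# so cut() sorts each SSE value into one of four quartile groups
transform(Healthdata, Q = cut(SSE,
                              breaks = quantile(SSE),
                              labels = c(1, 2, 3, 4),
                              include.lowest = TRUE))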

To return:

    SSE ID Q
1  -1.3  1 2
2  -6.3  2 1
3  -4.3  3 1
4   0.7  4 3
5   1.7  5 3
6   1.7  6 3
7  -5.3  7 1
8   1.7  8 3
9   2.7  9 4
10 -3.3 10 2
11 -1.3 11 2
12 -3.3 12 2
13  1.7 13 3
14  0.7 14 3
15 -4.3 15 1

If you want to see the upper and lower bounds of the quantile ranges, omit the labels = c(1, 2, 3, 4) argument. The result then looks like this:

    SSE ID           Q
1  -1.3  1 (-4.3,-1.3]
2  -6.3  2 [-6.3,-4.3]
3  -4.3  3 [-6.3,-4.3]
4   0.7  4  (-1.3,1.7]
5   1.7  5  (-1.3,1.7]
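
To end up with separate data frames per quartile, one option is to store the group as a column and then split() on it. The sketch below rebuilds a small Healthdata with 20 distinct, made-up scores so that it runs on its own; byQuartile is a hypothetical name, and your real data will give different group sizes.

```r
# Hypothetical example data: 20 individuals with distinct scores
set.seed(42)
Healthdata <- data.frame(ID = 1:20,
                         SSE = sample(seq(-6.3, 3.5, length.out = 20)))

# Quartile group stored as a factor column
Healthdata$Q <- cut(Healthdata$SSE,
                    breaks = quantile(Healthdata$SSE),
                    labels = 1:4,
                    include.lowest = TRUE)

# One data frame per quartile, returned as a named list
byQuartile <- split(Healthdata, Healthdata$Q)
sapply(byQuartile, nrow)
# 1 2 3 4
# 5 5 5 5
```

byQuartile[["1"]] is then the data frame for the lowest quartile. One caveat: if many scores are tied, quantile() can return duplicate breaks, and cut() will stop with an error because the breaks are not unique.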
Christian Lemp
  • Thanks. This is what I was looking for. I also tried the gtools package and found the following useful: Q5=quantcut(SSE, q=seq(0,1,by=0.20)) – mani Dec 29 '13 at 18:59