
I have the following dataset.

dat2 <- read.table(header=TRUE, text="
ID   De  Ep  Ti  ID1
1123 113 121 100 11231
1123 105 107 110 11232
1134 122 111 107 11241
1134 117 120 111 11242
1154 122 116 109 11243
1165 108 111 118 11251
1175 106 115 113 11252
1185 113 104 108 11253
1226 109 119 116 11261
")
dat2
    ID  De  Ep  Ti   ID1
1 1123 113 121 100 11231
2 1123 105 107 110 11232
3 1134 122 111 107 11241
4 1134 117 120 111 11242
5 1154 122 116 109 11243
6 1165 108 111 118 11251
7 1175 106 115 113 11252
8 1185 113 104 108 11253
9 1226 109 119 116 11261

I want to recode the first two columns into the numeric labels shown below, but cut turns them into factors.

dat2$ID <- cut(dat2$ID, breaks=c(0,1124,1154,1184,Inf), 
               labels=c(5, 25, 55, 75))
table(dat2$ID)
 5 25 55 75 
 2  3  2  2 


dat2$De <- cut(dat2$De, breaks=c(0,110,118,125,Inf), 
               labels=c(10, 20, 30, 40))
table(dat2$De)
10 20 30 40 
 4  3  2  0 


str(dat2)
'data.frame':   9 obs. of  5 variables:
 $ ID : Factor w/ 4 levels "5","25","55",..: 1 1 2 2 2 3 3 4 4
 $ De : Factor w/ 4 levels "10","20","30",..: 2 1 3 2 3 1 1 2 1
 $ Ep : int  121 107 111 120 116 111 115 104 119
 $ Ti : int  100 110 107 111 109 118 113 108 116
 $ ID1: int  11231 11232 11241 11242 11243 11251 11252 11253 11261

I used as.numeric to convert them back to numbers, but that produces new labels (1, 2, 3, ...) instead of the ones I assigned, which is not what I want. I need a simple line of code to do this conversion.

dat2$ID <- as.numeric(dat2$ID)
table(dat2$ID)
1 2 3 4 
2 3 2 2 

dat2$De <- as.numeric(dat2$De)
table(dat2$De)
1 2 3 
4 3 2
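A minimal sketch of why this happens, using the ID values from the dataset above: a factor stores integer codes plus a character vector of levels, and as.numeric() on a factor returns the codes, not the labels.

```r
# ID values copied from the dataset above
id <- c(1123, 1123, 1134, 1134, 1154, 1165, 1175, 1185, 1226)
f <- cut(id, breaks = c(0, 1124, 1154, 1184, Inf), labels = c(5, 25, 55, 75))

class(f)               # cut() returns a factor unless labels = FALSE
levels(f)              # the labels, stored as character strings
codes <- as.numeric(f) # the internal integer codes (as doubles), not the labels
codes
```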
S Das
    For converting factors to numbers see [here](http://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-an-integer-numeric-without-a-loss-of-information) – David Arenburg Oct 27 '15 at 14:16
    use `as.numeric(as.character(dat2$ID))` and you'll get what you want. – Benjamin Oct 27 '15 at 14:16
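A sketch applying the conversion from the comment above to the question's data, next to the as.numeric(levels(f))[f] idiom recommended in R FAQ 7.10, which converts each distinct level once and then indexes the result by each element's code. Both keep the assigned labels rather than the codes.

```r
dat2 <- read.table(header = TRUE, text = "
ID   De  Ep  Ti  ID1
1123 113 121 100 11231
1123 105 107 110 11232
1134 122 111 107 11241
1134 117 120 111 11242
1154 122 116 109 11243
1165 108 111 118 11251
1175 106 115 113 11252
1185 113 104 108 11253
1226 109 119 116 11261
")

dat2$ID <- cut(dat2$ID, breaks = c(0, 1124, 1154, 1184, Inf), labels = c(5, 25, 55, 75))
dat2$De <- cut(dat2$De, breaks = c(0, 110, 118, 125, Inf), labels = c(10, 20, 30, 40))

# Go through the level strings so the labels, not the codes, survive
dat2$ID <- as.numeric(as.character(dat2$ID))
# Same result: convert the levels once, then index them by each element's code
dat2$De <- as.numeric(levels(dat2$De))[dat2$De]

str(dat2)
```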

1 Answer


In your case it will probably be more efficient to use findInterval directly, instead of converting the numeric values to factors and then back to numeric values as shown here.

c(5, 25, 55, 75)[findInterval(dat2$ID, c(0, 1124, 1154, 1184, Inf))]
## [1]  5  5 25 25 55 55 55 75 75

Or, for the second column:

c(10, 20, 30, 40)[findInterval(dat2$De, c(0, 110, 118, 125, Inf))]
## [1] 20 10 30 20 30 10 10 20 10
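The same lookup can be wrapped in a small helper and assigned back into dat2 so both columns stay numeric. This is a sketch: recode_bins is a hypothetical name, not part of base R. findInterval() returns the index i with breaks[i] <= x < breaks[i + 1], and that index picks the label. Values below the first break get index 0, and labels[0] returns nothing, which would silently shorten the result, so the sketch maps those to NA.

```r
# Hypothetical helper (not base R): label each value by the bin it falls in
recode_bins <- function(x, breaks, labels) {
  stopifnot(length(labels) == length(breaks) - 1)
  i <- findInterval(x, breaks)     # 0 means x < breaks[1]
  i[which(i == 0)] <- NA_integer_  # keep the output the same length as x
  labels[i]                        # an index past the last label also gives NA
}

dat2 <- read.table(header = TRUE, text = "
ID   De  Ep  Ti  ID1
1123 113 121 100 11231
1123 105 107 110 11232
1134 122 111 107 11241
1134 117 120 111 11242
1154 122 116 109 11243
1165 108 111 118 11251
1175 106 115 113 11252
1185 113 104 108 11253
1226 109 119 116 11261
")

dat2$ID <- recode_bins(dat2$ID, c(0, 1124, 1154, 1184, Inf), c(5, 25, 55, 75))
dat2$De <- recode_bins(dat2$De, c(0, 110, 118, 125, Inf), c(10, 20, 30, 40))
str(dat2)
```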

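This matches cut(..., right = FALSE) but returns the numeric values directly. Note that cut closes intervals on the right by default while findInterval closes them on the left, so a value sitting exactly on a break can land in a different bin: ID 1154 becomes 25 with the default cut call below, but 55 with findInterval above.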

cut(dat2$ID, breaks=c(0, 1124, 1154, 1184, Inf), labels=c(5, 25, 55, 75))
# [1] 5  5  25 25 25 55 55 75 75
# Levels: 5 25 55 75
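If the bins should match cut's default right-closed intervals exactly, two base-R options are sketched below, assuming R >= 3.3.0 for the left.open argument of findInterval. cut(..., labels = FALSE) returns the integer bin codes instead of a factor, and those codes can index the label vector just like the findInterval result.

```r
id <- c(1123, 1123, 1134, 1134, 1154, 1165, 1175, 1185, 1226)
breaks <- c(0, 1124, 1154, 1184, Inf)
labels <- c(5, 25, 55, 75)

# Default findInterval(): left-closed bins [a, b), so 1154 goes to the 3rd bin
left_closed <- labels[findInterval(id, breaks)]

# left.open = TRUE: right-closed bins (a, b], the same as cut()'s default
right_closed <- labels[findInterval(id, breaks, left.open = TRUE)]

# cut() with labels = FALSE gives integer bin codes rather than a factor
via_cut <- labels[cut(id, breaks = breaks, labels = FALSE)]
```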

Here's a quick benchmark showing a roughly 18x speed improvement:

set.seed(123)
x <- sample(1e8, 1e7, replace = TRUE) 

system.time({
  res1 <- cut(x, breaks = c(0, 1e4, 1e5, 1e6, Inf), labels = c(5, 25, 55, 75))
  res1 <- as.numeric(levels(res1))[res1]
})
# user  system elapsed 
# 3.40    0.09    3.51 

system.time(res2 <- c(5, 25, 55, 75)[findInterval(x, c(0, 1e4, 1e5, 1e6, Inf))])
# user  system elapsed 
# 0.18    0.03    0.20 

identical(res1, res2)
## [1] TRUE
David Arenburg