0

Sorry guys if this is a noob question. I need help on how to loop over my data frame. Here is some sample data.

a <- 10:29
b <- 40:59
e <- rep(1, 20)
test <- data.frame(a,b,e)

I need to set column "e" according to the following criteria for the values in column "a":

"a" <= 15, "e" = 1

"a" > 15 & < 20, "e" = 2

"a" > 20 & < 25, "e" = 3

"a" > 25 & < 30, "e" = 4

and so on, so that the result looks like this:

result <- cbind(a,b,rep(1:4, each=5))

My actual data frame is over 100k rows long. Would be great if you could sort me out here.

Roman Luštrik
Biju
  • I think the title should reflect what's being done here. You're trying to add a recoded column based on values from other columns. – Roman Luštrik Aug 14 '12 at 12:04
  • Sorry Roman about the less-than-optimal title, you are right - it's about recoding a column based on others. But you guys have sorted me out anyway :) – Biju Aug 14 '12 at 13:28
  • Just trying to help the next person interested in this problem. By giving it an informative title, there's a better chance that your answer will help someone. Feel free to edit the title to reflect your Q. :) – Roman Luštrik Aug 15 '12 at 18:44

3 Answers

11
data.frame(a, b, e=(1:4)[cut(a, c(-Inf, 15, 20, 25, 30))])

Update:

Greg's comment provides a more direct solution without the need to go via subsetting an integer vector with a factor returned from cut.

data.frame(a, b, e=findInterval(a, c(-Inf, 15, 20, 25, 30)))
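One thing that may be worth checking: the two calls above do not treat the break points the same way. By default cut() uses right-closed intervals, while findInterval() uses left-closed ones, so values that fall exactly on a break end up in different groups. A small sketch using the sample vector from the question (by_cut and by_find are just names made up for this comparison):

```r
a <- 10:29

# cut() uses right-closed intervals by default: (-Inf, 15], (15, 20], ...
by_cut <- (1:4)[cut(a, c(-Inf, 15, 20, 25, 30))]

# findInterval() uses left-closed intervals by default: [-Inf, 15), [15, 20), ...
by_find <- findInterval(a, c(-Inf, 15, 20, 25, 30))

# The two results differ only where a sits exactly on a break point
a[by_cut != by_find]
```

With this data the differing values are 15, 20 and 25. The findInterval() version reproduces rep(1:4, each=5) from the question; if you prefer cut(), passing right = FALSE should give the same left-closed grouping.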
Backlin
4

I would use cut() for this:

test$e = cut(test$a, 
             breaks = c(0, 15, 20, 25, 30), 
             labels = c(1, 2, 3, 4))

If you want to "generalize" the cut (in other words, when you don't know in advance how many groups of 5 you need), you can take a two-step approach using c() and seq():

test$e = cut(test$a, 
             breaks = c(0, seq(from = 15, to = max(test$a)+5, by = 5)))
levels(test$e) = seq_along(levels(test$e))
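Note that both cut() calls above leave "e" as a factor. If a plain integer column is wanted, one possible shortcut is labels = FALSE, which makes cut() return the integer codes directly, so the relabelling step is not needed. A minimal sketch, assuming the same sample data as in the question (the breaks variable is only introduced here for readability):

```r
test <- data.frame(a = 10:29, b = 40:59)

# Same generalized breaks as in the snippet above;
# max(test$a) + 5 makes sure the largest value is still covered
breaks <- c(0, seq(from = 15, to = max(test$a) + 5, by = 5))

# labels = FALSE makes cut() return integer codes instead of a factor
test$e <- cut(test$a, breaks = breaks, labels = FALSE)

str(test$e)
```

The intervals are still right-closed here, so 15 goes into group 1, 20 into group 2, and so on.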

Since Backlin beat me to the cut() solution, here's another option (which I don't prefer in this case, but am posting just to demonstrate the many options available in R).

Use recode() from the car package.

library(car)
test$e = recode(test$a, "0:15 = 1; 15:20 = 2; 20:25 = 3; 25:30 = 4")
A5C1D2H2I1M1N2O1R2T1
1

You don't need a loop. You have nearly all you need:

test[test$a > 15 & test$a < 20, "e"] <- 2
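Spelled out for all four groups, that could look like the sketch below. The question leaves the boundary values 20 and 25 unassigned, so this assumes they belong to the lower group (<=); adjust the comparisons if you want them in the upper one.

```r
test <- data.frame(a = 10:29, b = 40:59, e = 1)

# e starts at 1, so only the remaining groups need to be assigned.
# Assumption: boundary values 20 and 25 belong to the lower group.
test[test$a > 15 & test$a <= 20, "e"] <- 2
test[test$a > 20 & test$a <= 25, "e"] <- 3
test[test$a > 25 & test$a <= 30, "e"] <- 4

# Count how many rows ended up in each group
table(test$e)
```

This is vectorized, so it should be fine on 100k rows, but it needs one line per group. For many groups the cut() or findInterval() answers scale better.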
sgibb