I have a data frame and I would like to perform a conditional subtraction/addition of percentage values within different ids using unique codes.

Specifically, I would like to add 10% of code 1 percent values to code 3 percent values, and subtract 10% of code 1 percent values from code 1. The rest of the codes stay the same. Ideally results would be added into a new column.

My question is similar to these two, with some important differences: "R ddply with multiple variables" and "Easiest way to subtract associated with one factor level from values associated with all other factor levels".

I think the best way to do this is with plyr. This is what I have so far, but it doesn't work:

df <- data.frame(id=c(rep("113316", 4), rep("113317", 3)), code=c(1,3,4,5,1,3,4), percent=c(0.2571, 0.7257, 0.0114, 0.0057, 0.9596, 0.0058, 0.0857))
df.2 <- ddply(df, .(id, code), transform, percent=(percent*.90[code==1]+percent[code==3] | percent=percent*.90[code==1]-percent[code==1]))

Output would look like this:

id     code percent new
113316 1    0.2571  0.23139
113316 3    0.7257  0.75141
113316 4    0.0114  0.01140
113316 5    0.0057  0.00570
113317 1    0.9596  0.86364
113317 3    0.0058  0.10176
113317 4    0.0857  0.08570
user3367135

1 Answer

You may want to do this in two steps, as in:

#initialize the new variable
df$new <- df$percent
# Add 10% from code == 1 to  code == 3
df$new[df$code == 3] <- df$new[df$code == 3] + 0.1 * df$percent[df$code == 1]
# subtract 10% from code 1 where code == 1
df$new[df$code == 1] <- 0.9 * df$new[df$code == 1]

Note that this assumes that sum(df$code == 1) == sum(df$code == 3); otherwise there will be some recycling, which may cause hard-to-detect errors later in your calculations. It also assumes that the data are ordered by id, so that the code 1 and code 3 rows pair up within the same id.
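Before relying on the two-step version, it may be worth checking those assumptions explicitly. The following is a minimal sketch in base R using the example data from the question; the variable names n1, n3, one_each and same_order are only illustrative.

```r
# Example data from the question
df <- data.frame(
  id = c(rep("113316", 4), rep("113317", 3)),
  code = c(1, 3, 4, 5, 1, 3, 4),
  percent = c(0.2571, 0.7257, 0.0114, 0.0057, 0.9596, 0.0058, 0.0857)
)

# Count the code 1 and code 3 rows within each id
n1 <- tapply(df$code == 1, df$id, sum)
n3 <- tapply(df$code == 3, df$id, sum)

# Every id should have exactly one row of each code
one_each <- all(n1 == 1 & n3 == 1)

# The code 1 rows and code 3 rows should list the ids in the same order,
# so that element-wise addition pairs rows from the same id
same_order <- identical(df$id[df$code == 1], df$id[df$code == 3])

# Stop early if either assumption is violated
stopifnot(one_each, same_order)
```

If stopifnot() raises an error here, the grouped approach is likely the safer choice.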

A dplyr solution that makes fewer assumptions about the structure of your data.frame would group_by id and mutate using a window function, like so:

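fun <- function(code, percent){
    # Only adjust groups that contain both code 1 and code 3
    if(all(c(1,3) %in% code)){
        percent[code == 3] <- percent[code == 3] + 0.1*percent[code == 1]
        percent[code == 1] <- 0.9*percent[code == 1]
    }
    percent
}

library(dplyr)
df %>%
    group_by(id) %>%
    # pass the code column (not id) so the function can find codes 1 and 3
    mutate(new = fun(code, percent))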
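For comparison, the same per-id logic can be sketched in base R with split() and lapply(), which avoids any package dependency. The helper name shift_share and its share argument are hypothetical, and the sketch assumes at most one code 1 row and one code 3 row per id. The final check compares the result with the expected values listed in the question.

```r
# Example data from the question
df <- data.frame(
  id = c(rep("113316", 4), rep("113317", 3)),
  code = c(1, 3, 4, 5, 1, 3, 4),
  percent = c(0.2571, 0.7257, 0.0114, 0.0057, 0.9596, 0.0058, 0.0857)
)

# Hypothetical helper: within one id, move `share` of the code 1 value to code 3
shift_share <- function(d, share = 0.1) {
  d$new <- d$percent
  # Leave the group untouched unless both codes are present
  if (all(c(1, 3) %in% d$code)) {
    moved <- share * d$percent[d$code == 1]
    d$new[d$code == 3] <- d$new[d$code == 3] + moved
    d$new[d$code == 1] <- d$new[d$code == 1] - moved
  }
  d
}

# Apply the helper to each id and bind the pieces back together
# (rows come back grouped by id, which matches the input order here)
res <- do.call(rbind, lapply(split(df, df$id), shift_share))

# Compare with the expected `new` column from the question
expected <- c(0.23139, 0.75141, 0.0114, 0.0057, 0.86364, 0.10176, 0.0857)
stopifnot(isTRUE(all.equal(res$new, expected)))
```

Because the amount removed from code 1 is the amount added to code 3, the total of new within each id equals the total of percent.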
Jthorpe
  • The second value in `new` is not the same as the OP's expected output. Maybe it is a typo in the OP's post – akrun Feb 26 '15 at 19:55
  • Hi, sorry about that. I've edited my answer. This first part works great and your assumptions are perfectly correct - I want these to add to 1. – user3367135 Feb 27 '15 at 00:05
  • The second part doesn't appear to change anything but maybe I need to edit it. – user3367135 Feb 27 '15 at 00:14