I am trying to find a clean, efficient to create a new variable with complex calculations on 5 existing variables. My problem is, the fact that one variable is a factor, and the other 4 contain NAs.
I have a dataset with several groups of variables with the following structure:
- expenditure_period - factor for which 1 = daily, 2 = weekly, 3 = monthly, 4 = yearly
- expenditure1 - integer, amount spent for a daily period
- expenditure2 - integer, amount spent for a weekly period
- expenditure3 - integer, amount spent for a monthly period
- expenditure4 - integer, amount spent for a yearly period
For each row/observation, only one of the 4 integer fields has a numerical value depending on the value of expenditure_period, the rest are NAs.
For example:
expenditure_period expenditure1 expenditure2 expenditure3 expenditure4
1 monthly NA NA 5 NA
2 weekly NA 5 NA NA
3 monthly NA NA 2 NA
4 monthly NA NA 5 NA
5 monthly NA NA 58 NA
I want to create a new variable that contains the standard monthly expenditure. So if the expenditure_period was daily, then expenditure1*30. If weekly, then expenditure2 * 4. If monthly, then expenditure3*1. And if yearly, then expenditure4 / 12.
The best solution I could come up with is the following mess:
data$expenditure_factor[data$expenditure_period=="daily"] <- 30
data$expenditure_factor[data$expenditure_period=="weekly"] <- 4
data$expenditure_factor[data$expenditure_period=="monthly"] <- 1
data$expenditure_factor[data$expenditure_period=="yearly"] <- 1/12
data$expenditure_month <- apply(data[,c("expenditure1", "expenditure2",
"expenditure3", "expenditure4", "expenditure_factor")], 1,
function(x) { sum(x[1:4], na.rm=TRUE) * x[5]} )
I tried adding together expenditure1,2,3,4 using the + operator, but this resulted in all NAs due to adding 1 number to 3 NAs. I tried creating an interim variable using the sum function with rm.na, but this resulted in the same sum for every row. I tried using mutate from the dplyr package, to no effect.
Is there a simpler, more elegant way to do this? I have to do the same process on about 12 different expenditure categories. I apologize if this has been asked before, I could not find a similar thread. Please direct me if there is one already.
I'm using RStudio with R 3.2.3 on Windows 7.