-1

I have a complicated question that I will try to simplify by simplifying my dataset. Say I have 5 variables:

df$Id <- c(1:12)
df$Date <- c(NA,NA,a,a,b,NA,NA,b,c,c,b,a)
df$va <- c(1.1, 1.4, 2.5, ...)     #12 randoms values
df$vb <- c(5.9, 2.3, 4.7, ...)     #12 other random values
df$vc <- c(3.0, 3.3, 3.7, ...)     #12 more random values

Then I want to create a new variable that takes the value from va, vb, or vc if the date is equal to a, b, or c. I had tried a nested if-else, which did not work. I also tried:

df$new[df$date=='a' & !is.na(df$date)] <- df$va
df$new[df$date=='b' & !is.na(df$date)] <- df$vb
df$new[df$date=='c' & !is.na(df$date)] <- df$vc

This correctly left NA's in the new variable where Date=NA, however the values provided were not from va, vb, or vc, but some other value altogether. How can I get df$new to equal va if the date is 'a', vb if the date is 'b', and vc if the date is 'c'?

  • 3
    Please make it reproducible by (1) putting values for `a`,`b`,`c` (2) not using `...` and so on. We cannot copy paste your code to make the example, as-is. Consider using `dput(head(df))` and having a look here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Frank May 07 '15 at 23:35
  • Are the columns and dates actually named va, vb, vc, a, b, c? That's easier than names that don't actually match. – Molx May 08 '15 at 01:03
  • @Molx: the names of the columns are FSLE20060813, FSLE20060814, etc. whereas the name of the dates are 2006-08-13, 2006-08-14, etc. – Heidi Church May 08 '15 at 01:17
  • @Frank: The values I put in are not my actual values, as the dataset is WAY bigger in both dimensions than what I provided. I did not know the code to generate random numbers, so I did not want to add another line that required fixing. For reproducible purposes, you could either use a random number generator, or do something like df$va <- c(12:24), df$vb <-c(4:16), and df$vc<-c(7:19). – Heidi Church May 08 '15 at 01:17
  • possible duplicate of [How to create a new variable with values from different variables if another variable equals a set value in R?](http://stackoverflow.com/questions/30115455/how-to-create-a-new-variable-with-values-from-different-variables-if-another-var) –  May 08 '15 at 04:41

3 Answers3

5

Try

library(dplyr)
df %>% 
    mutate(new = (Date=="a")*va + (Date=="b")*vb + (Date=="c")*vc)
#   Id Date         va        vb         vc       new
#1   1 <NA> 0.26550866 0.6870228 0.26722067        NA
#2   2 <NA> 0.37212390 0.3841037 0.38611409        NA
#3   3    a 0.57285336 0.7698414 0.01339033 0.5728534
#4   4    a 0.90820779 0.4976992 0.38238796 0.9082078
#5   5    b 0.20168193 0.7176185 0.86969085 0.7176185
#6   6 <NA> 0.89838968 0.9919061 0.34034900        NA
#7   7 <NA> 0.94467527 0.3800352 0.48208012        NA
#8   8    b 0.66079779 0.7774452 0.59956583 0.7774452
#9   9    c 0.62911404 0.9347052 0.49354131 0.4935413
#10 10    c 0.06178627 0.2121425 0.18621760 0.1862176
#11 11    b 0.20597457 0.6516738 0.82737332 0.6516738
#12 12    a 0.17655675 0.1255551 0.66846674 0.1765568

Or,

library(data.table)
setDT(df)[,new:= (Date=="a")*va + (Date=="b")*vb + (Date=="c")*vc,]

Data

set.seed(1)
df <- data.frame(Id = 1:12,
                 Date = c(NA,NA,"a","a","b",NA,NA,"b","c","c","b","a"),
                 va = runif(12),
                 vb = runif(12),
                 vc = runif(12), stringsAsFactors = FALSE)
ExperimenteR
  • 4,453
  • 1
  • 15
  • 19
1

You may also do row/column indexing using base R (using the data from @ExperimenteR). Even if there are 100's of unique 'Date' with corresponding 'v' columns, we may not need to change the code especially the cbind(..) part.

  df$new <- df[-(1:2)][cbind(1:nrow(df),match(df$Date, sort(unique(df$Date))))]
  df
  #   Id Date         va        vb         vc       new
  #1   1 <NA> 0.26550866 0.6870228 0.26722067        NA
  #2   2 <NA> 0.37212390 0.3841037 0.38611409        NA
  #3   3    a 0.57285336 0.7698414 0.01339033 0.5728534
  #4   4    a 0.90820779 0.4976992 0.38238796 0.9082078
  #5   5    b 0.20168193 0.7176185 0.86969085 0.7176185
  #6   6 <NA> 0.89838968 0.9919061 0.34034900        NA
  #7   7 <NA> 0.94467527 0.3800352 0.48208012        NA
  #8   8    b 0.66079779 0.7774452 0.59956583 0.7774452
  #9   9    c 0.62911404 0.9347052 0.49354131 0.4935413
  #10 10    c 0.06178627 0.2121425 0.18621760 0.1862176
  #11 11    b 0.20597457 0.6516738 0.82737332 0.6516738
  #12 12    a 0.17655675 0.1255551 0.66846674 0.1765568
akrun
  • 874,273
  • 37
  • 540
  • 662
0

I was told the problem with my code is that I needed to put indexing on either side. Without the indexing on the right side, it does not know which row to apply the value from. So the correct code in this case would be:

df$new[df$date=='a' & !is.na(df$date)] <- df$va[df$date=='a' & !is.na(df$date)]
df$new[df$date=='b' & !is.na(df$date)] <- df$vb[df$date=='b' & !is.na(df$date)]
df$new[df$date=='c' & !is.na(df$date)] <- df$vc[df$date=='c' & !is.na(df$date)]

Alternatively, another user noted there is a way to use ifelse, which can be viewed as the correct answer here: https://stats.stackexchange.com/questions/151345/how-to-create-a-new-variable-with-values-from-different-variables-if-another-var

As I added to his answer at that link, I found what worked better was to replace the == with %in%, so that it created a numeric variable instead of a list with a row for each of the 36121 observations in my dataset (12 in the example I provided). That would look like:

df$new[df$date %in% 'a' & !is.na(df$date)] <- df$va[df$date %in% 'a' & !is.na(df$date)]
df$new[df$date %in% 'b' & !is.na(df$date)] <- df$vb[df$date %in% 'b' & !is.na(df$date)]
df$new[df$date %in% 'c' & !is.na(df$date)] <- df$vc[df$date %in% 'c' & !is.na(df$date)]
Community
  • 1
  • 1