-1

I have 4 date variables and 1 grouping variable. Based on the value in the grouping variable a date needs to be taken into a new variable. My data looks like this: (https://www.dropbox.com/s/wlfv89qc1jzwfgk/vb.JPG)

I want to make a new variable based on the value of group. When group is A I need date1, when group is B I need date2, etc.

This is the code I've used:

if (tabel$group == A) {
  newvar <- tabel$date1
} else if (tabel$group == B) {
  newvar <- tabel$date2
} else if (tabel$group == C) {
  newvar <- tabel$date3
} else if (tabel$group == D) {
  newvar <- tabel$date4
}

This code give me a warning message:

(In if (tabel$group == 1) { : the condition has length > 1 and only the first element will be used

Nitro
  • 1
  • 1
  • [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data (not a picture of it) and all necessary code. You probably want `ifelse` or `dplyr::case_when`, since `if`/`else` structures aren't vectorized. Several SO questions deal with this warning – camille May 28 '19 at 21:25
  • If I execute your code as posted I get the error message *Error: object 'A' not found*. How is variable `A` defined, please? – Uwe May 29 '19 at 06:09

2 Answers2

0

You need to perform this row-wise...one option is to use nested ifelse()

newvar=ifelse(tabel$group=='A',tabel$date1,
              ifelse(tabel$group=='B',tabel$date2,
                     ifelse(tabel$group=='C',tabel$date3,tabel$date4))))
c.custer
  • 407
  • 5
  • 13
0

I am sure this has been asked and answered several times before but I haven't found a good duplicate.

Nested ifelse() aren't always the best solution. It is cumbersome to code, read and maintain with lots of parentheses. Furthermore, ifelse() may change the type of the result unexpectedly, e.g., class Date.

So, I want to post some alternatives

Subsetting using match()

This one uses the specific fact that the first letter "A" corresponds to the first date column date1, etc. So, it may not be applicable in general:

indices <- matrix(c(seq.int(nrow(tabel)), match(tabel$group, LETTERS[1:4])), ncol = 2)
newvar <- tabel[, -1][indices]
newvar
 [1] "2011-01-01" "2011-01-02" "2011-01-03" "2012-02-04" "2012-02-05" "2012-02-06" "2013-03-07" "2013-03-08" "2013-03-09"
[10] "2014-04-10" "2014-04-11" "2014-04-12"

The indices (row number, column number) which are used for subsetting are:

      [,1] [,2]
 [1,]    1    1
 [2,]    2    1
 [3,]    3    1
 [4,]    4    2
 [5,]    5    2
 [6,]    6    2
 [7,]    7    3
 [8,]    8    3
 [9,]    9    3
[10,]   10    4
[11,]   11    4
[12,]   12    4

tabel[, -1] omits the first column (group).

case_when()

This is avectorised version of multiple if_else() statements from the dplyr package.

newvar <- dplyr::case_when(
  tabel$group == "A" ~ tabel$date1,
  tabel$group == "B" ~ tabel$date2,
  tabel$group == "C" ~ tabel$date3,
  tabel$group == "D" ~ tabel$date4)

or

library(dplyr)
newvar <- tabel %>% 
  transmute(value = case_when(group == "A" ~ date1,
                              group == "B" ~ date2,
                              group == "C" ~ date3,
                              group == "D" ~ date4)) %>% 
  pull(value)

Joining with a look-up table using data.table

Look-up tables treat the relation between group and column name as data and thus are easy to create and maintain. On the other hand, tabel has to be reshaped from wide to long format before it can be joined with lookup:

library(data.table)
lookup <- data.table(group = LETTERS[1:4], variable = paste0("date", 1:4))
newvar <- melt(setDT(tabel), id.vars = "group")[lookup, on = .(group, variable)]$value

lookup
   group variable
1:     A    date1
2:     B    date2
3:     C    date3
4:     D    date4

The reshaped long format is

melt(setDT(tabel), id.vars = "group")
    group variable      value
 1:     A    date1 2011-01-01
 2:     A    date1 2011-01-02
 3:     A    date1 2011-01-03
 4:     B    date1 2011-01-04
 5:     B    date1 2011-01-05
 6:     B    date1 2011-01-06
 7:     C    date1 2011-01-07
 8:     C    date1 2011-01-08
 9:     C    date1 2011-01-09
10:     D    date1 2011-01-10
11:     D    date1 2011-01-11
12:     D    date1 2011-01-12
13:     A    date2 2012-02-01
14:     A    date2 2012-02-02
15:     A    date2 2012-02-03
16:     B    date2 2012-02-04
17:     B    date2 2012-02-05
18:     B    date2 2012-02-06
19:     C    date2 2012-02-07
20:     C    date2 2012-02-08
21:     C    date2 2012-02-09
22:     D    date2 2012-02-10
23:     D    date2 2012-02-11
24:     D    date2 2012-02-12
25:     A    date3 2013-03-01
26:     A    date3 2013-03-02
27:     A    date3 2013-03-03
28:     B    date3 2013-03-04
29:     B    date3 2013-03-05
30:     B    date3 2013-03-06
31:     C    date3 2013-03-07
32:     C    date3 2013-03-08
33:     C    date3 2013-03-09
34:     D    date3 2013-03-10
35:     D    date3 2013-03-11
36:     D    date3 2013-03-12
37:     A    date4 2014-04-01
38:     A    date4 2014-04-02
39:     A    date4 2014-04-03
40:     B    date4 2014-04-04
41:     B    date4 2014-04-05
42:     B    date4 2014-04-06
43:     C    date4 2014-04-07
44:     C    date4 2014-04-08
45:     C    date4 2014-04-09
46:     D    date4 2014-04-10
47:     D    date4 2014-04-11
48:     D    date4 2014-04-12
    group variable      value

Joining with a look-up table using dplyr and tidyr

library(dplyr)
library(tidyr)
lookup <- tibble(group = LETTERS[1:4], key = paste0("date", 1:4))
newvar <- tabel %>% 
  gather(key, value, -group) %>% 
  inner_join(lookup) %>% 
  pull(value)

This works as above: create look-up table, reshape from wide to long format and join. pull() returns a simple result vector.

Reproducible data

tabel <- data.frame(group = rep(LETTERS[1:4], each = 3L),
                    date1 = as.Date("2011-01-01") + 0:11,
                    date2 = as.Date("2012-02-01") + 0:11,
                    date3 = as.Date("2013-03-01") + 0:11,
                    date4 = as.Date("2014-04-01") + 0:11)
tabel
    group      date1      date2      date3      date4
 1:     A 2011-01-01 2012-02-01 2013-03-01 2014-04-01
 2:     A 2011-01-02 2012-02-02 2013-03-02 2014-04-02
 3:     A 2011-01-03 2012-02-03 2013-03-03 2014-04-03
 4:     B 2011-01-04 2012-02-04 2013-03-04 2014-04-04
 5:     B 2011-01-05 2012-02-05 2013-03-05 2014-04-05
 6:     B 2011-01-06 2012-02-06 2013-03-06 2014-04-06
 7:     C 2011-01-07 2012-02-07 2013-03-07 2014-04-07
 8:     C 2011-01-08 2012-02-08 2013-03-08 2014-04-08
 9:     C 2011-01-09 2012-02-09 2013-03-09 2014-04-09
10:     D 2011-01-10 2012-02-10 2013-03-10 2014-04-10
11:     D 2011-01-11 2012-02-11 2013-03-11 2014-04-11
12:     D 2011-01-12 2012-02-12 2013-03-12 2014-04-12
Uwe
  • 41,420
  • 11
  • 90
  • 134