I try to fix dates (years) using a function

change_century <- function(x){
  a <- year(x)
  ifelse(test = a >2020,yes = year(x) <- (year(x)-100),no = year(x) <- a)
  return(x)
}

The function works for specific row or using a loop for one column (here date of birth)

for (i in c(1:nrow(Df))){
  Df_recode$DOB[i] <- change_century(Df$DOB[i])
}

Then I try to use mutate/across

Df_recode <- Df %>% mutate(across(list_variable_date,~change_century(.)))

It does not work. Is there something I am getting wrong? Thank you!

Liem Binh
  • There is a syntax issue in `ifelse` as well. In addition, you are assigning the `ifelse` output within `ifelse`, which is not correct. Instead you may do `i1 <- a > 2020; year(x[i1]) <- year(x[i1]) - 100; return(x)` – akrun May 05 '22 at 17:52
  • It would help if you can clarify what you have and what you're trying to do, so that we can suggest simpler approaches. Could you share `dput(head(Df))` and `list_variable_date` so that we can understand what kind of data you have? Are they years or dates? Is the intention to recode values from 2021+ to be 100 years earlier? – Jon Spring May 05 '22 at 18:42

1 Answer

Try:

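library(lubridate)  # provides year()

change_century <- function(x){
  a <- year(x)
  # vectorized: subtract 100 from any year after 2020, keep the rest as-is
  newx <- ifelse(test = a > 2020, yes = a - 100, no = a)
  return(newx)
}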

(Frankly, I stored the result in newx and then returned it solely to keep the changes to your code minimal. In this case return is not needed; in theory it even adds an unnecessary function call to the evaluation stack. I would tend to have just two lines in that function: a <- year(x) and ifelse(..), without assignment. By default, R returns the value of the last expression, which here would be the result of ifelse, which is what we want. Assigning it to newx and then calling return(newx), or even just writing newx as the last expression, has exactly the same effect.)
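
As a minimal sketch of that two-line variant, together with the mutate(across()) call from the question: the sample data frame and its visit column are made up for illustration, and year() is assumed to come from lubridate.

```r
library(dplyr)
library(lubridate)

# Two-line variant: the value of the last expression (the ifelse() call)
# is what the function returns, so no return() is needed.
change_century <- function(x) {
  a <- year(x)
  ifelse(a > 2020, a - 100, a)
}

# Hypothetical sample data; only the DOB column name comes from the question.
Df <- data.frame(
  id    = 1:2,
  DOB   = as.Date(c("2045-03-01", "1999-12-31")),
  visit = as.Date(c("2022-01-15", "2019-06-30"))
)
list_variable_date <- c("DOB", "visit")

# all_of() selects the columns named in the character vector.
Df_recode <- Df %>%
  mutate(across(all_of(list_variable_date), change_century))

Df_recode$DOB    # 1945 1999
Df_recode$visit  # 1922 2019
```

Note that this returns the corrected years as numbers, so the selected date columns become numeric year columns.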

Rationale

ifelse should not have variable assignment within it. That's not to say that it is a syntax error (it is not), but that it is counter to its intent. You are asking the function to go through each condition found in test=, and return a value based on it. When test= contains both TRUE and FALSE values, both yes= and no= are evaluated completely, and then ifelse joins them together as needed.

For demonstration,

ifelse(test = c(TRUE, FALSE, TRUE), yes = 1:3, no = 11:13)

The return value is something like:

c(
  if (test[1]) yes[1] else no[1],
  if (test[2]) yes[2] else no[2],
  if (test[3]) yes[3] else no[3]
)
# c(1, 12, 3)
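
The expansion above can be checked directly. This is only a small sketch; the variable names test, yes and no are chosen to mirror the argument names.

```r
test <- c(TRUE, FALSE, TRUE)
yes  <- 1:3
no   <- 11:13

# Vectorized form
vectorized <- ifelse(test, yes, no)

# Hand-written element-by-element equivalent
elementwise <- c(
  if (test[1]) yes[1] else no[1],
  if (test[2]) yes[2] else no[2],
  if (test[3]) yes[3] else no[3]
)

identical(vectorized, elementwise)  # TRUE
```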

To capture the results of the zipped-together yeses and nos c(1, 12, 3), one must capture the return value from ifelse itself, not inside of the call to ifelse.
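
A small base-R sketch of why assigning inside the call goes wrong, using plain numeric years (hypothetical values) instead of dates. It may also explain why the original function seemed to work one row at a time: with a single TRUE condition, only yes= is ever evaluated.

```r
x <- c(2045, 1999)   # hypothetical years
a <- x

# Assignments inside the call: the test is mixed, so both yes= and no= are
# evaluated, and the second assignment undoes the first.
inside <- ifelse(a > 2020, x <- x - 100, x <- a)
inside  # 1945 1999  (the return value is correct ...)
x       # 2045 1999  (... but x itself is back to its original values)

# Capturing the return value of ifelse() instead:
fixed <- ifelse(a > 2020, a - 100, a)
fixed   # 1945 1999

# Single value: only yes= is evaluated, so the side effect happens to stick.
x1 <- 2045
a1 <- x1
ifelse(a1 > 2020, x1 <- x1 - 100, x1 <- a1)
x1      # 1945
```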

Another point that may be relevant: ifelse(cond, yes, no) is not at all a shortcut for if (cond) { yes } else { no }. Some key differences:

  • in if, the cond must always be exactly length 1, no more, no less.

    In R < 4.2, a length-0 condition raises the error "argument is of length zero" (see ref), while a condition of length 2 or more produces the warning "the condition has length > 1 and only the first element will be used" (see ref1, ref2).

    In R >= 4.2, both cases (should) produce an error (no warnings).

  • ifelse is intended to be vectorized, so the cond can be any length. yes= and no= should either be the same length or length 1 (recycling is in effect here); cond= should really be the same length as the longer of yes= and no=.

  • if does short-circuiting, meaning that if (TRUE || stop("quux")) 1 will never attempt to evaluate stop. This can be very useful when one condition will fail (logically or with a literal error) if attempted on a NULL object, such as if (!is.null(quux) && quux > 5) ....

    Conversely, ifelse does no per-element short-circuiting: cond= is evaluated in full, yes= is evaluated in full if any element of cond is TRUE, and no= is evaluated in full if any element is FALSE.
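
The differences listed above can be sketched in a few lines of base R. The values are made up; quux is the placeholder name used in the bullets.

```r
# if with &&: the second operand is never evaluated when quux is NULL.
quux <- NULL
res_null <- if (!is.null(quux) && quux > 5) "big" else "not big"
res_null  # "not big"

# ifelse is vectorized: one result per element, and NA stays NA.
vals <- c(3, 8, NA)
labels <- ifelse(vals > 5, "big", "not big")
labels    # "not big" "big" NA

# A length-3 condition in if: an error in R >= 4.2, a warning in older
# versions. tryCatch() only records which of the two was signalled.
outcome <- tryCatch(
  if (vals > 5) "big" else "not big",
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome   # "error" on R >= 4.2, "warning" on older versions
```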

r2evans