0

Apologies in advance for a probably very basic question from a fresh R-user: I have two different numeric variables for age (age1, age2), they both contain some missing values. I want to create one variable that contains all the values (all the non-NA-values) of the two other variables, but struggle to get this done. As an example, I take these two columns

age1 <- c(NA, 21, 22, NA, 24, 25, NA, NA)
age2 <- c(20, NA, NA, 23, NA, NA, 26, NA)
dt <- data.frame(age1, age2)

I struggle to merge them into one column, trying to get a result like this:

dt$age <- c(20, 21, 22, 23, 24, 25, 26, NA)

I have tried things like:

dt$age[complete.cases(dt$age1)] <- dt$age1
dt$age[complete.cases(dt$age2)] <- dt$age2

or

dt$age[dt$age1>0] <- dt$age1
dt$age[dt$age2>0] <- dt$age2

or

dt$age[na.omit(dt$age1)] <- dt$age1
dt$age[na.omit(dt$age2)] <- dt$age2

But get error messages such as:

  • number of items to replace is not a multiple of replacement length
  • NAs are not allowed in subscripted assignments

I feel there should be some easy way to do this, but struggle to find out of it. Any hints to a solution is greatly appreciated.

Greetings, Bjorn

1 Answers1

0

Using base R you could do:

dt$age <- ifelse(is.na(dt$age1), dt$age2, dt$age1)

Or, using dplyr:

coalesce(age1, age2)

This gives us the vector:

[1] 20 21 22 23 24 25 26 NA

If you want to create an age variable in your dt dataframe:

dt %>% 
  mutate(age = coalesce(age1, age2))

Which gives us:

  age1 age2 age
1   NA   20  20
2   21   NA  21
3   22   NA  22
4   NA   23  23
5   24   NA  24
6   25   NA  25
7   NA   26  26
8   NA   NA  NA
Matt
  • 7,255
  • 2
  • 12
  • 34