1

I have 2 columns in my dataframe such that for every row either of the following is true: 1. only one of them has a value (and the other is missing) 2. both have missing value

I need to create a new column which has missing value if both columns are missing or the non missing value if only 1 column has a value.

Example: A and B are columns in my dataset, C is the column I need to create:

A   B   C
1   NA  1
NA  6   6
NA  NA  NA
9   NA  9
4   NA  4
NA  3   3

Further, there are many such sets of columns that I wish to combine, so ideally I would like to do this with the help of a function where I can pass the names of new column, column1, column2 and the dataframe.

I am not able to understand how do I go about it. Can anyone please point me in the right direction.

(I searched for duplicate questions, but the ones I found were related to non missing values so I had to post this similar question)

2 Answers2

1

What about:

fc <- function(data, column1, column2) {
  data$res <- rowSums(data[,c(column1, column2)], na.rm=T)
  data[,data$res==0] <- NA
  return(data)
}

or if your data aren't numeric and you prefer @r2evans solution:

fc <- function(data, column1, column2) {
  data$res <- ifelse(is.na(data[,column1]), data[,column2], data[,column1])
  return(data)
}

than you call:

df2 <- fc(df, "A", "B")
Bastien
  • 3,007
  • 20
  • 38
  • Thank you @Bastien. It worked perfectly.I tried to upvote your answer but upvotes from people with less than 15 reputation score (like me) aren't displayed publicly. – Neerav Makwana Dec 11 '17 at 17:27
0

Another approach could be

myFun <- function(df, col1, col2, new_col){
  df[new_col] <- pmax(df[,col1], df[,col2], na.rm = TRUE)
  return(df)
}

myFun(df, "A", "B", "C")

#   A  B  C
#1  1 NA  1
#2 NA  6  6
#3 NA NA NA
#4  9 NA  9
#5  4 NA  4
#6 NA  3  3


#sample data
> dput(df)
structure(list(A = c(1L, NA, NA, 9L, 4L, NA), B = c(NA, 6L, NA, 
NA, NA, 3L)), .Names = c("A", "B"), class = "data.frame", row.names = c(NA, 
-6L))
Prem
  • 11,775
  • 1
  • 19
  • 33