0

I have 3 columns of data that I'd like to use to populate column D (example below of end result, currently column D is blank). For each row, columns A-C will have 1 populated value and 2 NA values. How can I do this?

    A   B   C   D
1  'a' NA  NA  'a'
2  NA  'b' NA  'b'
3  NA  'b' NA  'b'
4  NA  NA 'c'  'c'
5  NA  NA 'c'  'c'
6  'a' NA  NA  'a'
7  'a' NA  NA  'a'
8  NA  NA 'c'  'c'

I tried the following function and although it didn't error out, it didn't populate my dataset. I'm wondering what I'm doing wrong.

Thanks for your help

pop_D <- function(dataset){
for(i in 1:nrow(dataset)){
    if(!is.na(dataset[i,'A'])){
    dataset[i,'D'] <- dataset[i,'A']
  }else if(!is.na(dataset[i,'B'])){
    dataset[i,'D'] <- dataset[i,'B']
  }else{
    dataset[i,'D'] <- dataset[i,'C']
  }
 }
}
pop_D(ds)
Aramis7d
  • 2,444
  • 19
  • 25
LukeT
  • 15
  • 2

3 Answers3

1

We can use pmax to do this

df1$D <- do.call(pmax, c(df1[1:3], na.rm = TRUE))
df1$D
#[1] "a" "b" "b" "c" "c" "a" "a" "c"

Or a second option is applying max.col on the logical matrix for non-NA elements to get the column index, cbind with the row index and extract the elements based on these indexes

df1[1:3][cbind(1:nrow(df1), max.col(!is.na(df1[1:3]), 'first'))]
#[1] "a" "b" "b" "c" "c" "a" "a" "c"
akrun
  • 874,273
  • 37
  • 540
  • 662
0

As a complement to akrun's answer, if you have a matrix (or a dataframe that you convert with as.matrix()) you can just combine all the data and omit all the NA's to create your variable D, i.e.

mt<-matrix(c("a",NA,NA,"a",NA,"b",NA,NA,NA,NA,"c",NA),ncol=3) #create test data

cbind(mt,na.omit(c(as.matrix(mt))))
nadizan
  • 1,323
  • 10
  • 23
0

Considering dft as your input, you can use dplyr and do:

dft %>%
  mutate(D = coalesce(A,B,C))

which gives:

     A    B    C D
1    a <NA> <NA> a
2 <NA>    b <NA> b
3 <NA>    b <NA> b
4 <NA> <NA>    c c
5 <NA> <NA>    c c
6    a <NA> <NA> a
7    a <NA> <NA> a
8 <NA> <NA>    c c

p.s. I prepared the sample input data by copying from the question as:

dft <- read.table(header = TRUE, text = "id    A   B   C   D
1  'a' NA  NA  'a'
2  NA  'b' NA  'b'
3  NA  'b' NA  'b'
4  NA  NA 'c'  'c'
5  NA  NA 'c'  'c'
6  'a' NA  NA  'a'
7  'a' NA  NA  'a'
8  NA  NA 'c'  'c'",stringsAsFactors=FALSE)

dft$id<- NULL
dft$D <- NULL
Aramis7d
  • 2,444
  • 19
  • 25
  • This worked great. Thanks. One issue I found using this code is that I had failed to set the dataframe stringsAsFactors = FALSE. Once I did it ran great. – LukeT Feb 03 '17 at 10:23
  • Awesome. Though it's fun to write your own function, most of the stuff is already out there ;) . In case it helped, please have a look [here](http://stackoverflow.com/help/someone-answers) – Aramis7d Feb 03 '17 at 11:24