0

I am trying to replace NA values in multiple columns from dataframe x1 by the average of the values from dataframes x2 and x3, based on common and distinct atrribute 'ID'.

All the dataframes(each dataframe is for a particular year) have the same column structure:

ID   A   B   C   .....

01   2   5   7   .....
02   NA  NA  NA  .....
03   5   4   8   .....

I have found an answer to do it for 1 column at a time, thanks to this post.

x1$A[is.na(x1$A)] <- (x2$A[match(x1$ID[is.na(x1$A)],x2$ID)] + x3$A[match(x1$ID[is.na(x1$A)],x3$ID)])/2

But since I have about a 100 coulmns to apply this for I would really like to have a smarter way to do it. I tried the suggestions from this post and also from here. I came up with this code, but couldn't make it work.

x1[6:105] = as.data.frame(lapply(x1[6:105], function(x) ifelse(is.na(x),  (x2$x[match(x1$ID, x2$ID)]+x3$x[match(x1$ID, x3$ID)])/2, x1$x)))

Got the following error:

Error in ifelse(is.na(x), (x2$x[match(x1$ID, x2$ID)] + x3$x[match(x1$ID,  :   replacement has length zero 

I initially thought function(x) worked on the entire column and x represented the column name, but i think it represents each individual cell value and that is why it wont work.

I am a novice in R, I would surely appreciate some guidance to let me know where I am going wrong, in applying the logic to multiple columns.

Community
  • 1
  • 1
dinesh
  • 19
  • 3
  • This is probably not causing the error, but the last argument of `ifelse` should be x rather than `x1$x`. – lmo Feb 12 '17 at 19:26

1 Answers1

0
for (i in 1:ncol(x1)) {
    nas <- is.na(x1[,i]) # where are NAs
    if (sum(nas)==0) next
    ids <- x1$ID[nas] # ids of NAs
    nam <- colnames(x1)[i] # colname of the column
    x1[nas, i] <- (x2[match(ids, x2$zip), nam] + x3[match(ids, x3$zip), nam]) / 2
}
user31264
  • 6,557
  • 3
  • 26
  • 40