
I have dataset1, which has some missing values in each of its columns, and dataset2, which has the same dimensions but with the missing values imputed.

I want to extract the imputed values from dataset2, i.e. the cells that were NA in dataset1. My original data has no flag column marking which values were NA.

I am using RStudio:

#Example
data.org <- as.data.frame( cbind(WT=c(NA,20,55,NA,25), HT= c(55,NA,NA,25,30), CBC=c(NA,10,20,NA,50) ) )
data.imp <- as.data.frame( cbind(WT=c(10,20,55,25,25), HT= c(55,30,55,25,30), CBC=c(15,10,20,40,50) ) )

# Desired output
data.imp.WT <- as.data.frame(cbind(WT=c(10,25)))
data.imp.HT <- as.data.frame(cbind(HT=c(30,55)))
data.imp.CBC <- as.data.frame(cbind(CBC=c(15,40)))
Amer

1 Answer

The following gives the imputed values at the positions that are missing in `data.org`:

data.imp[is.na(data.org)]
#[1] 10 25 30 55 15 40
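This works because `is.na(data.org)` returns a logical matrix with the same shape as the data, and indexing with it visits the cells in column-major order (all of `WT`, then `HT`, then `CBC`), which is why the values come out grouped by column. A minimal sketch, assuming you also want to know which variable and row each imputed value came from (the names `na_mask`, `pos` and `imputed` are illustrative, not from the answer):

```r
# Example data from the question
data.org <- data.frame(WT  = c(NA, 20, 55, NA, 25),
                       HT  = c(55, NA, NA, 25, 30),
                       CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT  = c(10, 20, 55, 25, 25),
                       HT  = c(55, 30, 55, 25, 30),
                       CBC = c(15, 10, 20, 40, 50))

# Logical matrix: TRUE wherever the original value was missing
na_mask <- is.na(data.org)

# Row/column position of every missing cell, in column-major order
# (the same order in which the logical-matrix index returns values)
pos <- which(na_mask, arr.ind = TRUE)

imputed <- data.frame(
  variable = names(data.org)[pos[, "col"]],
  row      = pos[, "row"],
  value    = as.matrix(data.imp)[na_mask],  # same values as data.imp[is.na(data.org)]
  stringsAsFactors = FALSE
)
imputed
```

Indexing `as.matrix(data.imp)` rather than the data frame itself makes the matrix indexing explicit; for all-numeric data like this example the result is the same vector.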

To get these values column by column, we can use `Map`:

Map(function(x, y) y[is.na(x)], data.org, data.imp)

#$WT
#[1] 10 25

#$HT
#[1] 30 55

#$CBC
#[1] 15 40
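The question asks for one data frame per variable (`data.imp.WT`, `data.imp.HT`, `data.imp.CBC`). A minimal sketch of one way to get there, assuming the example data above: wrap each vector returned by `Map` in a one-column data frame named after its variable, then copy them into the global environment with `list2env()`. The intermediate names `imp_list` and `imp_dfs` are illustrative.

```r
data.org <- data.frame(WT  = c(NA, 20, 55, NA, 25),
                       HT  = c(55, NA, NA, 25, 30),
                       CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT  = c(10, 20, 55, 25, 25),
                       HT  = c(55, 30, 55, 25, 30),
                       CBC = c(15, 10, 20, 40, 50))

# One vector of imputed values per column (named list: WT, HT, CBC)
imp_list <- Map(function(x, y) y[is.na(x)], data.org, data.imp)

# Turn each vector into a one-column data frame named after its variable
imp_dfs <- Map(function(v, nm) setNames(data.frame(v), nm),
               imp_list, names(imp_list))

# Name the list elements like the objects in the question and create them
names(imp_dfs) <- paste0("data.imp.", names(imp_dfs))
invisible(list2env(imp_dfs, envir = .GlobalEnv))

data.imp.HT
```

Keeping the results in the list `imp_dfs` and skipping `list2env()` is usually easier to work with than many separate objects.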
Ronak Shah
  • I need to put this in a data frame structure for each variable – Amer Sep 29 '20 at 03:34
  • If you put the output from the above in `data`, you could use `stack(data)` to get the output in one dataframe, or `list2env(data, .GlobalEnv)` if you want them in separate dataframes. – Ronak Shah Sep 29 '20 at 03:37
  • I want them in one data frame. `stack(data)` worked for this example, but I got the following error when I applied it to the actual data I am working with: `in stack.default(Map(function(x, y) y[is.na(x)], data.org, data.imp): non-vector elements will be ignored` and got 0 observations of 2 variables – Amer Sep 29 '20 at 03:44
  • `Map(function(x, y) y[is.na(x)], data.org, data.imp)` on its own worked fine, but not inside `stack` – Amer Sep 29 '20 at 03:45
  • Any idea why? – Amer Sep 29 '20 at 03:45
  • Do you have factors in your actual data? Try `stack(Map(function(x, y) as.character(y[is.na(x)]), data.org, data.imp))` – Ronak Shah Sep 29 '20 at 03:51
  • Yes, some of my variables are factors – Amer Sep 29 '20 at 03:52
  • I just tried `ldply(list, data.frame)` from the plyr package. It seems to work? – Amer Sep 29 '20 at 03:53
  • This is to convert the list into a data.frame – Amer Sep 29 '20 at 03:54
  • So it did work then for your data with the above code? – Ronak Shah Sep 29 '20 at 04:15
  • Yes, thanks very much. Although it is not in the question, how would you reshape the data into a wide format in the case above? – Amer Sep 30 '20 at 00:34
  • Since you have only a 2-column dataframe, you need to create an ID column. See https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format Let's say your column names are `ind` and `values`. You can do `data %>% group_by(ind) %>% mutate(row = row_number()) %>% pivot_wider(names_from = ind, values_from = values)` – Ronak Shah Sep 30 '20 at 01:00
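Pulling the comment thread together: `stack()` keeps only list elements for which `is.vector()` is `TRUE`, and a factor (which carries `levels` and `class` attributes) is not a plain vector, so it is dropped with the "non-vector elements will be ignored" warning. The sketch below applies the `as.character()` suggestion from the comments and then, as a base R alternative to the `pivot_wider()` approach (not the method from the thread), pads each column with `NA` to build a wide table. The `GRP` factor column is hypothetical, added only to mimic the factors in the real data; note that after `as.character()` every value, numbers included, is a string.

```r
data.org <- data.frame(WT  = c(NA, 20, 55, NA, 25),
                       HT  = c(55, NA, NA, 25, 30),
                       CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT  = c(10, 20, 55, 25, 25),
                       HT  = c(55, 30, 55, 25, 30),
                       CBC = c(15, 10, 20, 40, 50))
# Hypothetical factor column (one missing value) to mimic the real data
data.org$GRP <- factor(c("a", NA, "b", "b", "a"))
data.imp$GRP <- factor(c("a", "a", "b", "b", "a"))

# Convert every extracted vector to character so stack() keeps all of them
imp_list <- Map(function(x, y) as.character(y[is.na(x)]), data.org, data.imp)

# Long format: one row per imputed value (columns: values, ind)
long <- stack(imp_list)

# Wide format: columns have different numbers of imputed values, so pad
# each vector with NA up to the longest one before binding them
n_max <- max(lengths(imp_list))
wide <- as.data.frame(lapply(imp_list, function(v) v[seq_len(n_max)]),
                      stringsAsFactors = FALSE)
wide
```

Indexing a shorter vector with `seq_len(n_max)` returns `NA` for the positions past its end, which is what does the padding.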