Subset values that are not in the other dataset

Question

I have dataset1, which has some missing values in each column, and dataset2, which has the same dimensions but with the missing values imputed.

I want to subset the imputed values from dataset2 that were NA in dataset1. I don't have an NA flag in my original data.

I am using RStudio:

# Example
data.org <- data.frame(WT = c(NA, 20, 55, NA, 25), HT = c(55, NA, NA, 25, 30), CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT = c(10, 20, 55, 25, 25), HT = c(55, 30, 55, 25, 30), CBC = c(15, 10, 20, 40, 50))

# Desired output
data.imp.WT <- data.frame(WT = c(10, 25))
data.imp.HT <- data.frame(HT = c(30, 55))
data.imp.CBC <- data.frame(CBC = c(15, 40))

score 1 · Accepted Answer · answered Sep 29 '20 at 02:57

The following returns the imputed values at the positions that were NA in the original data:

data.imp[is.na(data.org)]
#[1] 10 25 30 55 15 40

To get these values column-wise, we can use Map:

Map(function(x, y) y[is.na(x)], data.org, data.imp)

#$WT
#[1] 10 25

#$HT
#[1] 30 55

#$CBC
#[1] 15 40
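If one data frame is wanted instead of a list, `stack()` is one option. The sketch below reuses the question's example data; the names `imputed`, `imputed.long`, `org.f`, `imp.f` and `imputed.f` are illustrative and not from the original post. Note that `stack()` ignores list elements that are not plain vectors, such as factors, so the second part converts to character first, which also means the resulting `values` column is character.

```r
# Example data from the question
data.org <- data.frame(WT = c(NA, 20, 55, NA, 25),
                       HT = c(55, NA, NA, 25, 30),
                       CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT = c(10, 20, 55, 25, 25),
                       HT = c(55, 30, 55, 25, 30),
                       CBC = c(15, 10, 20, 40, 50))

# Named list of imputed values, one element per column
imputed <- Map(function(x, y) y[is.na(x)], data.org, data.imp)

# Long format: `values` holds the imputed numbers, `ind` the source column
imputed.long <- stack(imputed)

# Hypothetical factor column (an assumption, not in the original data):
# convert to character before stacking so stack() does not skip it
org.f <- data.frame(SEX = factor(c(NA, "M", "F")))
imp.f <- data.frame(SEX = factor(c("F", "M", "F")))
imputed.f <- stack(Map(function(x, y) as.character(y[is.na(x)]), org.f, imp.f))
```

Here `imputed.long` has one row per imputed value, which is convenient for filtering or plotting by variable.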

Ronak Shah

I need the output in a data frame structure for each variable – Amer Sep 29 '20 at 03:34
If you put the output from the above in `data`, you could use `stack(data)` to get the output in one data frame, or `list2env(data, .GlobalEnv)` if you want them as separate objects. – Ronak Shah Sep 29 '20 at 03:37
I want them in one data frame. `stack(data)` worked for this example, but I got the following warning when I applied it to the actual data I am working with: `in stack.default(Map(function(x, y) y[is.na(x)], data.org, data.imp): non-vector elements will be ignored ` and got 0 observations of 2 variables – Amer Sep 29 '20 at 03:44
This `Map(function(x, y) y[is.na(x)], data.org, data.imp)` worked fine but not inside stack – Amer Sep 29 '20 at 03:45
Any idea why? – Amer Sep 29 '20 at 03:45
Do you have factors in your actual data? Try `stack(Map(function(x, y) as.character(y[is.na(x)]), data.org, data.imp))` – Ronak Shah Sep 29 '20 at 03:51
Yes, some of my variables are factors – Amer Sep 29 '20 at 03:52
I just tried `ldply(list, data.frame)` from the plyr package. It seems to work? – Amer Sep 29 '20 at 03:53
This is to convert the list into a data.frame – Amer Sep 29 '20 at 03:54
So it did work then for your data with the above code? – Ronak Shah Sep 29 '20 at 04:15
Yes, thanks very much. Although not in the question, I was wondering how you would reshape the data into a wide format in the case above? – Amer Sep 30 '20 at 00:34
Since you have only a 2-column data frame, you need to create an ID column. See https://stackoverflow.com/questions/11322801/transpose-reshape-dataframe-without-timevar-from-long-to-wide-format Let's say your column names are `ind` and `values`. You can do `data %>% group_by(ind) %>% mutate(row = row_number()) %>% pivot_wider(names_from = ind, values_from = values)` – Ronak Shah Sep 30 '20 at 01:00
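For the wide format, a base R alternative to the dplyr/tidyr pipeline in the last comment is to pad each column's imputed values to a common length and bind them. This is only a sketch using the question's example data; `imputed`, `n` and `imputed.wide` are illustrative names, and it assumes the order of values within each column is all that needs to be preserved.

```r
# Example data from the question
data.org <- data.frame(WT = c(NA, 20, 55, NA, 25),
                       HT = c(55, NA, NA, 25, 30),
                       CBC = c(NA, 10, 20, NA, 50))
data.imp <- data.frame(WT = c(10, 20, 55, 25, 25),
                       HT = c(55, 30, 55, 25, 30),
                       CBC = c(15, 10, 20, 40, 50))

# Named list of imputed values, one element per column
imputed <- Map(function(x, y) y[is.na(x)], data.org, data.imp)

# Columns may have different numbers of imputed values, so pad to the
# longest one; indexing past the end of a vector yields NA
n <- max(lengths(imputed))
imputed.wide <- as.data.frame(lapply(imputed, function(v) v[seq_len(n)]))
```

In this example every column has two imputed values, so no padding occurs; with unequal counts the shorter columns would end in NA.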
