I have a data frame (dfNA) that contains a small amount of missing data in each column. It is a subset of a larger data frame (wideRawDF) whose missing values I want to impute.
To impute the data, I need to determine whether the values are missing completely at random (MCAR), missing at random (MAR), or not missing at random (NMAR), so that I can apply the right imputation method.
colsNA is a character vector of the names of the columns that contain NA values. It was derived as follows:
colsNA <- colnames(wideRawDF)[colSums(is.na(wideRawDF)) > 0]
> str(colsNA)
chr [1:9] "DO0182U09A3" "DO0182U09B3" "DO0182U09C3" "DO0182U21A1" "DO0182U21A2" "DO0182U21A3" "DO0182U21B1" ...
To simplify the problem and better understand why I was getting an error with TestMCARNormality, I decided to pass only the columns with NA values and leave out the complete columns.
I subset wideRawDF as follows:
dfNA <- wideRawDF[colsNA]
TestMCARNormality is a function that tests whether the missing data are MCAR.
When I pass dfNA to it, I get the following error:
R> library("MissMech")
R> TestMCARNormality(dfNA)
Warning: More than one missing data pattern should be present.
Error in TestMCARNormality(dfNA) :
I can't figure out what the error is referring to, since every column of my data frame contains missing values:
> apply(dfNA, 2, function(x) any(is.na(x)))
DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3 DO0182U21B1 DO0182U21B2 DO0182U21B3
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
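The warning talks about missing data patterns, meaning the row-wise combinations of which columns are missing, so the column-wise check above may not be the relevant one. Below is a minimal sketch for counting the distinct row patterns, using base R only. The data frame `toy` and the threshold `min_cases` are hypothetical stand-ins: `toy` replaces dfNA, and `min_cases` imitates what I understand the `del.lesscases` argument of TestMCARNormality to do (drop patterns with few cases, default 6 according to the package documentation), which I have not confirmed against the source.

```r
# Toy data frame standing in for dfNA (hypothetical values, not the real data).
toy <- data.frame(
  a = c(1.0, NA, 3.0, 4.0, 5.0, NA),
  b = c(2.0, 2.5, NA, 4.5, 5.5, 6.5),
  c = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
)

# Encode each row's missingness as a string: "1" = missing, "0" = observed.
na_pattern <- apply(is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of rows that share each distinct pattern.
pattern_counts <- table(na_pattern)
print(pattern_counts)

# Keep only patterns with at least `min_cases` rows.
# `min_cases` is an assumed threshold chosen for this toy example.
min_cases <- 2
kept_patterns <- names(pattern_counts)[pattern_counts >= min_cases]

# If this is 1 (or 0), only one pattern would survive the filtering.
n_kept <- length(kept_patterns)
print(n_kept)
```

Running the same steps on dfNA with `min_cases <- 6` would show how many patterns have enough rows. If only one pattern passes, that might be what the warning is complaining about.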
All columns of my data frame are numeric:
> str(dfNA)
'data.frame': 1343 obs. of 9 variables:
$ DO0182U09A3: num -102 -101 -101 -101 -101 ...
$ DO0182U09B3: num -103.4 -102.8 -103.3 -95.9 -103 ...
$ DO0182U09C3: num -103.9 -104.2 -103.9 -99.2 -104.1 ...
$ DO0182U21A1: num -105 -105 -105 -104 -102 ...
$ DO0182U21A2: num -105 -104 -105 -105 -105 ...
$ DO0182U21A3: num -105 -105 -105 -105 -105 ...
$ DO0182U21B1: num -102 -103 -104 -104 -104 ...
$ DO0182U21B2: num -99.4 -102 -104 -101.4 -104.1 ...
$ DO0182U21B3: num -104 -104 -104 -104 -104 ...
I've googled the error and found the source code at this page, but I am not a strong programmer and struggle to understand it. Any help in getting to the bottom of this would be greatly appreciated.
Below is the dput() output of the files I am using:
wideRawDF: the original data frame, with columns containing both missing and complete values.
colsNA: a character vector of the names of the columns with NA values in them.
dfNA: a subset of wideRawDF containing only the columns with NA values in them.