
I have a data frame (dfNA) that contains a small amount of missing data in each column. It is a subset of a larger data frame (wideRawDF) whose missing values I want to impute.

To impute the data, I need to determine whether the values are missing completely at random (MCAR), missing at random (MAR), or not missing at random (NMAR), so that I can apply the right imputation method.

colsNA is a character vector of the names of the columns that contain NA values. It was derived as follows:

colsNA <- colnames(wideRawDF)[colSums(is.na(wideRawDF)) > 0]

 > str(colsNA)
 chr [1:9] "DO0182U09A3" "DO0182U09B3" "DO0182U09C3" "DO0182U21A1" "DO0182U21A2" "DO0182U21A3" "DO0182U21B1" ...

To simplify the problem and better understand why I was getting an error from TestMCARNormality, I decided to pass it only the columns that contain NA values, leaving out the complete columns.

I subsetted wideRawDF as follows:

dfNA <- wideRawDF[colsNA]
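
For readers without access to the real data, the two steps above can be reproduced on a small synthetic data frame. This is only a sketch: `toyDF`, `toyColsNA`, and `toyDfNA` are hypothetical names and values standing in for wideRawDF, colsNA, and dfNA.

```r
# Hypothetical toy data standing in for wideRawDF (not the real data)
set.seed(1)
toyDF <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))
toyDF$a[c(2, 5)] <- NA   # column a gets two missing values
toyDF$c[7] <- NA         # column c gets one missing value

# Names of the columns that contain at least one NA
toyColsNA <- colnames(toyDF)[colSums(is.na(toyDF)) > 0]

# Keep only those columns; drop = FALSE keeps the result a data frame
# even if only one column is selected
toyDfNA <- toyDF[, toyColsNA, drop = FALSE]

str(toyDfNA)
```

Column `b` is complete, so it is left out of `toyDfNA`, mirroring how dfNA leaves out the complete columns of wideRawDF.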

TestMCARNormality is a function that tests whether the missing data is MCAR.

Using this function I am getting the following error when I pass dfNA to it:

R> library("MissMech")
R> TestMCARNormality(dfNA)
Warning: More than one missing data pattern should be present.
Error in TestMCARNormality(dfNA) :

I can't figure out what the error is referring to, since every column of my data frame contains missing values:

> apply(dfNA, 2, function(x) any(is.na(x)))
DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3 DO0182U21B1 DO0182U21B2 DO0182U21B3 
       TRUE        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE        TRUE
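
The check above works column by column, whereas the warning text mentions missing data patterns, which are a row-wise property: which combination of columns is NA in each row. A minimal base R sketch for counting those patterns follows. The data frame `toy` and the names `patterns` and `patternCounts` are hypothetical. It is an assumption, not something confirmed here, that the warning relates to the number of patterns left after small groups are dropped (the MissMech documentation describes a `del.lesscases` argument for this).

```r
# Hypothetical toy data: x and y are NA in exactly the same rows, z is complete
toy <- data.frame(
  x = c(1, NA, 3, NA, 5, 6),
  y = c(2, NA, 4, NA, 6, 7),
  z = c(1, 2, 3, 4, 5, 6)
)

# Encode each row's missingness as a string, one digit per column:
# 1 = missing, 0 = observed (e.g. "110" means x and y missing, z observed)
patterns <- apply(is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of rows that share each distinct pattern
patternCounts <- table(patterns)
print(patternCounts)
```

Running the same three lines on dfNA instead of `toy` would show how many distinct patterns it has and how many rows fall into each, which may help in interpreting the warning.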

My data frame also has numerical data in it:

> str(dfNA)
'data.frame':   1343 obs. of  9 variables:
 $ DO0182U09A3: num  -102 -101 -101 -101 -101 ...
 $ DO0182U09B3: num  -103.4 -102.8 -103.3 -95.9 -103 ...
 $ DO0182U09C3: num  -103.9 -104.2 -103.9 -99.2 -104.1 ...
 $ DO0182U21A1: num  -105 -105 -105 -104 -102 ...
 $ DO0182U21A2: num  -105 -104 -105 -105 -105 ...
 $ DO0182U21A3: num  -105 -105 -105 -105 -105 ...
 $ DO0182U21B1: num  -102 -103 -104 -104 -104 ...
 $ DO0182U21B2: num  -99.4 -102 -104 -101.4 -104.1 ...
 $ DO0182U21B3: num  -104 -104 -104 -104 -104 ...

I've googled the error and found the source code at this page, but I am not a strong programmer and struggle to understand it. Any help in getting to the bottom of this would be greatly appreciated.

Below is the dput() output of the objects I am using.

wideRawDF: the original data frame, with columns containing both missing and complete values

colsNA: a character vector of the names of the columns with NA values in them

dfNA: the subset of wideRawDF containing only the columns with NA values in them

TheGoat
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – alexwhitworth Feb 09 '17 at 02:49
  • @AlexW please find attached to the original post a copy of a dput for both variables, wideRawDF and colsNA. wideRawDF is the original CSV import while colsNA is a list of the columns with NA values. I wish to pass a subset of wideRawDF to TestMCARNormality for testing the NA values. I hope this makes sense. – TheGoat Feb 09 '17 at 08:08
  • No, it doesn't. And it requires **far too much work** from the answerer. Subset your data, or create synthetic data, and use that to illustrate your error... Your link to the source code provides plenty of examples of synthetic data creation. – alexwhitworth Feb 09 '17 at 18:02
  • @AlexW Apologies, I didn't expect you to work it out, I forgot to attach the final subsetted data during my commute to work. I was supposed to upload it once I got into the office but it slipped my mind. – TheGoat Feb 09 '17 at 22:57
  • @TheGoat Hi, I know it has been some time, but did you have any luck finding a solution? I am having exactly the same issue over here. – Björn Jul 24 '20 at 07:56
