I have a time series data set with some missing values. I wish to impute them, but I am unsure which method is most appropriate, e.g. linear, spline, or Stineman ("stine") interpolation from the imputeTS package.

For the sake of completeness I wish to test whether my data is MCAR, MAR, or NMAR. I have a fair idea it's MCAR, but I'd still like to run the test.

str(wideRawDF)
'data.frame':   1343 obs. of  13 variables:
 $ Period.Start.Time: POSIXct, format: "2017-01-20 16:30:00" "2017-01-20 16:45:00" "2017-01-20 17:00:00" "2017-01-20 17:15:00" ...
 $ DO0182U09A3      : num  -102 -101 -101 -101 -101 ...
 $ DO0182U09B3      : num  -103.4 -102.8 -103.3 -95.9 -103 ...
 $ DO0182U09C3      : num  -103.9 -104.2 -103.9 -99.2 -104.1 ...
 $ DO0182U21A1      : num  -105 -105 -105 -104 -102 ...
 $ DO0182U21A2      : num  -105 -104 -105 -105 -105 ...
 $ DO0182U21A3      : num  -105 -105 -105 -105 -105 ...
 $ DO0182U21B1      : num  -102 -103 -104 -104 -104 ...
 $ DO0182U21B2      : num  -99.4 -102 -104 -101.4 -104.1 ...
 $ DO0182U21B3      : num  -104 -104 -104 -104 -104 ...
 $ DO0182U21C1      : num  -105 -105 -105 -104 -105 ...
 $ DO0182U21C2      : num  -104 -105 -105 -103 -105 ...
 $ DO0182U21C3      : num  -105 -105 -105 -105 -105 ...

md.pattern(wideRawDF)
     Period.Start.Time DO0182U21C1 DO0182U21C2 DO0182U21C3 DO0182U21B1 DO0182U21B2 DO0182U21B3 DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2
1327                 1           1           1           1           1           1           1           1           1           1           1           1
   3                 1           1           1           1           1           1           1           0           1           1           1           1
   1                 1           1           1           1           1           1           1           1           0           1           1           1
   2                 1           1           1           1           1           1           1           1           1           0           1           1
   1                 1           1           1           1           1           1           1           0           1           1           0           0
   1                 1           1           1           1           1           1           1           0           0           1           0           0
   3                 1           1           1           1           1           1           1           1           0           0           0           0
   2                 1           1           1           1           1           1           1           0           0           0           0           0
   3                 1           1           1           1           0           0           0           1           0           0           0           0
                     0           0           0           0           3           3           3           7          10          10          10          10
     DO0182U21A3   
1327           1  0
   3           1  1
   1           1  1
   2           1  1
   1           0  4
   1           0  5
   3           0  5
   2           0  6
   3           0  8
              10 66

As you can see, some of the columns in my DF do not have NA values. I wish to pass only the columns which have NA to the TestMCARNormality function in the MissMech package.

I have tried the following but I keep getting the same error:

> TestMCARNormality(wideRawDF[,3:4])
Warning: 8 Cases with all variables missing have been removed from the data.
Warning: More than one missing data pattern should be present.

Using colnames I get the column indices, which I cross-reference against the md.pattern output above to be certain that I am using columns with NA values.

> colnames(wideRawDF)
 [1] "Period.Start.Time" "DO0182U09A3"       "DO0182U09B3"       "DO0182U09C3"       "DO0182U21A1"       "DO0182U21A2"       "DO0182U21A3"       "DO0182U21B1"      
 [9] "DO0182U21B2"       "DO0182U21B3"       "DO0182U21C1"       "DO0182U21C2"       "DO0182U21C3"

What is the smart way to test for missing values and pass only the columns with NAs to the TestMCARNormality function?

TheGoat

2 Answers

As per comment, you can use the following:

# TRUE for every column that contains at least one NA
has_na <- sapply(wideRawDF, function(x) any(is.na(x)))
# Pass only those columns to the test
TestMCARNormality(wideRawDF[has_na])

has_na is a logical vector with one element per column of wideRawDF. An element is TRUE when its column has at least one missing value.

Therefore, wideRawDF[has_na] is your data frame wideRawDF, but only the columns that have a missing value.
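
To see the column selection in isolation, here is a minimal base R sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and only stand in for wideRawDF; `vapply` with `anyNA` is an equivalent, slightly stricter variant of the `sapply` call above.

```r
# Toy data frame standing in for wideRawDF (hypothetical values).
toy <- data.frame(
  time = 1:6,
  a    = c(-102, NA, -101, -101, NA, -100),
  b    = c(-103, -102, -103, -96, -103, -104),
  c    = c(NA, -104, -104, -99, -104, NA)
)

# One logical per column: TRUE if the column contains at least one NA.
has_na <- vapply(toy, anyNA, logical(1))

# Names and positions of the columns with missing values.
names(toy)[has_na]
which(has_na)

# Keep only the columns that contain missing values.
na_cols <- toy[has_na]
str(na_cols)
```

Indexing a data frame with a single logical vector, as in `toy[has_na]`, selects columns and always returns a data frame, even when only one column qualifies.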

Simon Jackson
  • I tried the code above and got an error when I passed my has_na to TestMCARNormality. After some digging I found that TestMCARNormality requires a matrix or data frame consisting of at least two columns so passing a Boolean vector won't work. I tried passing a subset of wideRawDF (wideRawDF[, 2:7]) and it spat back `Warning: 2 Cases with all variables missing have been removed from the data. Warning: More than one missing data pattern should be present.` If you have any ideas I would love to hear them, thanks. – TheGoat Feb 08 '17 at 21:23
  • @Simon Jackson, don't worry about it. After yet more digging I came across an SO [post](http://stackoverflow.com/questions/20364450/find-names-of-columns-which-contain-missing-values) about how to find columns with NA. Thanks once again for your help. – TheGoat Feb 08 '17 at 22:09

It turns out the problem is the default setting in TestMCARNormality for the minimum number of cases a missing-data pattern must have to be included in the analysis. The option in question is del.lesscases, which defaults to 6, meaning that any missing-data pattern with 6 or fewer cases is dropped. Apart from the first pattern in your md.pattern output, which is the complete cases, every pattern has no more than 3 cases, so all of them are dropped by default. TestMCARNormality therefore warns that more than one missing-data pattern should be present, which is correct given what remains. If you set del.lesscases = 2, it will keep all patterns with at least 3 cases; with del.lesscases = 1, it will keep all patterns with at least 2 cases.
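
The effect of the threshold can be checked without running the test itself. The following is a rough base R sketch, not MissMech's internal code: the data frame `toy` and the helper `surviving` are hypothetical, and the helper only mimics the "drop patterns with del.lesscases cases or fewer" rule described above.

```r
# Toy data: 8 complete cases plus two missing-data patterns of 2 cases each.
toy <- data.frame(
  x = c(1:8, NA, NA, 11, 12),
  y = c(1:8, 9, 10, NA, NA)
)

# Encode each row's pattern as a string: "1" = observed, "0" = missing.
pattern <- apply(!is.na(toy), 1, function(r) paste(as.integer(r), collapse = ""))

# Number of cases in each pattern.
pattern_counts <- table(pattern)
print(pattern_counts)

# Hypothetical helper: patterns with del.lesscases cases or fewer are dropped,
# so only patterns with strictly more cases remain.
surviving <- function(counts, del.lesscases) {
  names(counts)[counts > del.lesscases]
}

surviving(pattern_counts, 6)  # default threshold: only the complete cases remain
surviving(pattern_counts, 1)  # lower threshold: all three patterns remain

# Assumed call on the real data (requires the MissMech package):
# TestMCARNormality(wideRawDF[has_na], del.lesscases = 1)
```

Running the same tabulation on wideRawDF before calling the test shows which patterns a given del.lesscases value would keep.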

BBJonz