
I am trying to plot kernel density estimates of the imputed and observed data. However, I don't want to impute the variables "FOC_2" and "FOC_3": they are hierarchical and mess up the imputations. The code runs when all variables are imputed. However, when I exclude those two variables from the imputation (by setting their method to ""), I get: 'Error in density.default(x = c(NA_real_, NA_real_, NA_real_, NA_real_, : need at least 2 points to select a bandwidth automatically'

Here is a subset of the data:

> dput(diss_data[1:4,])
structure(list(DS_1 = c(5, 10, 1, 10), DS_2 = c(10, 10, 1, NA
), DS_3 = c(5, 10, NA, 10), DS_4 = c(10, 10, 1, 10), DS_5 = c(10, 
8, 2, 9), DS_6 = c(10, 9, 10, 10), DS_7 = c(5, 6, 5, 10), ISR_1 = c(3, 
7, 1, NA), ISR_2 = c(10, 5, 2, NA), ISR_3 = c(7, 8, 1, NA), ISR_4 = c(10, 
8, 1, NA), ISR_5 = c(10, 10, NA, NA), SC_T1 = c(1, 1, 2, 10), 
    SC_T2 = c(1, 1, 1, 10), SC_T3 = c(5, 1, 2, 10), SC_T4 = c(5, 
    8, NA, 10), SC_T5 = c(5, 7, 10, 10), FOC_1 = structure(c(2L, 
    2L, 1L, 2L), .Label = c("1", "2"), class = "factor"), FOC_2 = c(1, 
    1, 1, NA), FOC_3 = c(NA, 1, NA, 10), PS_1 = c(NA, 5, 1, 10
    ), PR_1 = c(1, 1, NA, NA), PR_2 = c(5, 1, NA, 1), PR_3 = c(1, 
    1, 1, 1), PR_4 = c(5, 10, NA, 1), PR_5 = c(1, 1, 10, NA), 
    PR_6 = c(5, 1, 5, 1), PR_7 = c(5, 1, 10, NA), PR_8 = c(5, 
    1, 10, NA), DR_1 = structure(c(2L, 2L, 1L, 2L), .Label = c("1", 
    "2"), class = "factor"), IR_1 = structure(c(2L, 2L, 1L, 2L
    ), .Label = c("1", "2"), class = "factor"), PF_1 = c(5, 1, 
    10, 10), PF_2 = c(5, 1, 10, 10), PF_3 = c(1, 1, 9, 10), PF_4 = c(10, 
    7, 2, 10), PF_5 = c(10, 10, 6, 10), DF_1 = c(5, 10, 1, NA
    ), DF__2 = c(5, 8, 10, 10), L_1 = c(5, 5, 8, 10), L_2 = c(5, 
    6, 10, 10), PE_1 = c(NA, 10, 5, 10), PE_2 = c(NA, 8, 10, 
    10), PE_3 = c(NA, 9, 10, 10), PE_4 = c(NA, 10, 10, 10), PE_5 = c(10, 
    8, 9, 10), PE_6 = c(10, 10, 10, 10), PE_7 = c(1, 10, 10, 
    10), YRS_N = c(15, 20, 10, NA), AGE = c(22, 60, 53, 24), 
    GENDER = c(2, 1, 1, NA), M_S = c(2, 1, 1, NA), RACE = c(5, 
    2, 2, 5), HAITI = structure(c(1L, 1L, 1L, 1L), .Label = c("1", 
    "2"), class = "factor"), H_INC = c(1, 1, 3, 1), H_O = c(2, 
    2, 1, NA), TREATMENT = structure(c(1L, 1L, 1L, 1L), .Label = c("0", 
    "1"), class = "factor")), row.names = c(NA, 4L), class = "data.frame")

Here is my code:

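library(mice)

# Dry run (maxit = 0) to obtain the default methods and predictor matrix
init <- mice(diss_data, maxit = 0)
meth <- init$method
predM <- init$predictorMatrix

# An empty method string tells mice not to impute these variables
meth[c("FOC_2", "FOC_3")] <- ""

imp <- mice(diss_data, method = meth, maxit = 10, m = 10)

densityplot(imp, layout = c(3, 6))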


Error in density.default(x = c(NA_real_, NA_real_, NA_real_,
NA_real_,:need at least 2 points to select a bandwidth automatically

The documentation for densityplot.mids (https://rdrr.io/cran/mice/man/densityplot.mids.html) says: "The relevant error message is: Error in density.default: ... need at least 2 points to select a bandwidth automatically. There is yet no workaround for this problem. Use the more robust bwplot or stripplot as a replacement".

Is it safe to say that I cannot use densityplot and should use bwplot or stripplot instead? Or is there actually a workaround? Please bear with me, I am new to R. Thank you in advance for any assistance.
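One possible workaround, offered as a sketch rather than a confirmed fix: the error message shows density() receiving only NA values, which is what one would expect for FOC_2 and FOC_3, since they still have missing entries but are never imputed. If that is the cause, restricting the plot to variables that actually have imputed values may avoid the error. The helper `plottable_vars` below is hypothetical (it is not part of mice). It assumes that `imp$imp` is a named list of data frames with one column per imputation, and that `densityplot` accepts a formula object built with `reformulate`. If the latter does not hold, the formula can be typed out by hand, for example `densityplot(imp, ~ DS_2 + DS_3)`.

```r
# Hypothetical helper (not part of mice): returns the names of numeric
# variables that have at least `min_points` non-missing imputed values in
# every imputation. `imp_list` is assumed to have the shape of `imp$imp`.
plottable_vars <- function(imp_list, min_points = 2) {
  ok <- vapply(imp_list, function(d) {
    # Variables without missing data have no rows of imputations
    if (is.null(d) || nrow(d) == 0 || ncol(d) == 0) return(FALSE)
    # Density plots only make sense for numeric variables
    is_num <- all(vapply(d, is.numeric, logical(1)))
    # Every imputation must supply enough points for a bandwidth
    is_num && min(colSums(!is.na(d))) >= min_points
  }, logical(1))
  names(ok)[ok]
}

# Usage, assuming `imp` is the mids object created by the code above
if (exists("imp")) {
  library(mice)
  vars <- plottable_vars(imp$imp)
  # Builds a one-sided formula such as ~ DS_2 + DS_3 + ...
  fml <- reformulate(vars)
  print(densityplot(imp, fml))
}
```

If a plot is still wanted for every variable, bwplot or stripplot remain the alternatives that the documentation recommends.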

Meerten
Rgrasshopper
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Where is `diss_data` defined? – MrFlick Jul 04 '20 at 19:15
  • MrFlick I followed the link you attached and have added a subset via dput(). Thank you for directing me to the reproducible topic. I am new to stack overflow. Is what I provided sufficient? Thank you – Rgrasshopper Jul 04 '20 at 22:56

0 Answers