
I am working in R. What I want to do is make a table or a graph that shows, for each participant, their missing values. I have 4700+ participants, and each question has between 20 and 40 missing answers. I would like to represent the missing values in such a way that I can see which people did not answer the questions, and possibly look for a pattern in the missing values. I have done the following:

Count of complete cases (rows with no missing values) in a data frame named 'mydata':

sum(complete.cases(mydata))

Count of missing values in Variable1:

sum(!complete.cases(mydata$Variable1))

Which rows (row numbers) have Variable1 missing?

which(!complete.cases(mydata$Variable1))
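For a per-participant table, one option is a minimal sketch like the following. It assumes a hypothetical toy data frame with an ID column `patient_no` and question columns `q1` to `q3` (illustrative names, not taken from the real data): `is.na()` on the question columns gives a logical matrix, and `rowSums()` / `colSums()` turn it into the number of missing answers per participant and per question.

```r
# Hypothetical toy data: patient_no identifies participants, q1-q3 are questions
mydata <- data.frame(
  patient_no = c(101, 102, 103, 104, 105),
  q1 = c(NA, 2, 3, 1, NA),
  q2 = c(1, NA, 2, 2, NA),
  q3 = c(3, 3, NA, 1, 2)
)

# Logical matrix: TRUE where an answer is missing (ID column excluded)
question_cols <- setdiff(names(mydata), "patient_no")
missing_matrix <- is.na(mydata[, question_cols])

# Number of unanswered questions per participant, keyed by patient_no
missing_per_patient <- data.frame(
  patient_no = mydata$patient_no,
  n_missing  = rowSums(missing_matrix)
)

# Number of missing answers per question
missing_per_question <- colSums(missing_matrix)

# Only the participants who skipped at least one question
missing_per_patient[missing_per_patient$n_missing > 0, ]
```

Because `patient_no` travels alongside the counts, the result says who is missing answers rather than only how many.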

I then got a list of numbers that I am not quite sure how to interpret. At first I thought these were the patient numbers, but then I noticed that this is not the case.

I also tried making subsets with only the missing values, but then I literally only see how many missing values there are, not whom they are from.

Could somebody help me? Thanks!


Mateusz1981
Z.Chanell
Hello! Make your code reproducible: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Give some data to make your point in the question. – Mateusz1981 Apr 14 '16 at 12:46

2 Answers

If there is a column that uniquely identifies each row of the data frame mydata, say patient numbers stored in patient_no, then you can easily find the patient numbers of the people with missing values:

> mydata <- data.frame(patient_no = 1:5, variable1 = c(NA,NA,1,2,3))

> mydata[!complete.cases(mydata$variable1),'patient_no']

[1] 1 2

If you want to see the pattern in which participants have missed each question, this might be useful:

Assumption: except for column 1 (patient_no), all columns are question columns.

> lapply(mydata[,-1],function(x){mydata[!complete.cases(x),'patient_no']})
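To go from that list to a participant-by-question view and look for patterns, one option is the sketch below. It keeps the same assumption (column 1 is patient_no, the rest are questions) and uses hypothetical toy data: label the `is.na()` matrix with patient numbers, encode each participant's missingness as a 0/1 string, count how often each combination occurs with `table()`, and draw the matrix with base R's `image()`.

```r
# Hypothetical toy data: column 1 is patient_no, the remaining columns are questions
mydata <- data.frame(
  patient_no = c(101, 102, 103, 104, 105, 106),
  q1 = c(NA, 2, NA, 1, NA, 3),
  q2 = c(NA, 1, NA, 2, 1, 2),
  q3 = c(3, 3, 1, NA, 2, 1)
)

# Logical matrix: TRUE where an answer is missing; rows labelled by patient number
miss <- is.na(mydata[, -1])
rownames(miss) <- mydata$patient_no

# Encode each participant's missingness as a string such as "110"
# (1 = missing, 0 = answered, in question-column order)
pattern <- apply(miss, 1, function(r) paste(as.integer(r), collapse = ""))

# How many participants share each pattern
pattern_counts <- table(pattern)
pattern_counts

# Which patients have a given pattern, e.g. both q1 and q2 missing
names(pattern)[pattern == "110"]

# Heatmap with participants sorted by pattern, so identical patterns form blocks
# x = participants, y = questions, black cells = missing
ord <- order(pattern)
image(
  x = seq_len(nrow(miss)), y = seq_len(ncol(miss)), z = miss[ord, , drop = FALSE] * 1,
  zlim = c(0, 1), col = c("white", "black"), axes = FALSE,
  xlab = "participant (sorted by pattern)", ylab = "question"
)
axis(2, at = seq_len(ncol(miss)), labels = colnames(miss), las = 1)
```

With 4700+ participants the individual columns of the plot will be thin, but sorting by pattern makes any recurring combination of skipped questions show up as a visible block.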
Kunal Puri

Remember that R automatically numbers the observations in your data set. For example, if your data has 20 observations (20 rows), R numbers them from 1 to 20; these row numbers are not part of your original data. The results produced by which(!complete.cases(mydata$Variable1)) are these row numbers: the positions of the rows in which Variable1 is missing. (If you call complete.cases(mydata) on the whole data frame instead, you get the rows with at least one missing value in any column.)
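A small sketch with hypothetical data shows the difference: `which()` returns row positions, and indexing the ID column with those positions recovers the actual patient numbers. For a single column, `!complete.cases(x)` and `is.na(x)` select the same elements, so `is.na()` is used here.

```r
# Hypothetical data: patient numbers are not 1..n, so row positions differ from IDs
mydata <- data.frame(
  patient_no = c(2041, 37, 518, 999),
  Variable1  = c(5, NA, 3, NA)
)

# Row positions where Variable1 is missing
rows_missing <- which(is.na(mydata$Variable1))
rows_missing

# Translate those positions into patient numbers
mydata$patient_no[rows_missing]
```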