Missing values for each participant in the study

Question

I am working in R. I want to make a table or a graph that shows the missing values for each participant. I have 4700+ participants, and each question has between 20 and 40 missing values. I would like to display the missing values in such a way that I can see which people did not answer which questions, and possibly check whether there is a pattern in the missing values. I have done the following:

Count of complete cases in a data frame named 'mydata'

sum(complete.cases(mydata))

Count of incomplete cases

sum(!complete.cases(mydata$Variable1))

Which cases (row numbers) are incomplete?

which(!complete.cases(mydata$Variable1))

I then got a list of numbers that I am not quite sure how to interpret. At first I thought these were the patient numbers, but then I noticed that this is not the case.

I also tried making subsets with only the missing values, but then I literally only see how many missing values there are, not who they belong to.

Could somebody help me? Thanks!

Zas

Hello! Please make your code reproducible: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Include some sample data to illustrate your question. – Mateusz1981, Apr 14 '16 at 12:46

Kunal Puri · Accepted Answer · 2016-04-14T13:14:46.207

1

If mydata is a data.frame with a column that identifies each row, say a patient number column patient_no, then you can find the patient numbers of the people with missing values like this:

> mydata <- data.frame(patient_no = 1:5, variable1 = c(NA,NA,1,2,3))

> mydata[!complete.cases(mydata$variable1),'patient_no']

[1] 1 2

If you want to see which participants skipped each particular question, then this might be useful for you:

Assumption: every column except column 1 holds the answers to a question. (drop = FALSE keeps the selection a data.frame even when only one question column remains.)

> lapply(mydata[, -1, drop = FALSE], function(x) mydata[!complete.cases(x), 'patient_no'])
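
To get the per-participant view the question asks for, a small sketch along the following lines may help. It assumes the same layout as above (an ID column first, question columns after it); the example data and the names miss, n_missing, pattern and pattern_counts are made up for illustration.

```r
# Hypothetical example data: one ID column plus three question columns.
mydata <- data.frame(
  patient_no = 101:106,
  q1 = c(NA, 2, 3, NA, 1, 2),
  q2 = c(1, NA, 3, NA, 2, 2),
  q3 = c(1, 2, 3, NA, NA, 2)
)

# Logical matrix: TRUE where an answer is missing (ID column excluded).
miss <- is.na(mydata[, -1, drop = FALSE])

# Number of missing answers per participant, keeping only those with any.
n_missing <- data.frame(patient_no = mydata$patient_no,
                        n_missing  = rowSums(miss))
n_missing <- n_missing[n_missing$n_missing > 0, ]

# Missingness pattern per participant (e.g. "q1+q2"), and how often each occurs.
pattern <- apply(miss, 1, function(r) paste(colnames(miss)[r], collapse = "+"))
pattern_counts <- table(pattern[pattern != ""])
```

n_missing then lists who has missing answers and how many, and pattern_counts shows whether the same questions tend to be skipped together.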

edited Apr 14 '16 at 13:14

answered Apr 14 '16 at 12:50


Thank you for your answer, however it gives me an error saying "incorrect number of dimensions" – Z.Chanell Apr 14 '16 at 12:59
Is mydata a data.frame? – Kunal Puri Apr 14 '16 at 13:01
mydata is the data file that contains the values. – Z.Chanell Apr 14 '16 at 13:04
So, is it mydata.csv? or mydata stores the file name? – Kunal Puri Apr 14 '16 at 13:05
It's a .sav file (I imported it from SPSS). – Z.Chanell Apr 14 '16 at 13:06
So, you first need to import the data using the `read.spss()` function, convert it to a data.frame, and then do the job. – Kunal Puri Apr 14 '16 at 13:08
You can convert the data to a data.frame by setting the parameter `to.data.frame` to `TRUE` in the `read.spss()` function. – Kunal Puri Apr 14 '16 at 13:17
Now I get the message "unable to open file: 'No such file or directory'". I tried downloading the memisc package and changing it using that, but I still get the same message. – Z.Chanell Apr 14 '16 at 13:19
Please make sure that the file is in the working directory, or specify the complete path of the file. – Kunal Puri Apr 14 '16 at 13:21
I had to specify the complete path – Z.Chanell Apr 14 '16 at 13:24
Thank you so much Kunal! – Z.Chanell Apr 14 '16 at 13:36
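
For reference, the import step worked out in the comments above could look roughly like this. It is only a sketch: it assumes the foreign package (which provides read.spss()) is installed, and the helper name load_spss and the path in the usage lines are hypothetical.

```r
# Hypothetical helper: read an SPSS .sav file as a data.frame,
# failing early with a clear message if the path is wrong.
load_spss <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  foreign::read.spss(path, to.data.frame = TRUE)
}

# Usage (hypothetical path; replace with the full path to your own file):
# mydata <- load_spss("C:/path/to/mydata.sav")
# is.data.frame(mydata)
```

Checking the path first makes the "No such file or directory" problem easier to spot, since the message also prints the current working directory.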

score 0 · Answer 2 · answered May 23 '17 at 10:56

Remember that R automatically attaches numbers to the observations in your data set. For example, if your data has 20 observations (20 rows), R numbers them from 1 to 20. These are the row numbers, and they are not part of your original data. The results produced by the R code which(!complete.cases(mydata$Variable1)) correspond to those numbers: they are the rows of your data set in which Variable1 is missing.
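
A small illustration of the difference between row numbers and patient numbers, using made-up data (the IDs and values here are hypothetical):

```r
# Hypothetical data: patient numbers that do not match the row numbers.
mydata <- data.frame(patient_no = c(1001, 1002, 1003, 1004, 1005),
                     Variable1  = c(NA, 7, NA, 3, 5))

# Row numbers (positions in the data frame) where Variable1 is missing.
rows <- which(!complete.cases(mydata$Variable1))

# Use those row numbers to look up the actual patient numbers.
ids <- mydata$patient_no[rows]
```

Here rows holds positions 1 and 3, while ids holds the corresponding patient numbers 1001 and 1003.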
