
In my four-wave longitudinal study on commuting, I want to exclude participants who reported a commute duration shorter than 1 min or longer than 180 min. The code I am using seemed to work at first, but although it excluded only around 200 of my original 1500 participants, my n is now only half its previous size and the number of NAs has doubled. What happened?

I don't want to pick out each row by hand to exclude it; that is too much work for 200 cases. I already tried filter(), but that didn't work.

describe(selected.data$T2_cvar1) # mean 65.58   SD 127.84   NAs 518 n 1004 
##    vars    n  mean     sd median trimmed   mad min max range skew kurtosis
## X1    1 1004 65.58 127.84     25   33.89 22.24   0 999   999 3.97     17.2
##      se
## X1 4.03

selected.data$T2_cvar1_select <- ifelse(is.na(selected.data$T2_cvar1) == TRUE, selected.data$T2_cvar1, ifelse(selected.data$T2_cvar1 > 0 & selected.data$T2_cvar1 < 181, selected.data$T2_cvar1, -999))

selected.data<- selected.data[selected.data$T2_cvar1_select != -999, ]
# 118 participants excluded, 1406 remain
#(got this information with dim command)

describe(selected.data$T2_cvar1) # NAs 848

##    vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 476 33.66 28.94     25   28.59 14.83   1 180   179 2.04     4.83
##      se
## X1 1.33

I repeat this for the other three waves (T3, T4, T5); the dim command says that there are still 1322 participants.

Why did my NAs grow? And how can I exclude these people without losing half of my data?

Carolin
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Oct 25 '19 at 20:20
  • You seem to be keeping `NA` values in `selected.data$T2_cvar1_select`. If you use a vector that has `NA` values to index rows of a data.frame, you get `NA` values back. You can see this with `iris[c(TRUE,NA,FALSE),]`. Note that the total number of rows goes from 150 to 100 because the TRUE and NA values are kept. – MrFlick Oct 25 '19 at 20:24
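
The behaviour described in the comment above can be reproduced with a small sketch. The data frame `toy` and its values are made up for illustration and only stand in for `selected.data`; the two fixes shown are possible options under that assumption, not necessarily what the asker ended up using.

```r
# Hypothetical toy data standing in for selected.data (values are invented).
toy <- data.frame(id = 1:6, T2_cvar1 = c(25, NA, 0, 999, 180, NA))

# Any comparison with NA yields NA, so this logical index contains NA
# wherever T2_cvar1 is missing: TRUE NA FALSE FALSE TRUE NA
keep <- toy$T2_cvar1 >= 1 & toy$T2_cvar1 <= 180

# Each NA in a logical row index returns a row filled entirely with NA,
# which also wipes out that participant's values in every other column.
broken <- toy[keep, ]
nrow(broken)           # 4 rows: 2 valid ones plus 2 all-NA rows
sum(is.na(broken$id))  # 2

# Option 1: keep participants with a missing answer, drop only the
# out-of-range ones. TRUE | NA is TRUE, so the index has no NA left.
fixed <- toy[is.na(toy$T2_cvar1) | keep, ]

# Option 2: drop out-of-range AND missing answers. which() returns only
# the positions that are TRUE, silently skipping NA.
dropped <- toy[which(keep), ]
```

With option 1 the rows for participants who skipped this wave stay intact, which is probably what a multi-wave data set needs. Note that `dplyr::filter()` drops rows whose condition evaluates to `NA`, so it behaves like option 2 unless the condition handles `NA` explicitly.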
