In my four-wave longitudinal study on commuting, I want to exclude participants who reported a commute duration shorter than 1 min or longer than 180 min. The code I am using now seems to work fine at first, but although it excluded only around 200 participants from my original 1500, my n is only half the size and my NAs have doubled. What happened?
I don't want to select each row by hand to exclude it; that is too much work for 200 cases. I already tried filter(), but it didn't work.
describe(selected.data$T2_cvar1) # mean 65.58 SD 127.84 NAs 518 n 1004
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1004 65.58 127.84 25 33.89 22.24 0 999 999 3.97 17.2
## se
## X1 4.03
selected.data$T2_cvar1_select <- ifelse(is.na(selected.data$T2_cvar1) == TRUE, selected.data$T2_cvar1, ifelse(selected.data$T2_cvar1 > 0 & selected.data$T2_cvar1 < 181, selected.data$T2_cvar1, -999))
selected.data<- selected.data[selected.data$T2_cvar1_select != -999, ]
# 118 participants excluded, 1406 remain
# (got this information with the dim command)
describe(selected.data$T2_cvar1) # NAs 848
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 476 33.66 28.94 25 28.59 14.83 1 180 179 2.04 4.83
## se
## X1 1.33
I repeat this for the other three waves (T3, T4, T5). The dim command says that there are still 1322 participants.
Why did my NAs grow? And how can I exclude these people without losing half of my data?
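
A possible explanation, sketched on a made-up toy data frame (the names `toy`, `id`, `flag`, `broken`, `keep`, and `fixed` are hypothetical and not from the study): when a logical row index contains NA, base R's `[` returns a row filled with NA for each NA position instead of the original row. Since `T2_cvar1_select` is NA wherever `T2_cvar1` is NA, those participants would come back as all-NA rows, so their values in the other waves are lost too, and repeating the step per wave would wipe out more each time. That would also be consistent with dim still reporting many rows while n shrinks. The second half shows one way to avoid this, assuming participants with a missing duration should stay in the data.

```r
# Toy data, invented for illustration only
toy <- data.frame(
  id       = 1:6,
  T2_cvar1 = c(25, NA, 0, 999, 60, NA),
  T3_cvar1 = c(30, 40, 20, 35, NA, 50)
)

# Approach from the question: the flag is NA wherever T2_cvar1 is NA
flag <- ifelse(is.na(toy$T2_cvar1), toy$T2_cvar1,
               ifelse(toy$T2_cvar1 > 0 & toy$T2_cvar1 < 181, toy$T2_cvar1, -999))
broken <- toy[flag != -999, ]

# Rows indexed by NA come back with every column set to NA,
# so T3_cvar1 loses values that were present before
sum(is.na(toy$T3_cvar1))     # 1
sum(is.na(broken$T3_cvar1))  # 3

# Sketch of an alternative: keep a row if the value is missing OR in range
keep  <- is.na(toy$T2_cvar1) | (toy$T2_cvar1 >= 1 & toy$T2_cvar1 <= 180)
fixed <- toy[keep, ]
sum(is.na(fixed$T3_cvar1))   # 1
```

Because `keep` never contains NA, no all-NA rows are created. The same condition should also work inside `dplyr::filter()`, which drops rows where the condition evaluates to NA, so the `is.na()` part matters there as well.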