
I have a vector called `classes` that is the output of an analysis that used listwise deletion. As a result, the cases included in `classes` are a subset of the entire dataset; some cases were dropped because of incomplete data.

`selection` is a dummy variable recorded for every case in my dataset. A shortened example of my data is below. There is also a unique case ID for every observation.

```r
classes <- c(1,2,1,1,1,2,3,3,3,1,1,1,3,3,2,2,2)
selection <- c(1,0,0,0,1,1,1,1,0,0,0,0,0,1,1,1,1,0,0,0,1,1,1,0,1,0)
case <- seq(1, 26, 1)
```

I would like to create a new version of `selection` (say, `selection2`) that includes only the cases that are in `classes`. Basically, I would like both variables to be the same length for comparison purposes, with the cases that are NOT included in `classes` also left out of `selection2`.
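A minimal sketch of the alignment being asked for, assuming the case IDs of the rows that entered the analysis are available in the same order as `classes`. The values of `case`, `selection`, `kept_case` and `classes` below are hypothetical toy data, not the real output; `match()` looks up each kept ID in the full `case` vector, so `selection2` lines up element by element with `classes`.

```r
# Toy data standing in for the full dataset (hypothetical values).
case      <- 1:8
selection <- c(1, 0, 0, 1, 1, 0, 1, 0)

# Hypothetical: IDs of the cases that survived listwise deletion,
# in the same order as the rows that produced `classes`.
kept_case <- c(1, 2, 4, 5, 7)
classes   <- c(1, 2, 1, 3, 2)  # hypothetical class assignments, one per kept case

# match() returns, for each kept ID, its position in the full `case` vector,
# so selection2 has one entry per element of classes, in the same order.
selection2 <- selection[match(kept_case, case)]

length(selection2) == length(classes)  # TRUE
table(classes, selection2)             # cross-tabulation for comparison
```

Matching on IDs rather than on positions keeps the result correct even if the kept cases are not stored in the same order as the full dataset.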

I thought this would be an easy fix, but I've spent a lot of time getting nowhere, so I thought I'd ask. Thanks in advance!

Captain Murphy
  • How do you know which cases ended up in `classes`? If you store these, then you can use them to index `selection`. – mathematical.coffee May 16 '12 at 23:52
  • Thanks for the comment. That makes sense, but I can't tell which cases are included by looking at str(). My output comes from the poLCA function in the poLCA package. Do you know of a way to store the cases? – Captain Murphy May 17 '12 at 00:03
  • @CaptainMurphy, it's much easier for us to help you if you post a small reproducible example. You can take a look at [this question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Eric Fail May 17 '12 at 00:42
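One way to recover which cases were used, sketched under an assumption: if the model dropped exactly the rows that have a missing value in any of the variables it used (poLCA's `na.rm = TRUE` setting is, as I understand its documentation, listwise deletion on the manifest variables), then `complete.cases()` applied to those same columns should mark the kept rows. The data frame `dat` and the columns `item1` and `item2` below are hypothetical stand-ins; if covariates were part of the model, their columns would presumably need to be included in the check too.

```r
# Hypothetical full dataset: one row per case, with the manifest variables
# passed to the model (item1, item2) containing some missing values.
dat <- data.frame(
  case      = 1:6,
  selection = c(1, 0, 0, 1, 1, 0),
  item1     = c(1, 2, NA, 1, 2, 1),
  item2     = c(2, 2, 1, NA, 1, 1)
)

# Assumption: the model dropped exactly the rows with an NA in any of the
# variables it used, so complete.cases() reproduces its listwise deletion.
kept <- complete.cases(dat[, c("item1", "item2")])

kept_case  <- dat$case[kept]       # case IDs that entered the analysis
selection2 <- dat$selection[kept]  # one entry per kept case, in row order
```

Storing `kept` (or `kept_case`) right after fitting is what makes the later comparison with `classes` possible.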

1 Answer


If they are to be the same length, then the reduced version must have NAs:

```r
> selection2 <- selection
> is.na(selection2) <- !selection2 %in% classes
> selection2
 [1]  1 NA NA NA  1  1  1  1 NA NA NA NA NA  1  1  1  1 NA NA NA  1  1  1 NA  1 NA
```
IRTFM
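In the transcript above, `%in%` compares the values of `selection` (0 or 1) with the class labels in `classes` (1, 2 or 3), so every 0 becomes NA regardless of which cases were actually dropped. Padding by case instead needs a logical vector marking the rows that entered the analysis. A minimal sketch, assuming such a vector `kept` is available (all values below are hypothetical toy data), that builds full-length versions of both variables with NA for excluded cases:

```r
# Hypothetical full data and the logical vector of rows kept by the analysis
# (for example, from complete.cases() on the model variables).
selection <- c(1, 0, 0, 1, 1, 0)
kept      <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
classes   <- c(2, 1, 1, 3)  # hypothetical: one class per kept case, in row order

# Full-length selection: cases excluded from the analysis become NA.
selection2 <- selection
selection2[!kept] <- NA

# Spread classes back over all cases, leaving NA where a case was dropped.
classes_full <- rep(NA_real_, length(selection))
classes_full[kept] <- classes

# Both vectors now have one entry per case.
table(classes_full, selection2)
```

Because `table()` drops NA values by default, the cross-tabulation counts only the cases that were actually classified.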