
I am struggling with some code that I need for data management. I apologise in advance because I am sure it has a quite simple solution, but I could not find any information elsewhere.

I am analysing data in long format using the mlogit command in R. For each choice set, one alternative should be chosen; otherwise the mlogit command fails with the following error:

Error in if (abs(x - oldx) < ftol) { : 
missing value where TRUE/FALSE needed

For my dataset, there are indeed some choice sets where no alternative is chosen. My question is therefore: how can I delete all the rows of a choice set where no alternative is chosen? In this example, I wish to delete all rows for ID 2, since no choice is made by this respondent:

[Image: example data]

i.e., the value of the choice variable is always "FALSE".
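
A minimal sketch of how the offending choice sets could be identified before fitting the model. The data frame below is an assumption: it reuses the illustrative `ID`, `CHOICE`, and `ALT` columns from the answers, not the real dataset.

```r
# Hypothetical example data: ID 2 has no chosen alternative
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  CHOICE = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  ALT    = c("TRAIN", "CAR", "BUS", "TRAIN", "CAR", "BUS",
             "TRAIN", "CAR", "BUS", "BIKE")
)

# For each ID, is at least one alternative chosen?
has_choice <- tapply(df$CHOICE, df$ID, any)

# IDs (as character labels) of the choice sets with no chosen alternative
empty_ids <- names(has_choice)[!has_choice]
empty_ids
```

Checking `empty_ids` first shows how many choice sets would be lost before any rows are actually dropped.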

Any help much appreciated!

thelatemail
James
    Please provide problem in minimal reproducible form, i.e. so that anyone else can *easily* copy it from your post & paste it into their session & see the results. All library statements & inputs must be provided & if large they need to be cut down to the minimal size that will still illustrate problem. Post output of `dput(whatever)` (NOT as images) to show input data reproducibly. For info on how to pose a question see 1) http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example 2) http://stackoverflow.com/help/mcve 3) http://stackoverflow.com/help/how-to-ask – G. Grothendieck Dec 24 '15 at 02:22

3 Answers


One approach with data.table (using @Richo's df). We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'ID', return the Subset of Data (.SD) only for the groups where any 'CHOICE' is TRUE.

library(data.table)
setDT(df)[, if(any(CHOICE)) .SD, by = ID]
#    ID CHOICE   ALT
#1:  1  FALSE TRAIN
#2:  1   TRUE   CAR
#3:  1  FALSE   BUS
#4:  3   TRUE TRAIN
#5:  3  FALSE   CAR
#6:  3  FALSE   BUS
#7:  3  FALSE  BIKE

Or, as @docendodiscimus mentioned:

setDT(df)[, .SD[any(CHOICE)], by = ID]

A faster option might be to use .I to get the row indices of the groups to keep and then subset by those indices:

setDT(df)[df[, .I[any(CHOICE)], by = ID]$V1]
akrun

Here you go:

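library(dplyr)
# Count the chosen alternatives (TRUE values of CHOICE) within each ID
df <- df %>% group_by(ID) %>% mutate(sum = sum(CHOICE))
# Keep only the rows of IDs with at least one chosen alternative
df <- df[df$sum != 0 , ]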

Or, using dplyr's filter function:

df %>% group_by(ID) %>% filter(any(CHOICE))

data:

df <- data.frame(ID = c(1,1,1,2,2,2,3,3,3,3),
                 CHOICE = c(FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE),
                 ALT = c("TRAIN", "CAR", "BUS","TRAIN", "CAR", "BUS","TRAIN", "CAR", "BUS","BIKE"))
talat
Mist

Use ave in combination with any (borrowing @Richo's df):

df[ave(df$CHOICE, df$ID, FUN=any),]
#   ID CHOICE   ALT
#1   1  FALSE TRAIN
#2   1   TRUE   CAR
#3   1  FALSE   BUS
#7   3   TRUE TRAIN
#8   3  FALSE   CAR
#9   3  FALSE   BUS
#10  3  FALSE  BIKE
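
To see why this works, it may help to look at the intermediate vector. The sketch below rebuilds the same example data (an assumption, since the original dataset was not posted) and stores the result of ave in a hypothetical variable named `keep`:

```r
# Same illustrative data as in the other answers
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  CHOICE = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  ALT    = c("TRAIN", "CAR", "BUS", "TRAIN", "CAR", "BUS",
             "TRAIN", "CAR", "BUS", "BIKE")
)

# ave() applies any() within each ID and returns one value per row,
# so every row of a group carries that group's result
keep <- ave(df$CHOICE, df$ID, FUN = any)
keep

# The rows that would be dropped (choice sets with no chosen alternative)
dropped <- df[!keep, ]
dropped
```

Because `keep` has one element per row, it can be used directly as a row index, and negating it gives the discarded choice sets for inspection.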
thelatemail