Deleting all rows of a choice set where no alternative is chosen in R

Question

I am struggling with some code that I need for data management. I apologise in advance because I am sure it has a quite simple solution, but I could not find any information elsewhere.

I am analysing data in long format using the mlogit command in R. For each choice set, one alternative should be chosen; otherwise the mlogit command fails with the following error:

Error in if (abs(x - oldx) < ftol) { : 
missing value where TRUE/FALSE needed

For my dataset, there are indeed some choice sets where no alternative is chosen. My question is therefore: How can I delete all rows of a choice set where no alternative is chosen? In my example data, I wish to delete all rows for ID 2, since no choice is made by this respondent, i.e., the value of the choice variable is always "FALSE".

Any help much appreciated!

Please provide problem in minimal reproducible form, i.e. so that anyone else can *easily* copy it from your post & paste it into their session & see the results. All library statements & inputs must be provided & if large they need to be cut down to the minimal size that will still illustrate problem. Post output of `dput(whatever)` (NOT as images) to show input data reproducibly. For info on how to pose a question see 1) http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example 2) http://stackoverflow.com/help/mcve 3) http://stackoverflow.com/help/how-to-ask — G. Grothendieck, Dec 24 '15 at 02:22

akrun · Accepted Answer · 2015-12-24T09:15:59.863

One approach with data.table (using @Richo's df): convert the 'data.frame' to a 'data.table' with setDT(df), group by 'ID', and return the Subset of Data.table (.SD) only for groups in which any 'CHOICE' is TRUE.

library(data.table)
setDT(df)[, if(any(CHOICE)) .SD, by = ID]
#    ID CHOICE   ALT
#1:  1  FALSE TRAIN
#2:  1   TRUE   CAR
#3:  1  FALSE   BUS
#4:  3   TRUE TRAIN
#5:  3  FALSE   CAR
#6:  3  FALSE   BUS
#7:  3  FALSE  BIKE

Or as @docendodiscimus mentioned

setDT(df)[, .SD[any(CHOICE)], by = ID]

A faster option might be to use .I to get the row indices of the groups that contain a chosen alternative, and then extract those rows:

setDT(df)[df[, .I[any(CHOICE)], by = ID]$V1]
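To see why the .I variant works, here is a minimal self-contained sketch using the sample data from the dplyr answer. The names `keep_rows` and `kept` are introduced here purely for illustration. Within each ID group, .I holds that group's row numbers in the full table; indexing it with a single TRUE keeps all of them, while a single FALSE yields an empty vector, so the group disappears from the result.

```r
library(data.table)

# Sample data copied from the dplyr answer
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 CHOICE = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
                            TRUE, FALSE, FALSE, FALSE),
                 ALT = c("TRAIN", "CAR", "BUS", "TRAIN", "CAR", "BUS",
                         "TRAIN", "CAR", "BUS", "BIKE"))
setDT(df)

# Row numbers of every ID group that has at least one chosen alternative
keep_rows <- df[, .I[any(CHOICE)], by = ID]$V1
keep_rows
# 1 2 3 7 8 9 10

# Subset once by row number instead of building .SD for each group
kept <- df[keep_rows]
```

Subsetting once by row number avoids materialising .SD per group, which is presumably where any speed gain comes from on larger data.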

edited Dec 24 '15 at 09:15

answered Dec 24 '15 at 05:40

akrun

Or `setDT(df)[, .SD[any(CHOICE)], by = ID]` – talat Dec 24 '15 at 09:13

score 2 · Answer 2 · edited Dec 24 '15 at 09:12

Here you go:

library(dplyr)
# Count the chosen alternatives within each ID, then keep IDs with at least one
df <- df %>% group_by(ID) %>% mutate(sum = sum(CHOICE))
df <- df[df$sum != 0, ]

Or, using dplyr's filter function:

df %>% group_by(ID) %>% filter(any(CHOICE))
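One detail worth noting: the result of filter() here is still a grouped tibble. Below is a hedged sketch of converting it back to a plain data frame before passing it on. The name `df_clean` is hypothetical, and whether mlogit actually requires this step is an assumption, not something verified here.

```r
library(dplyr)

# Same sample data as in this answer
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 CHOICE = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
                            TRUE, FALSE, FALSE, FALSE),
                 ALT = c("TRAIN", "CAR", "BUS", "TRAIN", "CAR", "BUS",
                         "TRAIN", "CAR", "BUS", "BIKE"))

df_clean <- df %>%
  group_by(ID) %>%
  filter(any(CHOICE)) %>%  # keep only choice sets with a chosen alternative
  ungroup() %>%            # drop the grouping attribute
  as.data.frame()          # convert the tibble back to a plain data.frame

# Sanity check: number of chosen alternatives per remaining choice set
tapply(df_clean$CHOICE, df_clean$ID, sum)
```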

data:

df <- data.frame(ID = c(1,1,1,2,2,2,3,3,3,3),
                 CHOICE = c(FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE),
                 ALT = c("TRAIN", "CAR", "BUS","TRAIN", "CAR", "BUS","TRAIN", "CAR", "BUS","BIKE"))

score 2 · Answer 3 · answered Dec 24 '15 at 04:00

Use ave in combination with any (borrowing @Richo's df):

df[ave(df$CHOICE, df$ID, FUN=any),]
#   ID CHOICE   ALT
#1   1  FALSE TRAIN
#2   1   TRUE   CAR
#3   1  FALSE   BUS
#7   3   TRUE TRAIN
#8   3  FALSE   CAR
#9   3  FALSE   BUS
#10  3  FALSE  BIKE
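
To make the mechanism explicit: ave() applies any within each ID and recycles the result across the group, giving one logical value per row. The sketch below shows that intermediate vector, along with an equivalent %in% formulation that is not part of the answer above. The names `keep` and `alt` are illustrative only.

```r
# Sample data copied from the dplyr answer
df <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                 CHOICE = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
                            TRUE, FALSE, FALSE, FALSE),
                 ALT = c("TRAIN", "CAR", "BUS", "TRAIN", "CAR", "BUS",
                         "TRAIN", "CAR", "BUS", "BIKE"))

# One logical per row: TRUE if the row's ID has any chosen alternative
keep <- ave(df$CHOICE, df$ID, FUN = any)
keep
# TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE

# Alternative: keep rows whose ID occurs among the rows where CHOICE is TRUE
alt <- df[df$ID %in% df$ID[df$CHOICE], ]
```

Both routes use only base R, so no extra packages are needed.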

thelatemail