How to remove all rows belonging to a particular group when only one row fulfills the condition in R?

Question

Following is my sample data set:

> dput(lanec)
structure(list(vehicle.id = c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L), frame.id = c(1L, 2L, 3L, 4L, 5L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L), lane.change = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 
2L, 1L), .Label = c(".", "yes"), class = "factor")), .Names = c("vehicle.id", 
"frame.id", "lane.change"), class = "data.frame", row.names = c(NA, 
-26L))

The first column holds the IDs of vehicles entering a particular segment of a freeway. They were observed until they left the segment, so each vehicle has a different number of time frames in which it was observed. The frame numbers are given in the frame.id column. The third column tells whether the vehicle changed lanes and, if so, in which frame. In this sample data, all vehicles except vehicle # 2 changed lanes. Vehicle # 5 changed lanes twice.

Required

I want to identify which vehicles changed lanes and remove all of their rows from the data set. I tried using subset(lanec, lane.change!='yes'), but it only removes the rows where the value of lane.change is yes, not the other rows of the same vehicle. Using the sample data set, the desired output should be:

  vehicle.id frame.id lane.change
1          2        1           .
2          2        2           .
3          2        3           .
4          2        4           .
5          2        5           .

How can I achieve this? It must be simple but I can't figure it out. Thanks in advance.

score 0 · Answer 1 · answered May 14 '14 at 23:46

steady <- names(which(with(lanec, tapply(lane.change, vehicle.id, function(x) all(x == ".")))))
steady
[1] "2"

So pick the ones where all the items in lane.change are ".":

lanec[ lanec$vehicle.id %in% steady, ]
#-------

  vehicle.id frame.id lane.change
1          2        1           .
2          2        2           .
3          2        3           .
4          2        4           .
5          2        5           .
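
To see what the tapply step produces before names(which(...)) is applied, here is a minimal sketch. It rebuilds lanec with data.frame() instead of the dput output (same values, assumed equivalent), and per.vehicle is a name introduced only for illustration.

```r
# Rebuild the sample data from the question (same values as the dput output).
lanec <- data.frame(
  vehicle.id  = rep(2:5, times = c(5, 7, 7, 7)),
  frame.id    = c(1:5, 3:9, 1:7, 6:12),
  lane.change = factor(c(rep(".", 7), "yes", rep(".", 8), "yes",
                         rep(".", 4), "yes", ".", ".", "yes", "."),
                       levels = c(".", "yes"))
)

# One TRUE/FALSE per vehicle: TRUE only if every lane.change value is ".".
per.vehicle <- with(lanec, tapply(lane.change, vehicle.id, function(x) all(x == ".")))
per.vehicle
#     2     3     4     5 
#  TRUE FALSE FALSE FALSE 

# which() keeps the TRUE entries; names() recovers their vehicle ids.
steady <- names(which(per.vehicle))
kept <- lanec[lanec$vehicle.id %in% steady, ]
```

Note that steady is a character vector ("2"), while vehicle.id is an integer; %in% coerces the two to a common type, so the match still works.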

flodel · Accepted Answer · 2014-05-15T14:33:44.390

You can do:

subset(lanec, ave(lane.change != "yes", vehicle.id, FUN = all))

To help understand what ave returns, maybe you can break it into a couple steps:

lanec <- transform(lanec, stays.in.lane = ave(lane.change != "yes", vehicle.id, FUN = all))
subset(lanec, stays.in.lane)

You will see that ave returns a vector of TRUE/FALSE values, one per row of lanec: whether that row's vehicle.id had all (hence the use of all) of its lane.change values not equal to 'yes'.
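
A small sketch of what ave does here, using made-up vectors (g and ok are hypothetical, not from the question). ave applies FUN to the values of each group and writes the result back to every row of that group; mean is only the default FUN. Also, lane.change != "yes" is already a logical vector, so ave never sees the factor itself.

```r
# Hypothetical toy data: a group label and a logical condition per row.
g  <- c("a", "a", "b", "b", "b")
ok <- c(TRUE, TRUE, TRUE, FALSE, TRUE)

# all() is evaluated once per group, then recycled to that group's rows.
res <- ave(ok, g, FUN = all)
res
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Comparing a factor with a string yields a logical vector, not a factor.
cmp <- factor(c(".", "yes", ".")) != "yes"
cmp
# [1]  TRUE FALSE  TRUE
```

Group "b" contains one FALSE, so all three of its rows get FALSE, which is exactly how one lane change removes every row of that vehicle.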

edited May 15 '14 at 14:33

answered May 15 '14 at 00:02

Or `lanec[with(lanec, ave(lane.change != "yes", vehicle.id, FUN = all)),]` to avoid using `subset` – thelatemail May 15 '14 at 00:03
It's a matter of taste, I prefer using `subset` to avoid having to write `lanec` twice. I'm also using one fewer functions than you, it's more readable you could argue. – flodel May 15 '14 at 00:21
@flodel Thank you. It works. But could you please also explain what is happening in background? I looked at `ave` in help and found that it finds the mean by grouping variable. Two things are confusing me here: 1) I see that you replaced `mean` with `all` in `FUN`, what does that mean? 2) `lane.change` is a factor variable but `ave` finds the average, how does it work for factors? – umair durrani May 15 '14 at 14:19
