I have a data frame that I would like to condense by removing duplicates, but only for one value of a certain variable. In the example below, I would like to remove duplicates of user_id only when plan_type = subscriber. The output below shows how the sample data should be condensed.
I have tried unique(), but it will not work because there may be multiple occurrences of the same user_id where plan_type = PPU, and those rows should remain.
Any suggestions that do not include multiple steps of subsetting and then rebinding two dataframes?
> foo
user_id plan_type
16435 6264 subscriber
31518 10050 subscriber
31520 10050 subscriber
7576 11174 subscriber
19744 11186 subscriber
19745 11186 subscriber
46108 20348 subscriber
5293 31641 subscriber
5294 31641 subscriber
5295 31641 PPU
> output
user_id plan_type
16435 6264 subscriber
31520 10050 subscriber
7576 11174 subscriber
19745 11186 subscriber
46108 20348 subscriber
5294 31641 subscriber
5295 31641 PPU
> dput(foo)
structure(list(user_id = c(6264L, 10050L, 10050L, 11174L, 11186L,
11186L, 20348L, 31641L, 31641L, 31641L), plan_type = c("subscriber",
"subscriber", "subscriber", "subscriber", "subscriber", "subscriber",
"subscriber", "subscriber", "subscriber", "PPU")), .Names = c("user_id",
"plan_type"), row.names = c(16435L, 31518L, 31520L, 7576L, 19744L,
19745L, 46108L, 5293L, 5294L, 5295L), class = "data.frame")
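
For reference, here is a minimal single-step sketch in base R of the condensing described above. It rests on two assumptions read off the expected output rather than stated in the text: the last occurrence of each duplicated subscriber row is the one to keep (hence fromLast = TRUE), and a row counts as a duplicate only when both user_id and plan_type match. The names is_dup and output are illustrative.

```r
# Rebuild the sample data from the dput() above.
foo <- data.frame(
  user_id = c(6264L, 10050L, 10050L, 11174L, 11186L,
              11186L, 20348L, 31641L, 31641L, 31641L),
  plan_type = c(rep("subscriber", 9), "PPU"),
  row.names = as.character(c(16435L, 31518L, 31520L, 7576L, 19744L,
                             19745L, 46108L, 5293L, 5294L, 5295L)),
  stringsAsFactors = FALSE
)

# Flag rows that have a later identical (user_id, plan_type) pair.
# Assumption: the LAST occurrence is kept, matching the expected output.
is_dup <- duplicated(foo[c("user_id", "plan_type")], fromLast = TRUE)

# Drop a row only if it is a flagged duplicate AND a subscriber row;
# rows with any other plan_type are kept even when repeated.
output <- foo[!(is_dup & foo$plan_type == "subscriber"), ]
print(output)
```

Because the filter is one logical index, nothing has to be subset and rebound. Dropping fromLast = TRUE would keep the first occurrence of each duplicated subscriber row instead.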