Extract all BUT the first occurrence of a variable in a data frame

Question

I have a dataframe like:

ID        DATE  N  Price
 1  2013-02-04  3  29.99
 1  2013-03-18  1   9.99
 1  2013-04-13  2  19.99
 2  2013-02-18  1  18.99
 2  2013-05-11  2  19.99

This answer Extract rows for the first occurrence of a variable in a data frame tells how to extract the FIRST occurrences of a target value ... but I need ALL BUT the first occurrence, aka :

ID        DATE  N  Price
 1  2013-03-18  1   9.99
 1  2013-04-13  2  19.99
 2  2013-05-11  2  19.99

What is the recommended approach to this?

My initial instinct was to use the approach outlined in the linked answer, build a "first" subset, and then say, "From the original data frame, give me everything except these values" ... but that seemed perhaps more complicated.

Rich Scriven · Accepted Answer · 2015-11-04T06:27:59.160

You can do this with the data.table package quite easily.

library(data.table)
setDT(df)[, .SD[-1], by = ID]
#    ID       DATE N Price
# 1:  1 2013-03-18 1  9.99
# 2:  1 2013-04-13 2 19.99
# 3:  2 2013-05-11 2 19.99

where df is your original data. This removes the first row for each group, grouped by ID.

Another option is the dplyr package.

library(dplyr)
slice(group_by(df, ID), -1)
#      ID       DATE     N Price
#   (int)     (fctr) (int) (dbl)
# 1     1 2013-03-18     1  9.99
# 2     1 2013-04-13     2 19.99
# 3     2 2013-05-11     2 19.99

These remove the first row for all groups. You don't specify what should happen if there is only one row for a group. Should you need to keep those rows, you would need to account for that. So let's add a single row as a group and take a look.

dff <- rbind(df, df[4, ])
dff[6, 1] <- 3

Then the data.table code would be

setDT(dff)[, .SD[if(.N == 1L) 1 else -1], by = ID]
#    ID       DATE N Price
# 1:  1 2013-03-18 1  9.99
# 2:  1 2013-04-13 2 19.99
# 3:  2 2013-05-11 2 19.99
# 4:  3 2013-02-18 1 18.99

and the dplyr code would be

slice(group_by(dff, ID), if(n() == 1L) 1 else -1)
#      ID       DATE     N Price
#   (dbl)     (fctr) (int) (dbl)
# 1     1 2013-03-18     1  9.99
# 2     1 2013-04-13     2 19.99
# 3     2 2013-05-11     2 19.99
# 4     3 2013-02-18     1 18.99

for those situations.

Data:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), DATE = structure(c(1L, 
3L, 4L, 2L, 5L), .Label = c("2013-02-04", "2013-02-18", "2013-03-18", 
"2013-04-13", "2013-05-11"), class = "factor"), N = c(3L, 1L, 
2L, 1L, 2L), Price = c(29.99, 9.99, 19.99, 18.99, 19.99)), .Names = c("ID", 
"DATE", "N", "Price"), class = "data.frame", row.names = c(NA, 
-5L))

score 5 · Answer 2 · answered Nov 04 '15 at 02:33

5

If you don't want to use add-on packages:

df[duplicated(df$ID),]

answered Nov 04 '15 at 02:33

Ott Toomet

1,894
15
25

I like this for its simplicity. The best I came up with was `df[-match(unique(df$ID), df$ID),]` , but this is even better. – thelatemail Nov 04 '15 at 02:46

akrun · Answer 3 · 2015-11-04T05:15:01.523

1

Another option is ave

 df[with(df, ave(ID, ID, FUN= seq_along)!=1),]
 #  ID       DATE N Price
 #2  1 2013-03-18 1  9.99
 #3  1 2013-04-13 2 19.99
 #5  2 2013-05-11 2 19.99

edited Nov 04 '15 at 05:15

answered Nov 04 '15 at 04:16

akrun

874,273
37
540
662

Extract all BUT the first occurrence of a variable in a data frame

3 Answers3