Weird behavior in dplyr slice for R

Question

When calling slice(df, i) in the dplyr package for R, if the row index I ask for doesn't exist (nrows < i), it appears to return every row of the group except the first, as if I had called slice(df, -1).

For example:

library(dplyr)

c1 <- c("a","b","c")
c2 <- 1:3
df <- data.frame(c1,c2)

slice(df,2)

The result will be as expected:

b  2

But if I call

slice(df, 5)

the result is every row but the first row:

b  2
c  3

This is especially irksome when using group_by() and THEN calling slice() on the groups. Is there a logical reason why slice() is doing this?

It seems like returning row(s) filled with NAs for row indices larger than 'nrows' in groups not "tall enough" to produce the requested slice could be a useful result.

This came up as I was trying to extract a ranked result from each group when some groups had enough data and others did not. For example: "List the 10th highest sales-producing salesperson from each region," where one of the regions has only 8 salespersons.

More info on how to give a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). — Jaap, May 27 '15 at 19:47
A similar problem applies to `slice(df,0)`, which should return an "empty" data.frame, as `df[0,]` does. — Frank, May 27 '15 at 20:39

score 3 · Answer 1 · hackR · 2016-02-26T16:58:49.333

I'm kinda late to this party, but here goes. There is a really simple fix for the error message "Error: incompatible types, expecting a character vector":

just insert ungroup() before your mutate() call and you should be OK.

But I think it's a bug of some kind in slice(). I will file a bug report.

edited Feb 26 '16 at 16:58

answered Feb 26 '16 at 16:28

Was this bug ever reported? If so, can you link it? It's more than 3 years later and the problem seems to persist... – Ratnanil Aug 24 '18 at 19:49

score 0 · Answer 2 · answered May 27 '15 at 20:25

I agree: This behavior doesn't seem right. You can use the following as an alternative:

# tibble() replaces data_frame(), which is deprecated in current tibble/dplyr
df <- tibble(c1 = c('a', 'a', 'b', 'c'), c2 = c(1, 2, 3, 4))

#   c1 c2
# 1  a  1
# 2  a  2
# 3  b  3
# 4  c  4

# get the second row of each group, or the last row for
# groups with fewer than 2 rows
df %>% 
    group_by(c1) %>% 
    filter(row_number() == min(2, n()))
#   c1 c2
# 1  a  2
# 2  b  3
# 3  c  4
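
If an NA result for groups that are too short is what's wanted (as the question suggests), one possible sketch is to use `nth()` inside `summarise()`, since `nth()` falls back to its default of NA when the requested position doesn't exist. The `sales` data frame and its column names below are made up for illustration, and position 2 stands in for the 10th-ranked salesperson:

```r
library(dplyr)

# Hypothetical data: "east" has three salespeople, "west" has only one.
sales <- data.frame(
  region = c("east", "east", "east", "west"),
  person = c("ann", "bob", "cat", "dan"),
  amount = c(300, 100, 200, 50),
  stringsAsFactors = FALSE
)

# Sort by sales first, then take the 2nd entry within each region.
# nth() returns NA when a group has fewer than 2 rows.
second_best <- sales %>%
  arrange(desc(amount)) %>%
  group_by(region) %>%
  summarise(
    person = nth(person, 2),
    amount = nth(amount, 2)
  )

second_best
# east -> cat, 200; west -> NA, NA
```

Unlike the `filter()` approach, this keeps one row per group, so short groups show up explicitly as NA instead of being replaced by their last row.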

Matthew Plourde

Thanks, Matt. I like it--go back to `filter` and recreate what `slice` is supposed to do with a touch of ingenuity! – huff May 29 '15 at 12:30
