47

UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted

I'm trying to get the second through seventh rows of a data.frame using dplyr.

I'm doing this:

require(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)

But this throws an error.

Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

I know I could easily do:

df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)

But I would like to understand why my first try is not working.

Jonno Bourne
Daniel Falbel

4 Answers

103

Actually dplyr's slice function is made for this kind of subsetting:

df %>% slice(2:7)

(I'm a little late to the party but thought I'd add this for future readers)
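As a quick check of the two approaches side by side, here is a minimal sketch. It assumes a recent dplyr release in which a bare `row_number()` works inside `filter()`, as the UPDATE note on the question says; the `set.seed()` call and the variable names are only there for illustration.

```r
library(dplyr)

set.seed(1)  # only so that runif() gives the same values on every run
df <- data.frame(id = 1:10, var = runif(10))

# Positional subsetting with slice(): rows 2 through 7
by_slice <- df %>% slice(2:7)

# The question's original attempt, which works in current dplyr
by_filter <- df %>% filter(row_number() >= 2, row_number() <= 7)

# The same condition written with between()
by_between <- df %>% filter(between(row_number(), 2, 7))

by_slice$id
by_filter$id
by_between$id
```

All three should pick the same rows; `slice()` just states the intent more directly.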

talat
  • 1
    Thanks, this was really helpful as the error reoccurred for me. I later found out that this is an inconsistency in how row_number() treats data tables, see: http://stackoverflow.com/questions/23861047/unique-rows-in-dplyr-row-number-from-tbl-dt-inconsistent-to-tbl-df – Alex Jun 21 '16 at 23:10
29

The row_number() function does not simply return the row number of each element and so can't be used like you want:

• ‘row_number’: equivalent to ‘rank(ties.method = "first")’

You're not actually saying what you want the row_number of. In your case:

df %>% filter(row_number(id) <= 7, row_number(id) >= 2)

works because id is sorted and so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when called a second time dplyr has run out of things to feed it and you get the equivalent of:

> row_number()
Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

That's your error right there.
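To make the point about ranking concrete, here is a small sketch of what `row_number(x)` does when it is given a column. The `shuffled` data frame is a made-up example, not from the question: it shows that `row_number(id)` matches row positions only when `id` happens to be sorted.

```r
library(dplyr)

x <- c(30, 10, 20)
row_number(x)                   # ranks of the values, not their positions
rank(x, ties.method = "first")  # the base R equivalent quoted above

# Hypothetical data frame whose id column is NOT sorted
shuffled <- data.frame(id = c(3, 1, 2), var = c("c", "a", "b"))

# Filters on the rank of id, so it keeps the rows with the two smallest ids
by_rank <- shuffled %>% filter(row_number(id) <= 2)

# Filters on position, so it keeps the first two rows as stored
by_position <- shuffled %>% slice(1:2)

by_rank$id
by_position$id
```

So `filter(row_number(id) <= 7, row_number(id) >= 2)` is a rank filter that only looks positional on the question's data.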

Anyway, that's not the way to select rows.

You simply need to subscript df[2:7,], or if you insist on pipes everywhere:

> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950
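For readers puzzled by the `"["(...)` form: in R, operators such as `+` and `[` are ordinary functions that can be called by name. The sketch below uses base R only and a made-up `var` column, so nothing in it depends on dplyr or on the pipe.

```r
df <- data.frame(id = 1:10, var = seq(0.1, 1, by = 0.1))

# Calling an operator by name: the same as writing 1 + 3
sum_via_function <- "+"(1, 3)

# Calling the subsetting operator by name: the same as df[2:7, ]
# (the trailing empty argument means "all columns")
via_function <- "["(df, 2:7, )
via_bracket  <- df[2:7, ]

identical(via_function, via_bracket)
```

In the piped version, the pipe supplies `df` as the first argument, which is why the call reads as `"["(., 2:7, )`.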
Spacedman
  • 12
    The purpose of `row_number()` is definitely to return the row number (hence the name!) and this behaviour is a bug. (Also you don't need `.` in your piping example) – hadley Sep 23 '14 at 22:52
  • Note to anyone thinking `row_number` shows the `row_number` code. It doesn't. You want the `row_number_prototype` C++ function. – Spacedman Sep 24 '14 at 07:01
  • 1
    Would you care to explain how the `"["(.,2:7,)` syntax works? It's really interesting solution. – Konrad Aug 04 '16 at 14:49
  • 3
    It's just that almost everything in R can be written as a function call. Try `"+"(1,3)`. – Spacedman Aug 04 '16 at 15:12
  • 2
    @Konrad As an alternative that is slightly more readable than the "[" syntax, you can write: df %>% .[2:7, ] – Agile Bean Dec 29 '17 at 09:49
  • 1
    @Spacedman is correct, row_number does *not* return exactly the row number as vector, see e.g. the output of `datasets::airquality %>% row_number`. If you want to use it as an index vector containing the row number, you must convert it to a numeric vector like `datasets::airquality %>% row_number %>% as.numeric` – Agile Bean Apr 05 '19 at 03:17
8

Here is another way to do row-number based filtering in a pipeline.

    df <- data.frame(id = 1:10, var = runif(10))

    df %>% .[2:7,]

    #>   id     var
    #> 2  2 0.28817
    #> 3  3 0.56672
    #> 4  4 0.96610
    #> 5  5 0.74772
    #> 6  6 0.75091
    #> 7  7 0.05165
dabsingh
  • 1
    It's slower than `slice`, but it doesn't drop `NA` (e.g. `df %>% .[c(NA,2,4,7),]`), which could be useful in some cases. – Bastien Jan 09 '18 at 12:53
0

Another option using subset:

df <- data.frame(id = 1:10, var = runif(10))
subset(df, row.names(df) %in% 2:7)
#>   id        var
#> 2  2 0.75924106
#> 3  3 0.17096427
#> 4  4 0.10886090
#> 5  5 0.98703882
#> 6  6 0.04190195
#> 7  7 0.73268672

Created on 2023-01-13 with reprex v2.0.2
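One caveat worth sketching: `row.names(df) %in% 2:7` matches row labels, not positions. The two coincide for a freshly built data frame, but not after rows have been dropped. The `df2` object below is a hypothetical example to show the difference, using base R only.

```r
df <- data.frame(id = 1:10, var = seq(0.1, 1, by = 0.1))

# After dropping rows, the remaining row names are "4" through "10"
df2 <- df[df$id > 3, ]

# Matches on row labels: keeps the rows labelled 4, 5, 6 and 7
by_label <- subset(df2, row.names(df2) %in% 2:7)

# Matches on position: keeps the 2nd through 7th remaining rows
by_position <- df2[2:7, ]

# A positional variant that still uses subset()
by_position_subset <- subset(df2, seq_len(nrow(df2)) %in% 2:7)

by_label$id
by_position$id
```

If positions are what you want, `seq_len(nrow(df))` (or `slice()`) is the safer index.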

Quinten