0

I can sample 10 rows from a data.frame like this:

mtcars[sample(1:32, 10),]

What is the syntax for doing this with dplyr? This is what I tried:

library(dplyr)
filter(mtcars, sample(1:32, 10))
marbel
luciano
  • my guess would have been to make a numeric index column (since mtcars rownames are strings) and do `filter(mtcars, index == sample(1:32, 10))`, but that doesn't work. – rawr Jan 27 '14 at 21:34
  • This at least works: `filter(mtcars, seq_len(nrow(mtcars)) %in% sample(1:32, 10))`. (Since I'm not really familiar with **dplyr**, and it may supply more succinct/efficient ways of saying/doing this, I won't post this as an answer.) – Josh O'Brien Jan 27 '14 at 22:03

2 Answers

1

I believe you aren't really "filtering" in your example, you are just sampling rows.

In Hadley's words, here is the purpose of the function:

filter() works similarly to subset() except that you can give it any number of filtering conditions which are joined together with & (not && which is easy to do accidentally!)

Here is an example with the mtcars dataset, as used in the introductory vignette:

library(dplyr)
filter(mtcars, cyl == 8, wt < 3.5)
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
2 15.2   8  304 150 3.15 3.435 17.30  0  0    3    2
3 15.8   8  351 264 4.22 3.170 14.50  0  1    5    4

In conclusion: filter() is equivalent to subset(), not sample().
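
To make the distinction concrete, here is a minimal sketch, assuming dplyr is installed. It checks that comma-separated conditions in filter() behave like a single & condition and select the same number of rows as base subset(), and then draws 10 random rows with sample_n(), dplyr's sampling verb (dplyr 1.0.0 and later also offer slice_sample()). The variable names and the seed value are arbitrary choices for illustration.

```r
library(dplyr)

# Comma-separated conditions are combined with &
by_comma  <- filter(mtcars, cyl == 8, wt < 3.5)
by_amp    <- filter(mtcars, cyl == 8 & wt < 3.5)
by_subset <- subset(mtcars, cyl == 8 & wt < 3.5)

# The two filter() calls give the same result
isTRUE(all.equal(by_comma, by_amp))

# subset() keeps the same rows (row-name handling can differ by dplyr version)
nrow(by_comma) == nrow(by_subset)

# Sampling is a separate operation: draw 10 rows without replacement
set.seed(42)  # arbitrary seed, for reproducibility only
sampled <- sample_n(mtcars, 10)
nrow(sampled)
```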

marbel
  • It is not necessary to use & in the example, since dplyr does this automatically. From the vignette: "any number of filtering conditions which are joined together with &". So filter(mtcars, cyl == 8, wt < 3.5) will work as well. – Vincent Jan 28 '14 at 04:42
  • You are right, I'll edit the answer. – marbel Jan 28 '14 at 12:06
0

Figured out how to do it (although Josh O'Brien beat me to it):

filter(mtcars, rownames(mtcars) %in% sample(rownames(mtcars), 10, replace = FALSE))
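
A possible simplification, offered as a sketch rather than as the method used above: dplyr's slice() selects rows by position, so sampled indices can be passed to it directly without building a logical vector via %in%. The names idx and picked and the seed are arbitrary. Unlike the filter() version, which returns rows in their original order, slice() returns them in the order of the sampled indices.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, for reproducibility only

# Draw 10 distinct row positions, then pick those rows by position
idx <- sample(nrow(mtcars), 10)
picked <- slice(mtcars, idx)

nrow(picked)

# Rows come back in the order given by idx
identical(picked$mpg, mtcars$mpg[idx])
```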
luciano
  • Just curious ... what is the advantage over your initial mtcars[sample(1:32, 10),]? – Vincent Jan 28 '14 at 17:53
  • I want to use the dplyr functions for data manipulation whenever possible. I think using the same functions throughout keeps code as readable as possible. – luciano Jan 28 '14 at 20:22
  • @luciano you are free to do it, if you feel that way. But `mtcars[sample(1:32, 10),]` is much more readable than the filter(...) expression you are using. filter() is not intended for sampling, otherwise it would be called sample, not filter! – marbel Feb 06 '14 at 12:26