0

dplyr filter function source code ,i can't get,when I click filter(), source code is UseMethod(), when I debug either.nothing appeard;

my test code :

filter(irirs,Sepal.Length>7.1)

so I try to write my own function

first version :

   filter<-function(data,condition){
attach(data)
r<- data[which(condition,)]
detach(data)
return (r)
}

It works ,when I use system.time() to compared dplyr:filter and mine:filter, mine loose,spend much time than dplyr;

second version:

   filter<-function(data,condition){
r<-with(data,data[which(condition),])
return (r)
}

It report errors , Sepal.Length not found.

I kwnow is condition param'problem,
if I use with(irirs,irirs[which(Sepal.Length>7.1),]) directly ,it works, but I need a own filter function

I have two question :

  • a. how to write a effective filter or fix my second version code's problem.
  • b. how to read function source code like usemethod("func")

thanks a lot!

llinvokerl
  • 1,029
  • 10
  • 25
SummersKing
  • 301
  • 1
  • 11

4 Answers4

5

Here are some possibilities:

myfilter1 <- function(data, condition) {
  do.call(subset, list(data, substitute(condition)), envir = parent.frame())
}
myfilter1(iris, Sepal.Length > 7.1)

myfilter2 <- function(data, condition) {
  eval.parent(substitute(with(data, data[condition, ])))
}
myfilter2(iris, Sepal.Length > 7.1)

library(gtools)
myfilter3 <- defmacro(data, condition, expr = {
  with(data, data[condition, ])
})
myfilter3(iris, Sepal.Length > 7.1)

Read R source code associated with an S3 generic

To read the source R code for the methods of an S3 generic f first list the methods:

methods(f)

and then if f.x is one of the methods listed enter its name without parentheses into R:

f.x

or if that does not work (which would be the case if there is a * after the name in the methods output) then

getAnywhere("f.x")

If the code is in package p on CRAN then we could google for cran p and download its source from the package's CRAN home page or find it on github by googling for cran github p and look at the source on the github web site.

Performance

Regarding performance this is what I get on my PC:

library(dplyr)
library(gtools)
library(microbenchmark)

f1 <- function() {
  len <- 7.1
  myfilter1(iris, Sepal.Length > len)
}
f2 <- function() {
  len <- 7.1
  myfilter2(iris, Sepal.Length > len)
}
f3 <- function() {
  len <- 7.1
  myfilter3(iris, Sepal.Length > len)
}
fd <- function() {
  len <- 7.1
  filter(iris, Sepal.Length > len)
}

microbenchmark(f1(), f2(), f3(), fd())

giving the following.

Unit: microseconds
 expr    min      lq     mean  median      uq    max neval cld
 f1()  399.2  433.70  497.133  482.00  518.85 1362.6   100  b 
 f2()  301.4  326.15  374.078  364.50  407.65  579.1   100 a  
 f3()  302.4  330.65  375.650  352.25  397.15  623.0   100 a  
 fd() 1791.5 1948.60 2166.466 2117.35 2262.65 3443.7   100   c

myfilter2 and myfilter3 have about the same mean times and both are faster than the other two.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • besides I have got a problem on these filter function ,`len<-7.1 myfilter2(iris, Sepal.Length > len) ` , error for object len not found , what should i fix this bug – SummersKing Dec 27 '19 at 08:34
  • Good point. Have fixed `myfilter1` and `myfilter2`. They previously had problems with `f <- function() { len <- 7.1; myfilter1(iris, Sepal.Length > len) }; f()` but that and your example should work now. `myfilter3` already handled these cases. – G. Grothendieck Dec 27 '19 at 11:02
1

One way would be to use eval parse, text in subset to write your own filter method

my_filter <- function(data, condition) {
    subset(data, eval(parse(text = condition)))
}

my_filter(iris, "Sepal.Length > 7.1")
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#106          7.6         3.0          6.6         2.1 virginica
#108          7.3         2.9          6.3         1.8 virginica
#110          7.2         3.6          6.1         2.5 virginica
#118          7.7         3.8          6.7         2.2 virginica
#119          7.7         2.6          6.9         2.3 virginica
#123          7.7         2.8          6.7         2.0 virginica
#126          7.2         3.2          6.0         1.8 virginica
#130          7.2         3.0          5.8         1.6 virginica
#131          7.4         2.8          6.1         1.9 virginica
#132          7.9         3.8          6.4         2.0 virginica
#136          7.7         3.0          6.1         2.3 virginica

my_filter(mtcars, "cyl == 6")

#                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Valiant        18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#Merc 280       19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

As far as your second question is concerned it is answered here :

How can I view the source code for a function?

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • thanks very much ,but i don't like character paramter ,is useful ,but is not elegant when i tidyr my data , i just want `my_filter(mtcars, cyl == 6)` this type – SummersKing Dec 27 '19 at 03:19
0

thaks for @G. Grothendieck answer,

i compared my version 1 filter,and @G. Grothendieck's myfilter2 and myfilter1 and dplyr::filter ,

i use system.time() to compare them , my version 1 is the lowest efficiency ,and `myfilter2' is best ,

each func execute 10000 times,usefunc(irirs,Sepal.Length>7.1) this is time spend table:

  # func_name                 time_spend(s)
  # my version 1::filter         15.706          
  #  dplyr:filter                 6.108
  #    myfilter1                  1.648
  #    myfilter2                  1.229

time_spend is system.time() 's elapsed

i didn't test myfilter3 because of i only want use base package to achieve my goals

SummersKing
  • 301
  • 1
  • 11
0

We can use tidyverse options

library(dplyr)
my_filter <- function(data, condition) {
    data %>%
         filter(!! rlang::parse_expr(condition))
  }

my_filter(mtcars, "cyl == 6")
#   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

Or if we want to pass a unquoted expression, use enexpr

my_filter <- function(data, condition) {
        data %>%
              filter(eval(rlang::enexpr(condition)))
       }
my_filter(mtcars, cyl == 6)
#   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
#6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
#7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
akrun
  • 874,273
  • 37
  • 540
  • 662