9

Possible Duplicate:
Generating a Call Graph in R

I'd like to systematically analyze a given function to find out which other functions are called within that very function. If possible, recursively.

I came across this function in a blog post by milktrader with which I can do something similar for packages (or namespaces)

listFunctions <- function(
    name,
    ...
){ 
    name.0  <- name
    name    <- paste("package", ":", name, sep="")
    if (!name %in% search()) {
        stop(paste("Invalid namespace: '", name.0, "'"))
    }
    # KEEP AS REFERENCE       
#    out <- ls(name)
    funlist <- lsf.str(name)
    out     <- head(funlist, n=length(funlist))
    return(out)
}

> listFunctions("stats")
  [1] "acf"                  "acf2AR"               "add.scope"           
  [4] "add1"                 "addmargins"           "aggregate"           
  [7] "aggregate.data.frame" "aggregate.default"    "aggregate.ts"        
 [10] "AIC"                  "alias"                "anova"               
....
[499] "xtabs"   

Yet, I'd like a function where name would be the name of a function and the return value is a character vector (or a list, if done recursively) of functions that are called within name.

Motivation

I actually need some sort of character based output (vector or list). The reason for this is that I'm working on a generic wrapper function for parallelizing an abitrary "inner function" where you don't have to go through a time consuming trial-and-error process in order to find out which other functions the inner function depends on. So the output of the function I'm after will directly be used in snowfall::sfExport() and/or snowfall::sfSouce.

EDIT 2012-08-08

As there's been some close-votes due to duplicity, I'll check how answers can be merged with the other question tomorrow.

Community
  • 1
  • 1
Rappster
  • 12,762
  • 7
  • 71
  • 120
  • 3
    See [Generating a Call Graph in R](http://stackoverflow.com/q/4795982/1243762). The keyword being [call graph](http://en.wikipedia.org/wiki/Call_graph) – Guy Coder Aug 08 '12 at 21:20
  • I don't know the answer, but `listFunctions <- function(name) ls(paste("package", name, sep=":"))` works too. – Karsten W. Aug 08 '12 at 21:22
  • @GuyCoder: thanks for the pointer. AFAIU, all answers are will generate a call *graph*. That's nice, but not exactly what I need (see updated post). I'm sure the functions mentioned will give me valuable insights on how to come up with a function that fits my needs, but maybe there's already something out there? – Rappster Aug 08 '12 at 21:31
  • @KarstenW.: thanks, always nice to see alternative ways! I really just copied it from the post and didn't give it much more thought. – Rappster Aug 08 '12 at 21:32
  • 3
    While `foodweb` will generate a graphic by default, it also returns (invisibly) an object that contains the calling information as a matrix (along with other things). Check out the Value section of the foodweb help page as well as `callers.of` and `callees.of` documented in the same page. – Brian Diggs Aug 08 '12 at 21:36
  • @BrianDiggs: cool, that sounds more like it. Thanks! – Rappster Aug 08 '12 at 21:37
  • It is a call graph even if there is no graphical output. Graph in this sense is in a mathematical sense of nodes and (directional) connecting edges. That is, graph theory. – Brian Diggs Aug 08 '12 at 21:38
  • @BrianDiggs: ah, okay get it ;-) – Rappster Aug 08 '12 at 21:38
  • @guycoder: hey no problem, didn't want to sound ungreatful, just didn't see the direct link after your first post. Much appreciated! – Rappster Aug 08 '12 at 22:09

3 Answers3

8

try this example:

library(codetools)

ff <- function(f) {
  leaf <- function (e, w) {
    r <- try(eval(e), silent = TRUE)
    if(!is.null(r) && is.function(r)) ret <<- c(ret, as.character(e))
  }
  call <- function (e, w) {
    walkCode(e[[1]], w)
    for (a in as.list(e[-1])) if (!missing(a)) walkCode(a, w)
  }
  ret <- c()
  walkCode(body(f), makeCodeWalker(call = call, leaf = leaf, write = cat))
  unique(ret)
}

then,

> ff(data.frame)
 [1] "{"               "<-"              "if"              "&&"              "is.null"         "row.names"       "function"        "is.character"   
 [9] "new"             "as.character"    "anyDuplicated"   "return"          "||"              "all"             "=="              "stop"           
[17] "gettextf"        "warning"         "paste"           "which"           "duplicated"      "["               "as.list"         "substitute"     
[25] "list"            "-"               "missing"         "length"          "<"               "!"               "is.object"       "is.integer"     
[33] "any"             "is.na"           "unique"          "integer"         "structure"       "character"       "names"           "!="             
[41] "nzchar"          "for"             "seq_len"         "[["              "is.list"         "as.data.frame"   ".row_names_info" ">"              
[49] "deparse"         "substr"          "nchar"           "attr"            "abs"             "max"             "("               "%%"             
[57] "unclass"         "seq_along"       "is.vector"       "is.factor"       "rep"             "class"           "inherits"        "break"          
[65] "next"            "unlist"          "make.names"      "match"           ".set_row_names" 
> ff(read.table)
 [1] "{"              "if"             "&&"             "missing"        "file"           "!"              "text"           "<-"             "textConnection"
[10] "on.exit"        "close"          "is.character"   "nzchar"         "inherits"       "stop"           "isOpen"         "open"           ">"             
[19] "readLines"      "<"              "min"            "("              "+"              "lines"          ".Internal"      "quote"          "length"        
[28] "all"            "=="             "pushBack"       "c"              "stdin"          "scan"           "col"            "numeric"        "-"             
[37] "for"            "seq_along"      "["              "max"            "!="             "warning"        "paste0"         ":"              "make.names"    
[46] "names"          "is.null"        "rep"            "match"          "any"            "<="             "rep.int"        "list"           "%in%"          
[55] "sapply"         "do.call"        "data"           "flush"          "[["             "which"          "is.logical"     "is.numeric"     "|"             
[64] "gettextf"       "&"              "is.na"          "type.convert"   "character"      "as.factor"      "as.Date"        "as.POSIXct"     "::"            
[73] "methods"        "as"             "row.names"      ".set_row_names" "as.integer"     "||"             "is.object"      "is.integer"     "as.character"  
[82] "anyDuplicated"  "class"          "attr"          
kohske
  • 65,572
  • 8
  • 165
  • 155
  • Thanks a lot for your answer and the pointer to `codetools`! I didn't have the time to check both of the answers so far, so I started out with Edward's answer as I like to understand the details of how to do it with base R functionality before turning to contrib packages. But codetools seems pretty promising. – Rappster Aug 09 '12 at 16:57
  • Seems like the codetools approach is a little slower compared to Edwards approach which I actually wouldn't have thought (see own answer). – Rappster Aug 09 '12 at 17:24
6

There must be better ways out there, but here's my attempt:

listFunctions <- function(function.name, recursive = FALSE, 
                          checked.functions = NULL){

    # Get the function's code:
    function.code <- deparse(get(function.name))

    # break code up into sections preceding left brackets:
    left.brackets <- c(unlist(strsplit(function.code, 
                                       split="[[:space:]]*\\(")))

    called.functions <- unique(c(unlist(sapply(left.brackets, 
                                               function (x) {

        # Split up according to anything that can't be in a function name.
        # split = not alphanumeric, not '_', and not '.'
        words <- c(unlist(strsplit(x, split="[^[:alnum:]_.]")))

        last.word <- tail(words, 1)
        last.word.is.function <- tryCatch(is.function(get(last.word)),
                                      error=function(e) return(FALSE))
        return(last.word[last.word.is.function])
    }))))

    if (recursive){

        # checked.functions: We need to keep track of which functions 
        # we've checked to avoid infinite loops.
        functs.to.check <- called.functions[!(called.functions %in%
                                          checked.functions)]

        called.functions <- unique(c(called.functions,
            do.call(c, lapply(functs.to.check, function(x) {
                listFunctions(x, recursive = T,
                              checked.functions = c(checked.functions,          
                                                    called.functions))
                }))))
    }
    return(called.functions)
}

And the results:

> listFunctions("listFunctions", recursive = FALSE)
 [1] "function"      "deparse"       "get"           "c"            
 [5] "unlist"        "strsplit"      "unique"        "sapply"       
 [9] "tail"          "tryCatch"      "is.function"   "return"       
[13] "if"            "do.call"       "lapply"        "listFunctions"

> system.time(all.functions <- listFunctions("listFunctions", recursive = TRUE))
   user  system elapsed 
  92.31    0.08   93.49 

> length(all.functions)
  [1] 518

As you can see, the recursive version returns a lot of functions. The problem with this is it returns every function called in the process, which obviously adds up as you go. In any case, I hope you can use this (or modify it) to suit your needs.

Edward
  • 5,367
  • 1
  • 20
  • 17
  • Thanks man, I worked with your approach a little today. Changed a couple of things (mainly a "global" buffer for functions already checked; think that's the reason your recursive code took so long) and added some functionality (see my answer). – Rappster Aug 09 '12 at 16:55
  • Nice, your approach has a processing time that's at `0.6870` of the one relying on `codetools` functionality! – Rappster Aug 09 '12 at 17:29
2

Disclaimer

This answer is based on answers by Edward and Kohske. I will not consider this for the answer finally accepted, its main purpose is simply to document another/extended approach and some benchmarks for other users.

Inner Function 1

Courtesy of Edward.

listFunctions_inner <- function(
    name, 
    do.recursive=FALSE,
    .do.verbose=FALSE,
    .buffer=new.env()
){
    ..name  <- "listFunctions_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    name.0 <- name
    if (tryCatch(is.function(get(name)), error=function(e) FALSE)) {
    # PROCESS FUNCTIONS       
        if (.do.verbose) {
            message(paste(..name, " // processing function: '", name, "'", sep=""))
        } 
        # Get the function's code:
        code <- deparse(get(name))  
        # break code up into sections preceding left brackets:
        left.brackets <- c(unlist(strsplit(code, split="[[:space:]]*\\(")))  
        out <- sort(unique(unlist(lapply(left.brackets, function (x) {
            # Split up according to anything that can't be in a function name.
            # split = not alphanumeric, not '_', and not '.'
            words <- c(unlist(strsplit(x, split="[^[:alnum:]_.]")))

            last.word <- tail(words, 1)
            last.word.is.function <- tryCatch(is.function(get(last.word)),
                                          error=function(e) return(FALSE))
            out <- last.word[last.word.is.function]
            return(out)
        }))))
        if (do.recursive){           
            # funs.checked: We need to keep track of which functions 
            # we've checked to avoid infinite loops.
            .buffer$funs.checked <- c(.buffer$funs.checked, name)
            funs.next <- out[!(out %in% .buffer$funs.checked)]        
            if (length(funs.next)) {
                out <- sort(unique(unlist(c(out, do.call(c,
                    lapply(funs.next, function(x) {
                        if (x == ".Primitive") {
                            return(NULL)
                        }
                        listFunctions_inner(
                            name=x, 
                            do.recursive=TRUE,
                            .buffer=.buffer
                        )
                    })
                )))))            
            }
        } 
        out <- sort(unique(unlist(out)))
    } else {
    # PROCESS NAMESPACES
        if (.do.verbose) {
            message(paste(..name, " // processing namespace: '", name, "'", sep=""))
        }
        name    <- paste("package", ":", name, sep="")
        if (!name %in% search()) {
            stop(paste(..name, " // invalid namespace: '", name.0, "'"))
        }
        # KEEP AS REFERENCE       
#        out <- ls(name)
        funlist <- lsf.str(name)
        out     <- head(funlist, n=length(funlist))
    }
    out
}

Inner Function 2

Courtesy of Kohske

listFunctions2_inner <- function(
    name,
    do.recursive=FALSE,
    .do.verbose=FALSE,
    .buffer=new.env()
) {
    ..name <- "listFunctions2_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    name.0 <- name
    if (tryCatch(is.function(get(name)), error=function(e) FALSE)) {
    # PROCESS FUNCTIONS       
        leaf <- function (e, w) {
            r <- try(eval(e), silent = TRUE)
            if(!is.null(r) && is.function(r)) out <<- c(out, as.character(e))
        }
        call <- function (e, w) {
            walkCode(e[[1]], w)
            for (a in as.list(e[-1])) if (!missing(a)) walkCode(a, w)
        }
        out <- c()
        walkCode(
            body(name), 
            makeCodeWalker(call=call, leaf=leaf, write=cat)
        )
        if (do.recursive){           
            # funs.checked: We need to keep track of which functions 
            # we've checked to avoid infinite loops.
            .buffer$funs.checked <- c(.buffer$funs.checked, name)
            funs.next <- out[!(out %in% .buffer$funs.checked)]        
            if (length(funs.next)) {
                out <- sort(unique(unlist(c(out, do.call(c,
                    lapply(funs.next, function(x) {
                        if (x == ".Primitive") {
                            return(NULL)
                        }
                        listFunctions_inner(
                            name=x, 
                            do.recursive=TRUE,
                            .buffer=.buffer
                        )
                    })
                )))))            
            }
        }
        out <- sort(unique(out))
    } else {
    # PROCESS NAMESPACES
        if (.do.verbose) {
            message(paste(..name, " // processing namespace: '", name, "'", sep=""))
        }
        name    <- paste("package", ":", name, sep="")
        if (!name %in% search()) {
            stop(paste(..name, " // invalid namespace: '", name.0, "'"))
        }
        # KEEP AS REFERENCE       
#        out <- ls(name)
        funlist <- lsf.str(name)
        out     <- head(funlist, n=length(funlist))
    }
}

Wrapper Function

This wrapper let's you choose the actual inner function used and allows to specify namespaces that should or should not be considered. That's important for my use case (see section Motivation above), as I'm usually only interested in "own" functions (in .GlobalEnv) that have not yet been moved to a package.

listFunctions <- function(
    name, 
    ns,
    innerFunction=listFunctions,
    do.inverse=FALSE,
    do.table=FALSE,
    do.recursive=FALSE,
    .do.verbose=FALSE
){
    ..name  <- "listFunctions_inner"
    if (!is.character(name) | missing(name)) {
        stop(paste(..name, " // expecting 'name' of class 'character'", sep=""))
    }
    out <- innerFunction(name, do.recursive=do.recursive, 
        .do.verbose=.do.verbose)

    if (do.table) {
        x.ns <- sapply(out, function(x) {
            out <- environmentName(environment(get(x)))
            if (out == "") {
                out <- ".Primitive"
            }
            out
        })
        if (!missing(ns)) {
            if (!do.inverse) {
                idx <- which(x.ns %in% ns)
            } else {
                idx <- which(!x.ns %in% ns)
            }
            if (!length(idx)) {
                return(NULL)
            }
            out <- out[idx]
            x.ns  <- x.ns[idx]
        }
        out <- data.frame(name=out, ns=x.ns, stringsAsFactors=FALSE)
        rownames(out) <- NULL
    }
    out
}

Application

# Character vector
listFunctions("install.packages")

# Data Frame (table)
> listFunctions("install.packages", do.table=TRUE)
                 name         ns
1           .libPaths .Primitive
2   .standard_regexps       base
3                 any .Primitive
4  available.packages      utils
...
84          winDialog      utils

# Consider 'base' only
> listFunctions("install.packages", ns="base", do.table=TRUE)
                name   ns
1  .standard_regexps base
2           basename base
3       capabilities base
...
56           warning base

# Consider all except 'base'
> listFunctions("install.packages", ns="base", do.inverse=TRUE, do.table=TRUE)
                 name         ns
1           .libPaths .Primitive
2                 any .Primitive
3  available.packages      utils
...
28          winDialog      utils

# Recursively, no table
listFunctions("install.packages", do.recursive=TRUE)

# Recursively table
listFunctions("install.packages", do.table=TRUE, do.recursive=TRUE)
                                name         ns
1                     .amatch_bounds       base
2                      .amatch_costs       base
3                                 .C .Primitive
...
544                           xzfile       base

# List functions inside a namespace
listFunctions("utils")
listFunctions("utils", do.table=TRUE)

Benchmark Inner Function 1

> bench <- microbenchmark(listFunctions("install.packages"))
bench
> Unit: milliseconds
                               expr      min       lq   median       uq
1 listFunctions("install.packages") 152.9654 157.2805 160.5019 165.4688
       max
1 244.6589

> bench <- microbenchmark(listFunctions("install.packages", do.recursive=TRUE), times=3)
bench
> Unit: seconds
                                                    expr      min      lq
1 listFunctions("install.packages", do.recursive = TRUE) 6.272732 6.30164
    median       uq      max
1 6.330547 6.438158 6.545769

Benchmark Inner Function 2

> bench <- microbenchmark(listFunctions("install.packages",
+         innerFunction=listFunctions2_inner))
bench
> Unit: milliseconds
                                                                     expr
1 listFunctions("install.packages", innerFunction = listFunctions2_inner)
       min       lq   median       uq      max
1 207.0299 212.3286 222.6448 324.6399 445.4154

> bench <- microbenchmark(listFunctions("install.packages", 
+     innerFunction=listFunctions2_inner, do.recursive=TRUE), times=3)
bench
Warning message:
In nm[nm == ""] <- exprnm[nm == ""] :
  number of items to replace is not a multiple of replacement length
> Unit: seconds
                                                                      expr
1 listFunctions("install.packages", innerFunction = listFunctions2_inner, 
       min       lq   median       uq      max
1 7.673281 8.065561 8.457841 8.558259 8.658678
Community
  • 1
  • 1
Rappster
  • 12,762
  • 7
  • 71
  • 120