4

The purpose of the functions below are to allow for self-referencing assignments more easily. (As suggested here: Referencing a dataframe recursively )

So that instead of

 # this  
 myDataFrame$variable[is.na(myDataFrame$variable)] <- 0

 # we can have this: 
 NAto0(myDataFrame$variable)

The functions work well for vectors, but less so when *ply'ing

I'm encountering two issues with regards to the match.call() portion of the function selfAssign() (code is below). Questions are:

  1. How can I determine from within a function if it was called from an *apply-type function?
  2. How can I trace back up the call to the correct variable environment?

I've included the argument n to selfAssign(.) which works well for the eval statement at the end. I'm wondering if I could somehow make use of n something akin to

 sapply(df, NAto0, n=2)

and perhaps in selfAssign have something like sys.parent(n) (which I tried, and either I did not get it right, or it does not work)

Any suggestions would greatly appreciated.



FUNCTIONS

These functions are wrappers to selfAssign and are the ones that would be used in the *apply calls.

NAtoNULL <- function(obj, n=1) {
# replace NA's with NULL
  selfAssign(match.call()[[2]], is.na(obj), NULL, n=n+1)
}

NAto0 <- function(obj, n=1) {
# replace NA's with 0
  selfAssign(match.call()[[2]], is.na(obj), 0, n=n+1)
}

NAtoVal <- function(obj, val, n=1) {
  selfAssign(match.call()[[2]], is.na(obj), val, n=n+1)  
}

ZtoNA <- function(obj, n=1) {
# replace 0's with NA

  # TODO: this may have to be modified if obj is matrix
  ind <- obj == 0
  selfAssign(match.call()[[2]], ind, NA, n=n+1)
}

selfAssign is the function performing the work and where the error is coming from

selfAssign <- function(self, ind, val, n=1, silent=FALSE) {
## assigns val to self[ind] in environment parent.frame(n)
## self should be a vector.  Currently will not work for matricies or data frames

  ## GRAB THE CORRECT MATCH CALL
  #--------------------------------------
      # if nested function, match.call appropriately
      if (class(match.call()) == "call") {
        mc <- (match.call(call=sys.call(sys.parent(1))))   ## THIS LINE PROBABLY NEEDS MODIFICATION
      } else {
        mc <- match.call()
      }

      # needed in case self is complex (ie df$name)
      mc2 <- paste(as.expression(mc[[2]]))


  ## CLEAN UP ARGUMENT VALUES
  #--------------------------------------
      # replace logical indecies with numeric indecies
      if (is.logical(ind))
        ind <- which(ind) 

      # if no indecies will be selected, stop here
      if(identical(ind, integer(0)) || is.null(ind)) {
        if(!silent) warning("No indecies selected")
        return()
      }

      # if val is a string, we need to wrap it in quotes
      if (is.character(val))
        val <- paste('"', val, '"', sep="")

      # val cannot directly be NULL, must be list(NULL)
      if(is.null(val))
        val <- "list(NULL)"


  ## CREATE EXPRESSIONS AND EVAL THEM
  #--------------------------------------
     # create expressions to evaluate
     ret <- paste0("'[['(", mc2, ", ", ind, ") <- ", val)

     # evaluate in parent.frame(n)
     eval(parse(text=ret), envir=parent.frame(n))
}
Community
  • 1
  • 1
Ricardo Saporta
  • 54,400
  • 17
  • 144
  • 178
  • Also, I realize that `df[is.na(df)] <- 0` is a simpler alternative for the specific `sapply` statement above. I am of course looking to gneralize this to cases where similar statements would be cumbersome. – Ricardo Saporta Nov 29 '12 at 08:24
  • 3
    The usual warning: R is a pass-by-value language and you are trying to go against nature by modifying the objects you are passing to your functions without having to reassign them with the function's output, i.e. you want to do `f(x)` instead of `x <- f(x)`. IMHO, this is both dangerous and difficult. There are however packages that will let you do pass-by-reference: `R.oo`, `proto`, maybe more. – flodel Nov 29 '12 at 11:44
  • Interestingly, you may be having the same difficulties as the base `Recall` function. From the doc: *‘Recall’ will not work correctly when passed as a function argument, e.g. to the ‘apply’ family of functions.*. – flodel Nov 29 '12 at 11:54
  • @flodel, thanks for pointing out Recall. You may be right about having a similar cause. As for the usual warning, thank you as well. It's certainly difficult, but I would be open to hearing more as to the potential dangers and pitfalls – Ricardo Saporta Nov 30 '12 at 01:06
  • The particular problem (... used in siutation where it does not exist) here is that you are matching a call issued from sapply to your function, and that call includes `...`, but your function doesn't have it. There are ways around that, but more problematic is that even if you match the call, the variable name sapply uses is `X`, which likely isn't the name of the variable you want to set. You have to dig all the way through the call stack to find the actual variable name. You can probably do that for very specific instances (e.g. sapply), but it will be almost impossible to generalize. – BrodieG Dec 17 '13 at 23:18

1 Answers1

1

Note, I don't endorse this type of stuff, but do endorse the desire to understand how R works so that you can do this stuff if you want.

The following works for sapply only, so it only partially answers your question, but it does lay out the strategy you can take. As I noted in a previous comment, it will be very difficult to make this robust, but I can answer 1 and 2 in the specific context of an sapply call

  1. Use sys.calls to get the trace stack
  2. Use sys.frame and sys.parents to get the appropriate evaluation environment

A non robust illustrative implementation that turns all vectors in a list to NAs using the type of strategy you want:

get_sapply_call <- function(x) get_sapply_call_core(x)  # To emulate your in-between functions
get_sapply_call_core <- function(x) {
  if((c.len <- length(s.calls <- sys.calls())) < 4L) return("NULL")
  if(s.calls[[c.len - 2L]][[1L]] == quote(lapply) &     # Target sapply calls only
     s.calls[[c.len - 3L]][[1L]] == quote(sapply) &
     s.calls[[c.len - 1L]][[1L]] == quote(FUN)) {
    mc.FUN <- s.calls[[c.len - 1L]]
    mc.sa <- match.call(definition=sapply, s.calls[[c.len - 3L]])  # only need to match sapply b/c other calls are not user calls and as such structure is known
    call.txt <- paste0(
      as.character(mc.sa[[2L]]), "[[", mc.FUN[[2L]][[3L]], 
      "]] <- rep(NA, length(", as.character(mc.sa[[2L]]), "[[", mc.FUN[[2L]][[3L]], 
      "]]))"
    )
    call <- parse(text=call.txt)
    eval(call, envir=sys.frame(sys.parents()[c.len - 3L]))
    return(call.txt)
  }
  return("NULL")
}
df <- data.frame(a=1:10, b=letters[1:10])
sapply(df, get_sapply_call)
#                                     a                                     b 
# "df[[1]] <- rep(NA, length(df[[1]]))" "df[[2]] <- rep(NA, length(df[[2]]))" 
df
#     a  b
# 1  NA NA
# 2  NA NA
# 3  NA NA
# 4  NA NA
# ...

You will need different logic for different *apply functions, and even more different logic if your function is being called indirectly in some other way. Also, this is definitely a quick and dirty implementation, so even for sapply you'll likely need to add stuff to make it more robust. And there is no guarantee that the sapply implementation won't change in the future breaking all of the above.

EDIT: note you can totally sidestep the problem you had with match.call

BrodieG
  • 51,669
  • 9
  • 93
  • 146