
I have a data frame with 70+ columns. I need to perform the same computation on a number of these columns, handling each column separately.

Based on @Ananda's approach and feedback, here is a reworded, simplified example and its solution. I have kept the old question at the end so that the discussion thread still makes sense.

Problem: calculate the sum of several columns of a data frame with a function that takes the column names as separate, unquoted arguments:

> df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)

> myFunc(df, aa, bb, cc)
aa series sum is 55 
bb series sum is 1055 
cc series sum is 2055 

> myFunc(df, aa, dd)
aa series sum is 55 
dd series sum is 3055 

> myFunc(df, dd)
dd series sum is 3055 
> 

The myFunc definition that accomplishes this is below:

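myFunc <- function(data, ...) {
  # Unevaluated arguments passed through ..., e.g. list(aa, bb, cc)
  argList <- match.call(expand.dots = FALSE)$...

  # seq_along() avoids looping over 1:0 when no columns are given
  for (i in seq_along(argList)) {
    colName <- argList[[i]]  # a symbol such as `aa`
    # Look the symbol up in data first, then in the caller's environment
    series_colName <- eval(colName, envir = data, enclos = parent.frame())
    cat(deparse(colName), "series sum is", sum(series_colName), "\n")
  }
}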

This gives me a starting point to work with. If there is a better way to define myFunc, please let me know.
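One possible alternative, offered only as a sketch: capture the dots with `substitute(list(...))` instead of `match.call()`, evaluate each captured expression inside the data frame, and return the sums invisibly so they can be reused. `myFunc2` is a hypothetical name; a side effect of this design is that expressions such as `bb + cc` work as well as plain column names.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# Hypothetical alternative to myFunc
myFunc2 <- function(data, ...) {
  # substitute(list(...)) yields e.g. list(aa, bb + cc); [-1] drops the `list` head
  exprs <- as.list(substitute(list(...)))[-1]
  # Caller's environment, captured once so eval() can fall back to it
  env <- parent.frame()
  # Printable label for each expression
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
  # Evaluate each expression inside `data` and sum it
  sums <- vapply(exprs, function(e) sum(eval(e, data, env)), numeric(1))
  names(sums) <- labels
  for (i in seq_along(sums)) cat(labels[i], "series sum is", sums[[i]], "\n")
  invisible(sums)
}

myFunc2(df, aa, bb + cc)
```

Capturing `parent.frame()` once at the top matters here: calling it inside the anonymous function passed to `vapply()` would return a different frame.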

Thanks for all the help

Old discussion thread:

I am still finding my way in R, so please bear with me. The following sample code reproduces my first attempt, which failed. Where am I going wrong, and what is the R-ish way to do this kind of computation?

myFunc = function(data, y, ...){
  argList = list(...)
  argList
  #for each arg in argList
    #do some processing with data, y and column arg
}

df = data.frame(aa=1:10, bb=101:110, cc=201:210, dd=301:310)
myFunc(df, aa, bb)
myFunc(df, aa, bb, cc)

The error messages are:

Error in myFunc(df, aa, bb) : object 'bb' not found

Error in myFunc(df, aa, bb, cc) : object 'bb' not found

To clarify further, this call works fine:

myFunc(df, aa, c(2, 4, 6))

I intend to use eval and substitute (with eval's envir argument) in later processing to extract the values of the various columns, so I would like to pass the column names unquoted rather than as character strings. I hope my intention is clear.
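The eval/substitute pattern described above can be isolated in a small helper. This is a sketch; `getCol` is a hypothetical name. `substitute()` returns the unevaluated argument, and `eval()` then looks it up in the data frame before falling back to the caller's environment.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# Hypothetical helper: evaluate one unquoted column name (or expression) inside `data`
getCol <- function(data, col) {
  # substitute(col) gives the unevaluated argument, e.g. the symbol `bb`;
  # eval() searches `data` first, then the environment getCol was called from
  eval(substitute(col), envir = data, enclos = parent.frame())
}

getCol(df, bb)      # the bb column, 101:110
getCol(df, aa * 2)  # expressions are evaluated inside df as well
```

Because `enclos = parent.frame()` is evaluated inside `getCol`, a variable defined in the calling code (for example `k <- 3; getCol(df, aa + k)`) is still found when it is not a column.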

kishore
  • @Ananda, I understand that "aa" is matched to the argument y in the "myFunc" signature. What is not clear to me is why there is no error when column aa is matched to y, while bb, which is another column of the same data frame, raises an error. I don't understand the semantic difference between aa (for y) and bb (for ...) in the function call, since both are columns of the same data frame. I can access aa in myFunc by using a combination of eval/substitute/envir. – kishore Mar 08 '14 at 04:45
  • @Ananda: here is a sample myFunc that works: `myFunc = function(data, y, ...){ argList = list(...) series_y = eval(substitute(y), envir=data, enclos=parent.frame()) cat("Sum of", deparse(substitute(y)), "is", sum(series_y), "\n") }` – kishore Mar 08 '14 at 05:07
  • @Ananda: I have reworded my question and also put the code to get me going. Your answer using `match.call` got me started. +1 for your detailed explanation. – kishore Mar 08 '14 at 07:52

2 Answers


I picked this up somewhere (most likely on SO): use match.call as follows:

myFunc <- function(data, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...) 
  argList
}

myFunc(df, aa, bb)
# [1] "aa" "bb"
myFunc(df, aa, bb, cc)
# [1] "aa" "bb" "cc"
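Once the unquoted names have been turned into a character vector, ordinary `[` indexing on the data frame works. A minimal sketch, using the hypothetical name `colSumsByName`:

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# Hypothetical function: sum the columns named (unquoted) in ...
colSumsByName <- function(data, ...) {
  # e.g. c("aa", "bb") for colSumsByName(df, aa, bb)
  cols <- as.character(match.call(expand.dots = FALSE)$...)
  # Character indexing; a misspelled name raises "undefined columns selected"
  colSums(data[cols])
}

colSumsByName(df, aa, bb)
```

Going through character names has the advantage that a typo fails loudly instead of silently picking up a same-named object from the calling environment.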

Your follow-ups in the comments are not entirely clear, so I'll try to explain with an example.

Below, I've added a "y" argument to the function and, for the sake of demonstration, the function simply returns the relevant values in a list.

myFunc <- function(data, y, ...) {
  argList <- as.character(match.call(expand.dots=FALSE)$...) 
  list(y, argList)
}

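If we don't write "y =" when calling the function, R matches the second argument to "y" by position and collects all remaining arguments into "...". Because this version of the function uses y in its body, R tries to evaluate aa in the calling environment and fails: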

myFunc(df, aa, bb)
# Error in myFunc(df, aa, bb) : object 'aa' not found
myFunc(df, y = NULL, aa, bb)
# [[1]]
# NULL
# 
# [[2]]
# [1] "aa" "bb"

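You were not getting an error for aa because your version of the function never referred to "y". R evaluates arguments lazily, so an argument that is never used is never looked up. list(...), on the other hand, forces every argument in "...", which is why bb triggered the error.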
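The lazy-evaluation behaviour can be reproduced in isolation. In this sketch, `lazyDemo` and `strictDemo` are hypothetical functions with the same shape as the question's first attempt; the only difference is that `strictDemo` forces `y` with `force()`.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# Same shape as the question's first attempt: y is never touched
lazyDemo <- function(data, y, ...) {
  argList <- list(...)   # list() forces every argument in ...
  length(argList)
}

# Identical, except that y is evaluated explicitly
strictDemo <- function(data, y, ...) {
  force(y)               # looks up `aa` in the caller's environment
  length(list(...))
}

lazyDemo(df, aa, c(2, 4, 6))   # 1: `aa` is never evaluated, c(2, 4, 6) is fine
tryCatch(lazyDemo(df, aa, bb), error = function(e) conditionMessage(e))
# "object 'bb' not found"
tryCatch(strictDemo(df, aa, c(2, 4, 6)), error = function(e) conditionMessage(e))
# "object 'aa' not found"
```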

A5C1D2H2I1M1N2O1R2T1

Because aa, bb and cc do not exist as objects on their own. R needs to be told that they are columns of df:

myFunc(df, df$aa, df$bb)

myFunc(df, df$aa, df$bb, df$cc)
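With this approach the function receives the column values rather than their names. A sketch of what the function body then sees, using a hypothetical variant named `myFuncValues`; `with(df, ...)` is shown as an equivalent way to let R find aa, bb and cc inside df. Note that the column names are lost, which is why the question later moves to match.call.

```r
df <- data.frame(aa = 1:10, bb = 101:110, cc = 201:210, dd = 301:310)

# Hypothetical variant of the question's myFunc that works on values, not names
myFuncValues <- function(data, y, ...) {
  argList <- list(...)              # each element is already a numeric vector
  vapply(argList, sum, numeric(1))  # one sum per vector passed through ...
}

myFuncValues(df, df$aa, df$bb, df$cc)   # y = df$aa; sums of bb and cc
with(df, myFuncValues(df, aa, bb, cc))  # same result: with() looks up aa, bb, cc in df
```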
  • myFunc(df, aa, c(2, 4, 6)) executes just fine. I am not sure why, but argument aa is accepted for the formal argument "y"; the problem is how "..." handles arguments like "bb", "cc", etc. My R knowledge is not deep enough to figure this out yet. – kishore Mar 07 '14 at 15:42
  • You need to provide more information about what you are trying to do. Give us an example of the output you want from your function when argList contains more than one variable. What output would you want from myFunc(df, aa, bb)? – Christie Haskell Marsh Mar 07 '14 at 18:56
  • I agree that I need to be clearer about what I am trying to achieve. Let me work on a simpler problem definition. Before I do that, I need to try the match.call route suggested above. – kishore Mar 08 '14 at 04:48