Suppose that I have a data frame (example_df) that consists of five columns: col1, col2, col3, col4, and col5. I am trying to create a function which takes in example_df and one of its columns to build a new data frame that displays the frequency of each col1, col4, and var combination, as shown below:

    summarize_data <- function(df, var) {
        var_combination <- data.frame()
        temp <- na.omit(unique(df))
        unique_var <- unique(temp$var)
        for (i in 1:length(unique_var)) {
            temp2 <- temp[temp$var == unique_var[i], ]
            unique_col1 <- na.omit(unique(temp2$col1))
            for (j in 1:length(unique_col1)) {
                temp3 <- temp2[temp2$col1 == unique_col1[j], ]
                temp3 <- temp3[!is.na(temp3$col3), ]
                var_combination <- rbind(var_combination,
                                         cbind(data.frame(table(temp3$col4)),
                                               var = unique_var[i],
                                               "Col1" = unique_col1[j]))
            }
        }
    }

So if I were to call summarize_data(example_df, col2), I want R to generate col2_combination and unique_col2 as local variables and to treat temp$var as temp$col2. In short, wherever R sees var, it should substitute col2. The final data frame, col2_combination, would ideally have the columns Var1, Freq (both of which R generates through the table statement), col2, and col1.

Is there a way to generate local variables inside summarize_data such that part of its name is taken directly from the second parameter? (col2_combination, unique_col2) And is it even possible for R to understand temp$var as temp$col2 in this case?

Dason
RLLin
  • Indent four spaces to make code blocks, or highlight and press CTRL+K. `>` are for quote blocks. – Frank Aug 10 '17 at 18:42
  • Thanks for letting me know, @Frank. – RLLin Aug 10 '17 at 19:22
  • If `var` is a string, then you can use `temp[[var]]` instead of `temp$var`. Suggested dupe: [Dynamically select data frame columns using `$` and a vector of column names](https://stackoverflow.com/q/18222286/903061) – Gregor Thomas Aug 10 '17 at 20:05
  • @Gregor. I understand that temp[[var]] would be a vector, namely a column of my data frame. However, this var is an argument to my function. If I were to type summarize_data(example_df, 'Col1'), would it recognize it as temp$'Col1' when R processes it? And back to my original question, is there a way to assign local variables such that part of their name is derived from the function argument? – RLLin Aug 10 '17 at 20:30
  • A variable holding a string never works with `$`. `mtcars$mpg` is equivalent to `mtcars[["mpg"]]`, and both work. If you have `x = "mpg"`, then `mtcars[[x]]` will work but `mtcars$x` will not, because `$` looks for a column named `x`. It doesn't matter whether it's in a function: if you call `summarize_data(example_df, "Col1")`, you have a string `var = "Col1"`, so `temp[[var]]` will work and `temp$var` will not. `[[` plays nicely with variables; `$` is a "magical shortcut" and does not (see `fortunes::fortune(312)`). – Gregor Thomas Aug 10 '17 at 20:57
  • You can create your strings `combo = paste0(var, "_combination"); unq = paste0("unique_", var)` and use these strings as column names as well to create new columns. I don't understand why you would want custom variable names locally inside your function based on the input (i.e., as separate objects, *not* as list items or data frame columns which are easy to do with strings). You could do that with `assign` and `get` using the string names, but it will be painful. – Gregor Thomas Aug 10 '17 at 21:01
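
Putting the comments above together, here is a minimal sketch of how the function might look if `var` is passed as a string and `[[` replaces `$`. The sample `example_df` is made up purely for illustration, and the local names `result` and `counts` are my own choices. The local variables keep fixed names; only the output column is named after the argument, which seems to cover what the question is after without `assign` or `get`.

```r
# Hypothetical sample data standing in for example_df (not from the original post).
example_df <- data.frame(
  col1 = c("a", "a", "b", "b", "b"),
  col2 = c("x", "x", "x", "y", "y"),
  col3 = c(1, 2, NA, 4, 5),
  col4 = c("p", "q", "p", "p", "p"),
  col5 = 1:5,
  stringsAsFactors = FALSE
)

summarize_data <- function(df, var) {
  # var must be a single column name given as a string, e.g. "col2".
  stopifnot(is.character(var), length(var) == 1, var %in% names(df))

  result <- data.frame()
  # Drop duplicate rows, then rows with an NA in any column.
  temp <- na.omit(unique(df))

  # Looping over the values directly avoids the 1:length(x) pitfall
  # when x is empty.
  for (v in unique(temp[[var]])) {
    temp2 <- temp[temp[[var]] == v, ]       # temp[[var]], not temp$var
    for (c1 in unique(temp2$col1)) {
      temp3 <- temp2[temp2$col1 == c1, ]
      # table() yields the Var1 and Freq columns mentioned in the question.
      counts <- as.data.frame(table(temp3$col4), stringsAsFactors = FALSE)
      counts[[var]] <- v                    # column named after the argument
      counts$col1 <- c1
      result <- rbind(result, counts)
    }
  }
  result                                    # return the data frame explicitly
}

res <- summarize_data(example_df, "col2")
print(res)
```

Note that the column name has to be quoted in the call: `summarize_data(example_df, "col2")`, not `summarize_data(example_df, col2)`.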

0 Answers