Suppose I have a data frame (example_df) with five columns: col1, col2, col3, col4, and col5. I am trying to write a function that takes example_df and one of its columns and builds a new data frame showing the frequency of each combination of col1, col4, and var. My attempt so far is shown below:
summarize_data <- function (df, var) {
  var_combination <- data.frame()
  temp <- na.omit(unique(df))
  unique_var <- unique(temp$var)
  for (i in 1:length(unique_var)){
    temp2 <- temp[temp$var == unique_var[i], ]
    unique_col1 <- na.omit(unique(temp2$col1))
    for (j in 1:length(unique_col1)){
      temp3 <- temp2[temp2$col1 == unique_col1[j], ]
      temp3 <- temp3[!is.na(temp3$col3), ]
      var_combination <- rbind(var_combination,
                               cbind(data.frame(table(temp3$col4)),
                                     var = unique_var[i],
                                     "Col1" = unique_col1[j]))
    }
  }
}
So if I call summarize_data(example_df, col2), I want R to create col2_combination and unique_col2 as local variables and to treat temp$var as temp$col2. In short, wherever var appears in the function, R should substitute col2. The final data frame, col2_combination, should ideally have the column names Var1 and Freq (both generated by R through the table call), followed by col2 and col1.
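To make the temp$var part concrete, here is a minimal illustration of how `$` and `[[` differ when the column name is held in a variable. The data frame demo_df and its values are made up for this illustration and are not the real example_df; var is assumed to hold the column name as a string.

```r
# Hypothetical toy data, only for illustration
demo_df <- data.frame(col1 = c("a", "b"), col2 = c("x", "y"))
var <- "col2"

# `$` uses its right-hand side literally, so this looks for a column
# that is actually named "var" and finds none
from_dollar <- demo_df$var

# `[[` evaluates its argument, so the string stored in `var` is used
from_brackets <- demo_df[[var]]

is.null(from_dollar)                      # no column called "var" exists
identical(from_brackets, demo_df$col2)    # same column as demo_df$col2
```

This suggests that the lookup itself may be achievable with `[[` if the column is passed as a string rather than as a bare name.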
Is there a way to generate local variables inside summarize_data whose names are partly taken from the second parameter (for example col2_combination and unique_col2)? And is it even possible for R to understand temp$var as temp$col2 in this case?
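For reference, the result described above could be sketched as follows. This is only a sketch under stated assumptions, not a drop-in replacement for the attempt: the column is passed as a string ("col2" rather than col2), the nested loops are swapped for a single table call, and combinations with a count of zero are dropped. The sample example_df is invented purely so that the snippet runs.

```r
# Sketch only: `var` is assumed to be a column name given as a string
summarize_data <- function(df, var) {
  # Same filtering as the attempt: drop duplicate rows and rows with any NA
  temp <- na.omit(unique(df))

  # Count every (col4, var, col1) combination in one pass;
  # temp[[var]] looks up the column whose name is stored in `var`
  counts <- as.data.frame(table(temp$col4, temp[[var]], temp$col1),
                          stringsAsFactors = FALSE)
  names(counts) <- c("Var1", var, "col1", "Freq")

  # Keep only combinations that occur, in the column order described above
  counts <- counts[counts$Freq > 0, c("Var1", "Freq", var, "col1")]
  rownames(counts) <- NULL
  counts
}

# Hypothetical sample data (not the real example_df)
example_df <- data.frame(
  col1 = c("a", "a", "b", "b", "a"),
  col2 = c("x", "x", "y", "y", "x"),
  col3 = c(1, 2, 3, 4, 5),
  col4 = c("p", "q", "p", "p", "p"),
  col5 = c(10, 20, 30, 40, 50)
)

result <- summarize_data(example_df, "col2")
print(result)
```

In this sketch only the output column is named after the parameter; the local variables keep fixed names. Base R's assign() can create variables with constructed names such as col2_combination, but nothing in the sketch needs it, because the result is simply returned to the caller.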