
My question builds on another one previously posted by someone: mapply for all arguments' combinations [R]

I want to apply a function to multiple arguments using mapply, and this works with my code below. However, I want to add a condition so that the tmin and tmax values are NOT fully crossed. Instead, they should be paired by position: the first tmin with the first tmax, the second tmin with the second tmax, and so on (tmin == 0.01 with tmax == 0.99, tmin == 0.05 with tmax == 0.95, but never, for example, tmin == 0.01 with tmax == 0.95). Each such pair should still be combined with ALL variables, as in the expand.grid() call below.

In the end I should have a data frame like the one called "alltogether", but with only the 15 rows that satisfy this condition rather than the 75 I get now.

I could just filter rows with dplyr::filter afterwards, but is there a nice way to include this condition in the function?

Here is an example data frame:

dataframe <- data.frame(personID = 1:10,
                  Var1 = c(4, 6, 3, 3, 7, 1, 20, NA, 12, 2),
                  Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                  Var3 = c(NA, 15, 12, 0, NA, NA, 2, 7, 6, 7),
                  Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1), 
                  Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

and here is the code I have so far:

des <- function(var, tmin, tmax){
  v <- var[var >= quantile(var, probs = tmin, na.rm = TRUE) &
             var <= quantile(var, probs = tmax, na.rm = TRUE)]
  d <- psych::describe(v)
  df <- cbind(variable = deparse(substitute(var)), tmin = tmin, tmax = tmax, d)
  print(df)
}
args = expand.grid(var = dataframe[, c("Var2", "Var4", "Var5")], tmin = c(0.01, 0.05, 0.1, 0.2, 0.25), tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))

alltogether <- do.call("rbind", mapply(FUN = des, var = args$var, tmin = args$tmin, tmax = args$tmax,  SIMPLIFY = FALSE))

Thank you for helping!

Edit:

The expected output is the one after filtering the "alltogether"-dataframe with the following code (15 obs. of 16 variables):

alltogether <- alltogether %>%
  dplyr::filter((tmin == 0.01 & tmax == 0.99) | 
                (tmin == 0.05 & tmax == 0.95) |
                (tmin == 0.1 & tmax == 0.9) |
                (tmin == 0.2 & tmax == 0.8) | 
                (tmin == 0.25 & tmax == 0.75))
fabha
  • Can you provide an expected output? – Yannis Vassiliadis May 15 '17 at 07:12
  • Sure. Is it clear now? – fabha May 15 '17 at 08:01
  • Additionally, how do I get the variable names ("Var2", "Var4", "Var5") into the "variable" column of the data frame "alltogether"? I tried variable = deparse(substitute(var)) in the function "des", but it does not work. Thanks for any ideas! – fabha May 15 '17 at 08:10
  • @YannisVassiliadis can I ask you another question? How would you add another column to the final data frame holding the correlation coefficient between the variable (in the "variable" column) and one fixed variable used for all correlation coefficients (let's say Var1)? I'm thinking of code in the function like this: `c <- cor(dataframe[, c("Var1", var)], method = "spearman")[1,2]; df <- cbind(variable = names(var), tmin = tmin, tmax = tmax, d, correlation_Var1 = c)`, but I'd have to include Var1 in the quantile filtering, otherwise the result would not be correct. I can't figure out anything that works... – fabha May 16 '17 at 13:35
  • I'm glad it worked for you. I'm editing the answer below to also include the correlations. I wasn't 100% sure that that's what you wanted, but if not, let me know. The biggest difference is that I added 2 arguments to the function. The first one is mandatory, and it's the variable you want to calculate the correlation with. The second one is optional, and it's the correlation method. If you don't specify it, it calculates the Spearman correlation. – Yannis Vassiliadis May 16 '17 at 19:47
  • @YannisVassiliadis thank you! That's awesome. It's exactly what I wanted. – fabha May 21 '17 at 17:45

1 Answer


OK, here's a solution to both problems. Unfortunately, I couldn't get one working with mapply, so I fell back on a good old for loop (it is still faster than your version, because it only computes the 15 combinations you need instead of all 75). I also changed the function so that it gives you the names of the variables as you wanted. The biggest difference is that I build the argument table with merge instead of expand.grid. Finally, it incorporates the correlation from your comment above.
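
To illustrate the difference between expand.grid and merge mentioned above, here is a minimal base-R sketch. The names `vars`, `grid_all`, and `grid_paired` are made up for this illustration; only the tmin/tmax values and variable names come from the question.

```r
# Paired thresholds: row i holds the i-th tmin together with the i-th tmax
minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                     tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
vars <- c("Var2", "Var4", "Var5")  # illustrative name, not from the original code

# expand.grid crosses every argument with every other one: 3 * 5 * 5 rows
grid_all <- expand.grid(var = vars, tmin = minmax$tmin, tmax = minmax$tmax,
                        stringsAsFactors = FALSE)

# merge on inputs without shared column names returns their cross product,
# so each variable meets each (tmin, tmax) row exactly once: 3 * 5 rows
grid_paired <- merge(data.frame(var = vars, stringsAsFactors = FALSE), minmax)

nrow(grid_all)     # 75
nrow(grid_paired)  # 15
```

Because tmin and tmax travel together as rows of `minmax`, mismatched pairs such as tmin == 0.01 with tmax == 0.95 are never generated, so no filtering is needed afterwards.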

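des <- function(var, tmin, tmax, cor.var, cor.method = c("spearman", "pearson", "kendall")) {
  # var and cor.var are one-column data frames (e.g. dataframe["Var2"]),
  # so names() returns the column name
  # Set values outside the [tmin, tmax] quantile range to NA instead of dropping them,
  # so var stays aligned row by row with cor.var
  var[var < quantile(var, probs = tmin, na.rm = TRUE) |
        var > quantile(var, probs = tmax, na.rm = TRUE)] <- NA
  d <- psych::describe(var)
  # match.arg() picks "spearman" when cor.method is not supplied
  correlation <- cor(cor.var, var, use = "pairwise.complete.obs", method = match.arg(cor.method))
  df <- cbind(variable = names(var), tmin = tmin, tmax = tmax, d, correlation)
  names(df)[ncol(df)] <- paste0("correlation_with_", names(cor.var))
  print(df)  # prints the row and returns it invisibly
}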

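# Paired thresholds: row i holds the i-th tmin and the i-th tmax
minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25), tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
# No shared columns, so merge crosses every variable name with every (tmin, tmax) row
args <- merge(c("Var2", "Var4", "Var5"), minmax)
args[, 1] <- as.character(args[, 1])

alltogether <- NULL
for (i in seq_len(nrow(args))) {
  # Single brackets keep a one-column data frame, which preserves the column name
  alltogether <- rbind(alltogether, des(var = dataframe[args[i, 1]],
                                        tmin = args[i, 2], tmax = args[i, 3],
                                        cor.var = dataframe["Var1"]))
}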
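
For readers who would still prefer mapply, one possible sketch follows. It assumes an argument table with one row per variable and tmin/tmax pair, built with merge as in the answer, and it passes column names rather than the columns themselves so that each result can be labelled. To stay self-contained it uses a small base-R summary in place of psych::describe and only three columns of the example data; `trimmed_summary` and `arg_table` are hypothetical names, not part of the original answer.

```r
dataframe <- data.frame(Var2 = c(5, 4, 5, 6, 9, 14, 14, 1, 0, NA),
                        Var4 = c(0, 0, 0, 0, 1, 0, 1, 4, 2, 1),
                        Var5 = c(12, 15, 11, 10, 10, 15, NA, 10, 13, 11))

minmax <- data.frame(tmin = c(0.01, 0.05, 0.1, 0.2, 0.25),
                     tmax = c(0.99, 0.95, 0.9, 0.8, 0.75))
# One row per variable name and (tmin, tmax) pair: 3 * 5 = 15 rows
arg_table <- merge(data.frame(variable = c("Var2", "Var4", "Var5"),
                              stringsAsFactors = FALSE), minmax)

# Hypothetical stand-in for des(): takes a column *name*, keeps the values
# inside the quantile range, and returns a one-row summary
trimmed_summary <- function(variable, tmin, tmax, data) {
  x <- data[[variable]]
  lims <- quantile(x, probs = c(tmin, tmax), na.rm = TRUE)
  kept <- x[!is.na(x) & x >= lims[1] & x <= lims[2]]
  data.frame(variable = variable, tmin = tmin, tmax = tmax,
             n = length(kept), mean = mean(kept), sd = sd(kept),
             stringsAsFactors = FALSE)
}

# mapply walks the three columns of arg_table in parallel, row by row
results <- mapply(trimmed_summary,
                  variable = arg_table$variable,
                  tmin = arg_table$tmin,
                  tmax = arg_table$tmax,
                  MoreArgs = list(data = dataframe),
                  SIMPLIFY = FALSE, USE.NAMES = FALSE)
alltogether <- do.call(rbind, results)
nrow(alltogether)  # 15
```

Passing names sidesteps the deparse(substitute(var)) problem raised in the comments: inside mapply the function only sees the expression that mapply builds internally, not the original column name.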
Yannis Vassiliadis