0

I have independent and dependent datasets. I want to test all possible relationships between dependent and independent variables and finally calculate the power of the method.

# dependent dataset
test_A <- data.frame(matrix(rnorm(100), nr=10, nc=10))
# independent dataset
test_B <- data.frame(matrix(sample(c(0,1,2), 500, replace = TRUE), nr=50, nc=10))
# Find all combination using dependent and independe datasets's variables
A_B_pair <- subset(expand.grid(c1=names(test_A), c2=names(test_B), 
                                    stringsAsFactors = FALSE))

# Main function to estimate the parameter and p-values 
test_function <- function(x,y){
  c1 <- test_A [[x]]
  c2 <- test_B[[y]]
  Data <- data.frame(1, XX=c1, YY=c2)
  
  model_lm <- lm(YY ~ XX, Data)
  est_lm <- as.numeric(model_lm$coefficients)[2]
  pvalue_lm <- as.numeric(summary(model_lm)$coeffi[,4][2])

  return(unlist(data.frame(lm.estimator = est_lm, lm.pvalue =pvalue_lm)))
}

# Using mapply  to get the all pairs estimators and p-values
output <- mapply(test_function, x=A_B_pair$c1, y=A_B_pair$c2)

# transpose the output
output.data <- data.frame(t(output))

# Put all the dependent and independent variables and their estimated values and p-values in a data frame.
output_final <- cbind(A_B_pair, output.data)

My problem is that I need to replicate this function 100 times to check the power of the method and estimate the parameters. Power would be calculated using the following command:

power <- mean(output_final$lm.pvalue <= 0.05)

How can I do this?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57

1 Answers1

0

You can try -

main_fn <- function() {
  test_A <- data.frame(matrix(rnorm(100), nr=10, nc=10))
  # independent dataset
  test_B <- data.frame(matrix(sample(c(0,1,2), 500, replace = TRUE), nr=50, nc=10))
  # Find all combination using dependent and independe datasets's variables
  A_B_pair <- subset(expand.grid(c1=names(test_A), c2=names(test_B), 
                                 stringsAsFactors = FALSE))
  
  output <- mapply(function(x, y) test_function(test_A, test_B, x, y), 
                   A_B_pair$c1, A_B_pair$c2)
  output.data <- data.frame(t(output))
  output_final <- cbind(A_B_pair, output.data)
}

test_function <- function(test_A, test_B, x,y){
  c1 <- test_A[[x]]
  c2 <- test_B[[y]]
  Data <- data.frame(1, XX=c1, YY=c2)
  
  model_lm <- lm(YY ~ XX, Data)
  est_lm <- as.numeric(model_lm$coefficients)[2]
  pvalue_lm <- as.numeric(summary(model_lm)$coeffi[,4][2])
  
  return(unlist(data.frame(lm.estimator = est_lm, lm.pvalue =pvalue_lm)))
}

result <- do.call(rbind, replicate(100, main_fn(), simplify = FALSE))
power <- mean(result$lm.pvalue <= 0.05)
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you for your prompt reply@Ronak Shah. The dimension of the final results is the number of replications and the dimension of the A_B_pair (For example, the result dataset produces 10000 x 4. But, I want to check the p-value for each pair of A_B_pair. Therefore, I use the following code to get the desired output: ``` ```cols <- c("c1", "c2"); final.results <- result %>% group_by_at(cols) %>% summarise(across(everything(), list(mean)), .groups = 'drop'); power <- final.results$lm.pvalue_1 <= 0.5 ``` – user8129108 Nov 07 '21 at 05:04
  • Hi @F. Privé, I want to use bigstatsr package to get the same results as my datasets are big. Could you please help me to apply the big_apple function to get the same results? – user8129108 Nov 23 '21 at 04:40
  • @DataScientist It seems that this question has been solved. If you want to add another component to it (large data), you should open a new question focusing on that. – F. Privé Nov 23 '21 at 06:54
  • @F.Privé, thank you. I have opened a new question regarding this question. Please see the question (https://stackoverflow.com/questions/70080099/how-to-use-the-bigstatsr-r-package-using-two-datasets-for-estimating-the-paramet) – user8129108 Nov 23 '21 at 11:32