
I made a minimal reproducible example; my real data is much larger.

a_p_ <- c(0.1, 0.3, 0.03, 0.03)
b_p_ <- c(0.2, 0.003, 0.1, 0.00001)
c_2  <- c(1, 2, 5, 23)
c_p_ <- c(0.001, 0.002, 0.002, 0.00001)
results_1 <- data.frame(a_p_, b_p_, c_2, c_p_)

a_p_ <- c(0.3, 0.02, 0.43, 0.44)
b_p_ <- c(0.00002, 0.3, 0.8, 0.005)
c_2  <- c(88, 4, 55, 88)
c_p_ <- c(0.1, 0.002, 0.002, 0.1)

results_2 <- data.frame(a_p_, b_p_, c_2, c_p_)

So I have two data sets: "results_1" and "results_2".

I then want to create a new data frame, named type1error, that combines the following summaries.

More specifically, I want this to be the first row of my new data frame (type1error):

>   results_1 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1  0.5  0.5    0

and this to be the second row of my data frame (type1error):

> results_2 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1 0.75  0.5  0.5

So what I did is:

# make empty holder

type1error<-as.data.frame(matrix(nrow = 2))

for(i in 1:2){
  # read the data 
  if(i==1){
    results<-results_1
  }
  if(i==2){
    results<-results_2
  }
  

  
  # mean() of a logical vector gives the proportion of TRUE values
  type1error[i,]<-results %>%
    summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  
  type1error$conditions[i] <- i 
  
}

But I got warning messages like this, and the result is not what I expected (one row of summary results per data set):

Warning messages:
1: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.5, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables
2: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.75, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables

How can I fix this?

Mossa
yoo
    Did you mean `results_1<-data.frame(a_p_,b_p_,c_2,c_p_)` instead of `results_1<-data.frame(a_p_,b_p_,c,c_p_)`? – Jon Spring May 11 '22 at 21:44

3 Answers


You could do

library(tidyverse)

list(results_1, results_2) %>%
  map_dfr(. %>% summarise(across(contains("_p_"), ~ mean(.x > 0.05))))
#>   a_p_ b_p_ c_p_
#> 1 0.50  0.5  0.0
#> 2 0.75  0.5  0.5

Created on 2022-05-11 by the reprex package (v2.0.1)
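
If you would rather keep the original `for` loop, one possible fix is sketched below. It is not part of the answer above. The holder built with `as.data.frame(matrix(nrow = 2))` starts with a single column, so a three-column summary cannot be assigned into one of its rows, which is what the warning reports. Collecting each one-row summary in a list and binding once at the end avoids that. The names `all_results` and `rows` are made up for illustration.

```r
library(dplyr)

# Example data from the question
results_1 <- data.frame(
  a_p_ = c(0.1, 0.3, 0.03, 0.03),
  b_p_ = c(0.2, 0.003, 0.1, 0.00001),
  c_2  = c(1, 2, 5, 23),
  c_p_ = c(0.001, 0.002, 0.002, 0.00001)
)
results_2 <- data.frame(
  a_p_ = c(0.3, 0.02, 0.43, 0.44),
  b_p_ = c(0.00002, 0.3, 0.8, 0.005),
  c_2  = c(88, 4, 55, 88),
  c_p_ = c(0.1, 0.002, 0.002, 0.1)
)

all_results <- list(results_1, results_2)    # hypothetical name
rows <- vector("list", length(all_results))  # one slot per data set

for (i in seq_along(all_results)) {
  # One-row summary of the "_p_" columns, tagged with the condition index
  rows[[i]] <- all_results[[i]] %>%
    summarise(across(contains("_p_"), ~ mean(.x > 0.05))) %>%
    mutate(conditions = i)
}

# Bind the one-row summaries into a single data frame
type1error <- bind_rows(rows)
type1error
```

Growing a list and binding once also tends to scale better than assigning into a data frame row by row.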

Mossa
Allan Cameron
    Or just use `map_dfr` instead of piping to `bind_rows`. – LMc May 11 '22 at 21:48
    I have 200 result data frames, named "results_1", "results_2", ..., "results_200". I cannot type all 200 of them out as in list(results_1, results_2), and the final product should be a data frame, not a list. Is there a clever way to use this code? (I appreciate your answer!) – yoo May 12 '22 at 06:05
    @yoo the final result here is a data frame, not a list. Are all of your 200 results in separate data frames in your global environment? Why? How did they get there? This is one of the reasons why you should keep a large collection of data frames in a list. There are ways to get them into a list without typing all the names though, like `mget(paste0("results_", 1:200))` – Allan Cameron May 12 '22 at 07:13
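
To illustrate the `mget()` suggestion from the comment above, here is a minimal sketch using only the two example data frames from the question. It assumes the objects live in the environment where the code runs and follow the `results_<number>` naming pattern. `df_names` and the `source` id column are names chosen for this example.

```r
library(dplyr)

# Example data from the question
results_1 <- data.frame(
  a_p_ = c(0.1, 0.3, 0.03, 0.03),
  b_p_ = c(0.2, 0.003, 0.1, 0.00001),
  c_2  = c(1, 2, 5, 23),
  c_p_ = c(0.001, 0.002, 0.002, 0.00001)
)
results_2 <- data.frame(
  a_p_ = c(0.3, 0.02, 0.43, 0.44),
  b_p_ = c(0.00002, 0.3, 0.8, 0.005),
  c_2  = c(88, 4, 55, 88),
  c_p_ = c(0.1, 0.002, 0.002, 0.1)
)

# Build the object names and fetch them all at once as a named list
df_names <- paste0("results_", 1:2)  # would be 1:200 for 200 data frames
dfs <- mget(df_names)

# Summarise each data frame, then stack the one-row summaries;
# .id stores the list names so each row can be traced to its source
type1error <- dfs %>%
  lapply(function(d) summarise(d, across(contains("_p_"), ~ mean(.x > 0.05)))) %>%
  bind_rows(.id = "source")
type1error
```

If the data frames come from files, reading them straight into a list would avoid the need for `mget()` altogether.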
2
library(dplyr) 
bind_rows(results_1 = results_1,  # Skip  "X =" if you don't
          results_2 = results_2,  #   need descriptive name
          .id = "id") %>%
  group_by(id) %>%
  summarize(across(contains("_p_"), ~mean(.x>0.05)))


# A tibble: 2 × 4
  id         a_p_  b_p_  c_p_
  <chr>     <dbl> <dbl> <dbl>
1 results_1  0.5    0.5   0  
2 results_2  0.75   0.5   0.5
Jon Spring

If you are already working in the tidyverse, I think the answers posted are more consistent, but here is an option using more base R functions:

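library(dplyr)  # summarize(), across() and contains() come from dplyr

dfs <- list(results_1, results_2)

# Summarise each data frame, then stack the one-row results with base R's rbind
do.call(rbind, lapply(dfs, \(x) summarize(x, across(contains("_p_"), ~ mean(. > 0.05)))))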

  a_p_ b_p_ c_p_
1 0.50  0.5  0.0
2 0.75  0.5  0.5
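
The code above still relies on dplyr for the summary step. For comparison, here is a sketch that uses base R only; it is an alternative, not the answerer's method. The helper name `p_share` is made up, and it assumes every column whose name contains "_p_" is numeric.

```r
# Example data from the question
results_1 <- data.frame(
  a_p_ = c(0.1, 0.3, 0.03, 0.03),
  b_p_ = c(0.2, 0.003, 0.1, 0.00001),
  c_2  = c(1, 2, 5, 23),
  c_p_ = c(0.001, 0.002, 0.002, 0.00001)
)
results_2 <- data.frame(
  a_p_ = c(0.3, 0.02, 0.43, 0.44),
  b_p_ = c(0.00002, 0.3, 0.8, 0.005),
  c_2  = c(88, 4, 55, 88),
  c_p_ = c(0.1, 0.002, 0.002, 0.1)
)

dfs <- list(results_1, results_2)

# Hypothetical helper: share of values above 0.05 in each "_p_" column
p_share <- function(d) {
  p_cols <- grepl("_p_", names(d), fixed = TRUE)
  colMeans(d[, p_cols, drop = FALSE] > 0.05)
}

# Each call returns a named numeric vector; rbind stacks them as rows
type1error <- as.data.frame(do.call(rbind, lapply(dfs, p_share)))
type1error
```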
LMc