This question follows up on an earlier one (how to put the results of summarise() function into the dataframe in r).

I don't think I conveyed my question well there, so I have added more details here.

I made a minimal reproducible example, but my real data is much larger.

a_p_ <-c(0.1, 0.3, 0.03, 0.03)
b_p_ <-c(0.2, 0.003, 0.1, 0.00001)
c_2<-c(1,2,5,23)
c_p_<-c(0.001, 0.002,0.002,0.00001)
results_1<-data.frame(a_p_,b_p_,c_2,c_p_)

a_p_ <-c(0.3, 0.02, 0.43, 0.44)
b_p_ <-c(0.00002, 0.3, 0.8, 0.005)
c_2 <-c(88,4,55,88)
c_p_<-c(0.1, 0.002,0.002,0.1)

results_2<-data.frame(a_p_,b_p_,c_2,c_p_)

So I have two datasets: "results_1" and "results_2". This is just a reproducible example, though. In my real data I have 200 results files (from "results_1" to "results_200").

I then want to create a new data frame (named type1error) built from the following summaries.

More specifically, I want this to be the first row of the new data frame (type1error):

>   results_1 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1  0.5  0.5    0

and this to be the second row of the data frame (type1error):

> results_2 %>%
+     summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  a_p_ b_p_ c_p_
1 0.75  0.5  0.5

So here is what I did:

# make empty holder

type1error<-as.data.frame(matrix(nrow = 2))

for(i in 1:2){
  # read the data 
  if(i==1){
    results<-results_1
  }
  if(i==2){
    results<-results_2
  }
  

  
  # mean() You can use mean() to get the proportion of TRUE of a logical vector.
  type1error[i,]<-results %>%
    summarise(across(contains("_p_"), ~ mean(.x > 0.05)))
  
  type1error$conditions[i] <- i 
  
}

But I got warning messages like the following, and the results are not what I expected (one row of summarise() results per dataset):

Warning messages:
1: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.5, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables
2: In `[<-.data.frame`(`*tmp*`, i, , value = list(a_p_ = 0.75, b_p_ = 0.5,  :
  provided 3 variables to replace 2 variables

How can I fix this?

The code below is not for this example dataset but for my real dataset, which produces the same problem.

#FYI, Not reproducible, but the code that I did use for my real, huge,data is as follows:

ncond<-200

#empty holder 

type1error<-as.data.frame(matrix(nrow = ncond))

for(i in 1:ncond){
# read the data 
results <- read.csv(paste0("model_results/results_",i,".csv"))
 

# mean() You can use mean() to get the proportion of TRUE of a logical vector.
type1error[i,]<-results %>%
  summarise(across(contains("_p_"), ~ mean(.x > 0.05)))

type1error$conditions[i] <- i 

}
# one csv file in type 1 error rate 
# fixed
write.csv(type1error,"type1error/type1error.csv")

#and this code chunk did not work well. 

I appreciate all the answers on the previous question's page!

The answers there all work on "results_1" and "results_2" only, because my reproducible example has just two datasets.

However, in reality I have 200 datasets (from "results_1" to "results_200"),

and I need the result to be a new data frame, not a list.

yoo

1 Answer

You can use map and bind_rows to work with a list and get a data frame as output.

map (from the purrr package) takes a list or vector, applies a function to each element, and returns a list. bind_rows (from dplyr) can then stack those elements into a single data frame.

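library(dplyr)  # summarise(), across(), bind_rows()
library(purrr)  # map()

ResultList <- list(results_1, results_2)

# Proportion of values above 0.05 in every column whose name contains "_p_"
sumit <- function(x) {
  summarise(x, across(contains("_p_"), ~ mean(.x > 0.05)))
}

# One single-row data frame per dataset
FinalResult <- map(ResultList, ~sumit(.x))

# Stack the rows into one data frame
Type1Error <- bind_rows(FinalResult)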

You can also do it as a one-liner in map:

map(ResultList, ~summarise(.x, across(contains("_p_"), ~ mean(.x > 0.05))))
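As a fuller sketch of the same idea, here is the whole pipeline run on the example data from the question. It also adds the `conditions` column that the question's loop was trying to fill. Using the `.id` argument of `bind_rows` for that, and converting it to integer afterwards, are my own choices for illustration; this avoids pre-allocating a holder whose columns have to match the summary.

```r
library(dplyr)
library(purrr)

# Example data copied from the question
results_1 <- data.frame(
  a_p_ = c(0.1, 0.3, 0.03, 0.03),
  b_p_ = c(0.2, 0.003, 0.1, 0.00001),
  c_2  = c(1, 2, 5, 23),
  c_p_ = c(0.001, 0.002, 0.002, 0.00001)
)
results_2 <- data.frame(
  a_p_ = c(0.3, 0.02, 0.43, 0.44),
  b_p_ = c(0.00002, 0.3, 0.8, 0.005),
  c_2  = c(88, 4, 55, 88),
  c_p_ = c(0.1, 0.002, 0.002, 0.1)
)

ResultList <- list(results_1, results_2)

type1error <- ResultList %>%
  # one single-row summary per dataset
  map(~ summarise(.x, across(contains("_p_"), ~ mean(.x > 0.05)))) %>%
  # stack the rows; .id records each element's position in the list
  bind_rows(.id = "conditions") %>%
  mutate(conditions = as.integer(conditions))

print(type1error)
```

The two rows should match the summarise() outputs shown in the question, with `conditions` holding 1 and 2.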

To get all of your files into a list, you could use map or lapply.

Edit: below is a modified version of the linked solution for reading CSV files into a list. It assumes your R project directory has a folder called "Data" that contains all the files.

# List every CSV file in the "Data" folder (no setwd() needed), then read each into a list element
filenames <- list.files("Data", pattern = "\\.csv$", full.names = TRUE)
ResultList <- lapply(filenames, read.csv)

Solution for reading csv files into a list
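One thing to watch with 200 numbered files: `list.files` returns names in alphabetical order, so "results_10.csv" comes before "results_2.csv". If the row order has to match the condition number, a possible sketch is to build the paths from the index instead, as the question's own loop does with `paste0`. The temporary folder and the two small demo files below are stand-ins I made up so the example runs on its own; with the real data you would point `data_dir` at your results folder and set `ncond` to 200.

```r
library(dplyr)
library(purrr)

# Stand-in data: write the question's two example datasets to a temp folder.
# (Hypothetical setup for illustration only.)
data_dir <- file.path(tempdir(), "model_results_demo")
dir.create(data_dir, showWarnings = FALSE)
demo <- list(
  data.frame(a_p_ = c(0.1, 0.3, 0.03, 0.03),
             b_p_ = c(0.2, 0.003, 0.1, 0.00001),
             c_2  = c(1, 2, 5, 23),
             c_p_ = c(0.001, 0.002, 0.002, 0.00001)),
  data.frame(a_p_ = c(0.3, 0.02, 0.43, 0.44),
             b_p_ = c(0.00002, 0.3, 0.8, 0.005),
             c_2  = c(88, 4, 55, 88),
             c_p_ = c(0.1, 0.002, 0.002, 0.1))
)
for (i in seq_along(demo)) {
  write.csv(demo[[i]],
            file.path(data_dir, paste0("results_", i, ".csv")),
            row.names = FALSE)
}

ncond <- length(demo)  # would be 200 for the real data

# Build the paths in numeric order: results_1.csv, results_2.csv, ...
paths <- file.path(data_dir, paste0("results_", seq_len(ncond), ".csv"))

type1error <- paths %>%
  map(read.csv) %>%
  map(~ summarise(.x, across(contains("_p_"), ~ mean(.x > 0.05)))) %>%
  bind_rows() %>%
  mutate(conditions = seq_len(ncond))

print(type1error)
```

The result is a single data frame with one row per file, which can then be passed to `write.csv` as in the question.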

alexrai93
  • I have 200 results files, such as "results_1", "results_2", ..., "results_200". I cannot type out 200 datasets as in `list(results_1, results_2)`. Is there a clever way to use this code? (I appreciate your answer!!) – yoo May 12 '22 at 06:03
  • What format are the files in? You can use lapply or map to read a batch of files into a list. – alexrai93 May 12 '22 at 07:04