I'm trying to analyze some data acquired from experimental tests with several variables being recorded. I've imported a dataframe into R and I want to obtain some statistical information by processing these data. In particular, I want to fill in an empty dataframe with the same variable names of the imported dataframe but with statistical features like mean, median, mode, max, min and quantiles as rows for each variable. The input dataframes are something like 60 columns x 250k rows each.
I've already managed to do this using apply
as in the following lines of code for a single input file.
df[1,] <- apply(mydata,2,mean,na.rm=T)
df[2,] <- apply(mydata,2,sd,na.rm=T)
...
Now I need to do this in a for loop for a number of input files mydata_1, mydata_2, mydata_3, ...
in order to build several summary statistics dataframes, one for each input file.
I tried in several different ways, trying with apply
and assign
but I can't really manage to access each row of interest in the output dataframes cycling over the several input files.
I wuold like to do something like the code below (I know that this code does not work, it's just to give an idea of what I want to do).
The output df dataframes are already defined and empty.
for (xx in 1:number_of_mydata_files) {
df_xx[1,]<-apply(mydata_xx,2,mean,na.rm=T)
df_xx[2,]<-apply(mydata_xx,2,sd,na.rm=T)
...
}
Actually I can't remember the error message given by this code, but the problem is that I can't even run this because it does not work.
I'm quite a beginner of R, so I don't have so much experience in using this language. Is there a way to do this? Are there other functions that could be used instead of apply and assign)?
EDIT:
I add here a simple table description that represents the input dataframes I’m using. Sorry for the poor data visualization right here. Basically the input dataframes I’m using are .csv imported files, looking like tables with the first row being the column description, aka the name of the measured variable, and the following rows being the acquired data. I have 250 000 acquisitions for each variable in each file, and I have something like 5-8 files like this being my input.
Current [A] | Force [N] | Elongation [%] | ...
—————————————————————————————————————
Value_a_1 | Value_b_1 | Value_c_1 | ...
I just want to obtain a data frame like this as an output, with the same variables name, but instead with statistical values as rows. For example, the first row, instead of being the first values acquired for each variable, would be the mean of the 250k acquisitions for each variable. The second row would be the standard deviation, the third the variance and so on. I’ve managed to build empty dataframes for the output summary statistics, with just the columns and no rows yet. I just want to fill them and do this iteratively in a for loop.