I have a fitting process that runs 100 times. Each time the output is a dataframe -- which I capture by using lapply
to create a list of 100 dataframes.
The first two dataframes might look something like this (I have more than 1 column):
n1 = c(4, 5, 6)
df1 = data.frame(n1, row.names = c("height", "weight", "favcolor"))
n2 = c(2, 3, 5, 7)
df2 = data.frame(n2, row.names = c("height", "weight", "inseam", "favcolor"))
I would like to combine these dataframes (take the average height value, or the standard deviation of the weights, for example).
My first thought was to turn this list into a dataframe -- not going to work because arguments imply differing number of rows
.
My second thought was to insert NAs for each rowname that didn't appear (so, I would be adding:
new_row <- c(NA, NA)
row.names(new_row) <- "inseam"
But I can't get that to work for a myrid of reasons, and I'm unable to even test if I add an "inseam" row that contains NAs to the end of df1, that when I take the average of the "inseam" I get the correct answer.
The correct average in this case being:
height 3
weight 4
inseam 5
favcolor 6.5
So, my question:
If you had a list of dataframes, where the row names are meaningful and need to be aggregated across like the above example, what is the best way to do that?
################EDIT
Here is the full data from 3 of my datasets:
> mega_df[1]
[[1]]
coeff error pval
rf2 -1.15099200 0.5752430 4.540538e-02
rf3 -0.53430218 0.4928974 2.783635e-01
rf4 0.08784138 0.4933079 8.586711e-01
rf5 0.96002601 0.5070957 5.833327e-02
rm2 -0.36188368 0.4626464 4.340949e-01
rm3 0.01805873 0.4355164 9.669251e-01
rm4 0.45008373 0.4319557 2.974268e-01
rm5 1.04056503 0.4441024 1.912556e-02
rc_cat1 0.86231928 0.2827566 2.290799e-03
rc_cat2 1.21335473 0.2448206 7.192321e-07
rc_cat3 0.96196637 0.2044198 2.528247e-06
rc_cat4 1.04477290 0.3302644 1.559142e-03
rc_cat5 -0.58902623 1.5893867 7.109357e-01
rc_cat6 0.62569607 0.1720676 2.765407e-04
rc_cat7 0.29381724 0.4115594 4.752815e-01
rs2 0.12333678 0.7186019 8.637250e-01
rs3 1.22018613 0.6423970 5.750837e-02
rs4 1.96075220 0.6454184 2.381892e-03
rs5 2.58404946 0.6543862 7.853927e-05
1|3 0.01561497 0.4851330 9.743229e-01
3|4 1.82853786 0.4937675 2.128663e-04
4|5 3.73480100 0.5023435 1.047616e-13
> mega_df[2]
[[1]]
coeff error pval
rf2 -0.23364248 5.849338e-01 6.895734e-01
rf3 0.24054894 5.219730e-01 6.449094e-01
rf4 0.84072979 5.208259e-01 1.064788e-01
rf5 1.47867154 5.346970e-01 5.684640e-03
rm2 -0.29555400 4.465509e-01 5.080612e-01
rm3 0.31147504 4.131024e-01 4.508553e-01
rm4 0.73696523 4.141224e-01 7.514424e-02
rm5 1.14273148 4.271863e-01 7.472508e-03
rc_cat1 1.27479299 3.094432e-01 3.794740e-05
rc_cat2 1.10917318 2.619011e-01 2.284654e-05
rc_cat3 0.65782540 2.161602e-01 2.340525e-03
rc_cat4 0.40512225 3.301662e-01 2.198131e-01
rc_cat5 12.78797722 5.612311e-08 0.000000e+00
rc_cat6 0.41622889 1.677804e-01 1.310894e-02
rc_cat7 0.16833629 3.806498e-01 6.583198e-01
rs2 -0.02279305 7.225878e-01 9.748360e-01
rs3 0.68299485 6.759050e-01 3.122608e-01
rs4 1.36149302 6.780722e-01 4.465519e-02
rs5 2.18484594 6.863696e-01 1.456612e-03
1|3 0.35419237 5.844931e-01 5.445266e-01
3|4 2.12603072 5.928308e-01 3.354863e-04
4|5 3.97564508 5.999369e-01 3.431064e-11
> mega_df[3]
[[1]]
coeff error pval
rf2 -0.2733408 0.5884741 6.422961e-01
rf3 0.1764257 0.5257697 7.372050e-01
rf4 0.6504428 0.5248386 2.152271e-01
rf5 1.3967757 0.5356706 9.119879e-03
rm2 -0.2361284 0.4870015 6.277745e-01
rm3 0.2078729 0.4609270 6.519977e-01
rm4 0.6390950 0.4622065 1.667555e-01
rm5 1.1738653 0.4730686 1.308730e-02
rc_cat1 0.9337627 0.2958630 1.599133e-03
rc_cat2 1.0292916 0.2493133 3.651281e-05
rc_cat3 0.7088285 0.2012026 4.267587e-04
rc_cat4 0.6296966 0.3664883 8.576193e-02
rc_cat6 0.5475018 0.1720841 1.464662e-03
rc_cat7 0.4521113 0.3588440 2.077017e-01
rs2 -0.4663666 0.7031265 5.071541e-01
rs3 0.7810059 0.6489673 2.287985e-01
rs4 1.5178641 0.6522175 1.995271e-02
rs5 2.1916080 0.6578769 8.643075e-04
1|3 0.2569225 0.4659460 5.813597e-01
3|4 2.0648302 0.4769118 1.493906e-05
4|5 3.9312070 0.4855339 5.648509e-16
And I'm hoping to do some basic aggregations that end up returning:
avcoef averror avpval std(coef)
rf2 W X Y Z
rf3 ...
.
.
.