Using the data.table package:
# load 'data.table'
library(data.table)
# convert to a data.table (by reference), add a 'row.id' variable with the
# number of each row, and melt into long format
dat2 <- melt(setDT(dat)[, row.id := .I], id.vars = 'row.id')
# create a grouping variable for each block of 25 values
dat2[, grp := rep(1:4, each = 25), by = row.id]
# summarise
dat2[, .(mn = mean(value), std = sd(value)), by = .(row.id, grp)]
which gives:
    row.id grp          mn       std
 1:      1   1 -0.30388554 1.0307631
 2:      2   1  0.04381967 0.7939788
 3:      3   1  0.03106169 0.8581719
 4:      4   1 -0.15215035 0.8200987
....
15:     15   1 -0.23641918 0.7024393
16:     16   1  0.09745967 1.0253811
17:      1   2 -0.16414997 0.8695713
18:      2   2 -0.06763887 1.0294245
....
31:     15   2  0.06034238 0.7756055
32:     16   2  0.16387033 0.9285894
33:      1   3  0.32860736 1.0802055
34:      2   3  0.51183174 0.9562819
....
47:     15   3  0.16075275 1.0335789
48:     16   3 -0.43298467 1.1010562
49:      1   4  0.24918962 0.9580600
50:      2   4 -0.13005426 1.1693455
....
62:     14   4  0.02436604 0.7341284
63:     15   4 -0.19614383 0.7039496
64:     16   4  0.01182338 0.8465747
How this works:
- With setDT(dat) the data.frame is converted to a data.table (an enhanced form of a data.frame) by reference. [, row.id := .I] then adds a variable holding the row number, and melt transforms the data into long format with that row number as the identifier.
- Next, for each row.id a grouping variable is created with rep(1:4, each = 25), which produces a vector of 25 1's, then 25 2's, and so on. So, for example, the first 25 values for row.id == 1 (which correspond to the first 25 columns of the original dat data.frame) get group id 1, the second 25 values get group id 2, and so on.
- Finally, you summarise with dat2[, .(mn = mean(value), std = sd(value)), by = .(row.id, grp)], where row.id and grp are the grouping variables.
The result is a mean and a standard deviation for each group of columns for each row.
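To see the grouping step on a scale that can be checked by hand, here is a minimal sketch on made-up data: 2 rows and 4 columns, split into 2 blocks of 2 columns instead of 4 blocks of 25. The toy table and its column names a to d are assumptions for illustration only and are not part of the original data.

```r
library(data.table)

# Hypothetical toy data: 2 rows, 4 columns, to be split into 2 blocks of 2 columns
toy <- data.table(a = c(1, 10), b = c(3, 30), c = c(5, 50), d = c(9, 90))

# Add the row number and melt into long format, as in the steps above
toy_long <- melt(toy[, row.id := .I], id.vars = "row.id")

# Within each row.id the values come in column order (a, b, c, d),
# so rep(1:2, each = 2) labels a and b as block 1, c and d as block 2
toy_long[, grp := rep(1:2, each = 2), by = row.id]

# Mean and standard deviation per row and block
res <- toy_long[, .(mn = mean(value), std = sd(value)), by = .(row.id, grp)]
print(res)
```

For row 1 the block means are 2 (from 1 and 3) and 7 (from 5 and 9), which makes it easy to confirm that the blocks follow the original column order.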
Another option is to combine melt and dcast, making use of the possibility to specify multiple aggregate functions in dcast:
dcast(melt(setDT(dat)[, row.id := .I], id.vars = 'row.id')[, grp := rep(1:4, each = 25), by = row.id],
      row.id ~ grp, fun.aggregate = list(mean, sd))
which gives:
    row.id value_mean_1 value_mean_2 value_mean_3 value_mean_4 value_sd_1 value_sd_2 value_sd_3 value_sd_4
 1:      1  -0.30388554  -0.16414997   0.32860736   0.24918962  1.0307631  0.8695713  1.0802055  0.9580600
 2:      2   0.04381967  -0.06763887   0.51183174  -0.13005426  0.7939788  1.0294245  0.9562819  1.1693455
 3:      3   0.03106169  -0.07250312   0.21619928   0.13092043  0.8581719  1.1439506  0.9441762  1.0006230
 4:      4  -0.15215035  -0.08417522  -0.27278714  -0.04190002  0.8200987  0.9008114  1.0394255  1.2063465
 5:      5   0.21871123   0.08029101  -0.04965507  -0.15279897  0.9593703  0.8409534  0.8878550  1.0157824
 6:      6   0.22335221   0.27142844   0.14032413   0.09975956  1.1154142  1.0896226  0.8587636  1.1147968
 7:      7   0.16725794  -0.03462013   0.14675249  -0.15678569  0.9991910  0.9236954  1.1258560  1.0250408
 8:      8  -0.12872236   0.03884649  -0.48565736  -0.30525278  1.0118579  1.0266040  1.1284902  0.9048042
 9:      9   0.25986114   0.25181718   0.07673463  -0.11521187  1.0509685  0.8352278  1.0952720  1.0706587
10:     10  -0.32670802  -0.04590547   0.22610217   0.09406650  1.0674699  0.8378048  0.8128130  0.9126611
11:     11  -0.16219092  -0.24172025  -0.14231462   0.03671087  1.1617784  1.0522955  0.8899262  0.8982543
12:     12   0.21109682   0.19735885  -0.03901236  -0.19283362  0.9064956  0.9530479  1.0422911  0.8323033
13:     13   0.11926882   0.29611127  -0.37648849  -0.08673776  1.0739078  0.7220276  0.9455307  0.9623676
14:     14   0.26478861   0.16054927  -0.03315950   0.02436604  1.0555501  1.0713119  0.9112082  0.7341284
15:     15  -0.23641918   0.06034238   0.16075275  -0.19614383  0.7024393  0.7756055  1.0335789  0.7039496
16:     16   0.09745967   0.16387033  -0.43298467   0.01182338  1.0253811  0.9285894  1.1010562  0.8465747
With dplyr/tidyr:
library(dplyr)
library(tidyr)
dat %>%
  mutate(id = row_number()) %>%
  gather(k, v, 1:100) %>%
  group_by(id) %>%
  mutate(grp = rep(1:4, each = 25)) %>%
  group_by(id, grp) %>%
  summarise(mn = mean(v), std = sd(v))
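In recent tidyr releases gather() is superseded by pivot_longer(). A possible equivalent of the pipeline above is sketched below. Since the original dat is not shown here, the sketch runs on simulated data (dat_sim, a hypothetical 16 x 100 data.frame of random normals, matching the 16 rows and 100 columns the code above implies); it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Assumption: stand-in for 'dat', 16 rows and 100 numeric columns
set.seed(1)
dat_sim <- as.data.frame(matrix(rnorm(16 * 100), nrow = 16))

res <- dat_sim %>%
  mutate(id = row_number()) %>%
  # long format; within each id the values stay in the original column order
  pivot_longer(cols = -id, names_to = "k", values_to = "v") %>%
  group_by(id) %>%
  mutate(grp = rep(1:4, each = 25)) %>%
  group_by(id, grp) %>%
  summarise(mn = mean(v), std = sd(v), .groups = "drop")

res
```

The result should have one line per combination of id and grp, so 64 lines for 16 rows and 4 blocks.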
Or with base R:
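# reshape to long format; the value column is named after the first column of dat ('X1' here)
dat2 <- reshape(data = dat, ids = rownames(dat), direction = 'long',
                varying = list(names(dat)), times = names(dat))
dat2 <- transform(dat2, grp = ave(id, id, FUN = function(i) rep(1:4, each = 25)))
# standard deviation and mean for each row and each block of 25 columns
aggregate(X1 ~ id + grp, dat2, FUN = function(x) c(std = sd(x), mn = mean(x)))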
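As a sanity check on any of the approaches above, the same numbers can also be computed directly from the column blocks, without reshaping. This is only a sketch under assumptions: dat_chk is a simulated stand-in for dat (16 rows, 100 numeric columns), and the names blocks, mn and std are made up for the example.

```r
# Assumption: stand-in for 'dat', 16 rows and 100 numeric columns
set.seed(1)
dat_chk <- as.data.frame(matrix(rnorm(16 * 100), nrow = 16))

# Block label for every column: 25 1's, 25 2's, 25 3's, 25 4's
blocks <- rep(1:4, each = 25)

# Column indices per block
idx <- split(seq_along(blocks), blocks)

# One column per block, one row per row of the data
mn  <- sapply(idx, function(j) rowMeans(dat_chk[, j]))
std <- sapply(idx, function(j) apply(dat_chk[, j], 1, sd))

head(mn)
head(std)
```

Both mn and std are 16 x 4 matrices here, so mn[i, g] is the mean of block g in row i and can be compared with the corresponding entry from the long-format results.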