How to avoid same column names when multiple transformations in data.table?

Question

I tried to make multiple transformations for the same columns in a data.table and found this answer. However, if I follow the steps there I get identical column names (instead of mean.Obs_1, etc.).

library(data.table)
set.seed(1)
dt = data.table(ID=c(1:3), Obs_1=rnorm(9), Obs_2=rnorm(9), Obs_3=rnorm(9))

dt[, c(mean = lapply(.SD, mean), sd = lapply(.SD, sd)), by = ID]
#   ID      Obs_1      Obs_2      Obs_3     Obs_1     Obs_2     Obs_3
#1:  1  0.4854187 -0.3238542  0.7410611 1.1108687 0.2885969 0.1067961
#2:  2  0.4171586 -0.2397030  0.2041125 0.2875411 1.8732682 0.3438338
#3:  3 -0.3601052  0.8195368 -0.4087233 0.8105370 0.3829833 1.4705692

Is there a way to avoid this behavior and get different column names for different transformations? I use the latest (1.9.4) stable version of data.table.

This'll be better once [#1063](https://github.com/Rdatatable/data.table/issues/1063) is implemented. — Arun, Jul 07 '15 at 12:18
[Calculate multiple aggregations with lapply(.SD, …)](https://stackoverflow.com/questions/24151602/calculate-multiple-aggregations-with-lapply-sd) — Henrik, Jul 02 '19 at 10:35

akrun · Accepted Answer · 2015-07-07T09:43:41.693

6

You could try

library(data.table)
dt[, unlist(lapply(.SD, function(x) list(Mean=mean(x),
                    SD=sd(x))),recursive=FALSE), by=ID]
#   ID Obs_1.Mean  Obs_1.SD Obs_2.Mean  Obs_2.SD Obs_3.Mean  Obs_3.SD
#1:  1  0.4854187 1.1108687 -0.3238542 0.2885969  0.7410611 0.1067961
#2:  2  0.4171586 0.2875411 -0.2397030 1.8732682  0.2041125 0.3438338
#3:  3 -0.3601052 0.8105370  0.8195368 0.3829833 -0.4087233 1.4705692

Or a variation as suggested by @David Arenburg

 dt[, as.list(unlist(lapply(.SD, function(x) list(Mean=mean(x),
              SD=sd(x))))), by=ID]
 #   ID Obs_1.Mean  Obs_1.SD Obs_2.Mean  Obs_2.SD Obs_3.Mean  Obs_3.SD
 #1:  1  0.4854187 1.1108687 -0.3238542 0.2885969  0.7410611 0.1067961
 #2:  2  0.4171586 0.2875411 -0.2397030 1.8732682  0.2041125 0.3438338
 #3:  3 -0.3601052 0.8105370  0.8195368 0.3829833 -0.4087233 1.4705692

edited Jul 07 '15 at 09:43

answered Jul 07 '15 at 09:33

akrun

874,273
37
540
662

1

That's a nice one. Never seen this before. – David Arenburg Jul 07 '15 at 09:34
@DavidArenburg Thanks, just tried some combinations and got it right – akrun Jul 07 '15 at 09:38
2

Also `dt[, as.list(unlist(lapply(.SD, function(x) list(Mean=mean(x), SD=sd(x))))), by=ID]` which suddenly looks like something I wrote once sometime – David Arenburg Jul 07 '15 at 09:41

janosdivenyi · Answer 2 · 2015-07-07T09:47:59.583

2

If the data is not that big and the focus is on readability, using dplyr might be also a good idea.

library(dplyr)
dt %>% group_by(ID) %>% summarise_each(funs(mean, sd))
#  ID Obs_1_mean Obs_2_mean Obs_3_mean  Obs_1_sd  Obs_2_sd  Obs_3_sd
#1  1  0.4854187 -0.3238542  0.7410611 1.1108687 0.2885969 0.1067961
#2  2  0.4171586 -0.2397030  0.2041125 0.2875411 1.8732682 0.3438338
#3  3 -0.3601052  0.8195368 -0.4087233 0.8105370 0.3829833 1.4705692

(As pointed out by @akrun, this won't work if you are using just one function in funs().)

edited Jul 07 '15 at 09:47

answered Jul 07 '15 at 09:38

janosdivenyi

3,136
2
24
36

1

Note that when are using a single function. you may not get the `mean` in the column names. i.e. `dt %>% group_by(ID) %>% summarise_each(funs(mean))` and `dt[, unlist(lapply(.SD, function(x) list(Mean=mean(x))), recursive=FALSE), by= ID]` – akrun Jul 07 '15 at 09:40
@akrun That is right. But if you are using just a single function, the same name of the columns is a smaller problem. – janosdivenyi Jul 07 '15 at 09:42
Yes, I understand, just to show the difference in behavior – akrun Jul 07 '15 at 09:44

How to avoid same column names when multiple transformations in data.table?

2 Answers2

Related