
I am trying to create a summary statistics data table by looping through the columns of an existing data table, i.e. computing the summary statistics for each column. I have tried looping through the columns, but when I do this I am not able to extract the columns the way I normally would in a regular data frame. I am new to the data.table package, so any help would be appreciated.

My data looks something like this:

library(data.table)

DT <- data.table(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

> DT
   math physics biology
1:    7       7       6
2:    9       7       8
3:    3       4       7
4:    6       5       6

I would like to get a new data table that looks something like this:

> DT2
   subject mean median min max
1:    math 6.25    6.5   3   9
2: physics 5.75    6.0   4   7
3: biology 6.75    6.5   6   8
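For reference, the column loop described above also works with a data.table, as long as each column is pulled out with `[[`, which behaves exactly as it does for a data.frame. A minimal sketch (the names `rows` and `DT2` are just illustrative) that builds one single-row table per column and stacks them with `rbindlist()`:

```r
library(data.table)

DT <- data.table(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

# Build one single-row data.table of statistics per column.
# DT[[col]] extracts a column by name as a plain vector,
# just as it would for a data.frame.
rows <- lapply(names(DT), function(col) {
  x <- DT[[col]]
  data.table(subject = col,
             mean = mean(x), median = median(x),
             min = min(x), max = max(x))
})

# Stack the per-column rows into a single data.table.
DT2 <- rbindlist(rows)
DT2
```

Looping in R like this is fine for a handful of columns; the vectorized approaches in the answers avoid building many small tables.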

2 Answers


Here is a tidyverse solution (tidyr + dplyr), though you might be looking for a data.table one:

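library(tidyr)
library(dplyr)  # group_by(), summarize() and %>% come from dplyr

DT <- data.frame(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

# Reshape to key/value pairs, then compute the statistics per key (column name)
DTSum <- DT %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(
    mean = mean(value),
    median = median(value),
    min = min(value),
    max = max(value)
  )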
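In current tidyr, `gather()` is superseded by `pivot_longer()`. A sketch of the same pipeline that starts from a data.table, names the key column `subject`, and converts the result back with `as.data.table()` (converting to a data.frame first is a cautious assumption to keep the tidyverse verbs on plain data.frame input):

```r
library(tidyr)
library(dplyr)
library(data.table)

DT <- data.table(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

DTSum <- as.data.frame(DT) %>%
  # One row per (subject, value) pair.
  pivot_longer(everything(), names_to = "subject", values_to = "value") %>%
  group_by(subject) %>%
  summarize(
    mean = mean(value),
    median = median(value),
    min = min(value),
    max = max(value)
  ) %>%
  # Convert the resulting tibble back to a data.table.
  as.data.table()
```

Note that `group_by()` sorts the groups, so the rows come out in alphabetical order (biology, math, physics) rather than in the original column order.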

If you need something very customized, you can write your own function that outputs the descriptive table exactly as you want, although that takes more effort.

Many R packages already provide functions for this. The psych package produces output very similar to the result you are looking for.

Example:

library(psych)
DT <- data.frame(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

describe(DT)

Output:

        vars n mean   sd median trimmed  mad min max range  skew kurtosis   se
math       1 4 6.25 2.50    6.5    6.25 2.22   3   9     6 -0.21    -1.92 1.25
physics    2 4 5.75 1.50    6.0    5.75 1.48   4   7     3 -0.14    -2.28 0.75
biology    3 4 6.75 0.96    6.5    6.75 0.74   6   8     2  0.32    -2.08 0.48
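Since `describe()` returns a data.frame with the column names stored as row names, it can be turned into a data.table holding only the requested statistics. A sketch, assuming data.table's `as.data.table()` with `keep.rownames` set to a column name (the names `desc` and `DT2` are illustrative):

```r
library(psych)
library(data.table)

DT <- data.table(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

# describe() yields one row per column of DT; pass a plain data.frame
# to be safe, since psych is written for data.frame input.
desc <- describe(as.data.frame(DT))

# Move the row names into a "subject" column and keep only the
# statistics that were asked for.
DT2 <- as.data.table(desc, keep.rownames = "subject")[
  , .(subject, mean, median, min, max)]
DT2
```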
Arduin
Thank you for the suggestion. However, the table I have is already in data.table format, and the desired summary table also needs to be a data.table. I could convert the data.table to a data.frame, do the operations, and convert it back, but it feels like there should be a more direct solution. – ArturaScrumble Feb 16 '19 at 06:24
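A more direct route that never leaves data.table is to reshape with data.table's own `melt()` and aggregate with `by`. This is a sketch of one possible approach, not the only idiomatic one (the names `long` and `DT2` are illustrative):

```r
library(data.table)

DT <- data.table(math = c(7, 9, 3, 6), physics = c(7, 7, 4, 5),
                 biology = c(6, 8, 7, 6))

# Reshape to long format: one row per (subject, value) pair.
# variable.factor = FALSE keeps the subject names as character.
long <- melt(DT, measure.vars = names(DT),
             variable.name = "subject", value.name = "value",
             variable.factor = FALSE)

# Compute all statistics per subject; `by` keeps groups in order of
# first appearance, i.e. the original column order.
DT2 <- long[, .(mean = mean(value), median = median(value),
                min = min(value), max = max(value)),
            by = subject]
DT2
```

Adding another statistic only means adding one more entry to the `.()` list in `j`.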