
I need to create a summary statistics table in R with four columns. My data contain a binary variable, remittance, and I want to compare the mean and SD of other variables (age, income, urban, poverty, etc., which will form the rows of the table) between the remittance==0 and remittance==1 groups. How do I create such a table? I cannot find a suitable method. Here is an example of the kind of table I want:

Variable     remittance==0      remittance==1
             mean     sd        mean      sd
age           
female
married
income
gung - Reinstate Monica
    It would be nice if you could provide [*sample data*](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) representative of your actual data. – r2evans Sep 12 '17 at 17:03

3 Answers

2

To get exactly the format you want, you can use tidyr/dplyr. With a lot of reshaping, tidying and rearranging...

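library(dplyr)
library(tidyr)

summary_table = mtcars %>%
  group_by(vs) %>%
  # list() instead of funs(), which is deprecated since dplyr 0.8.0;
  # columns are named e.g. mpg_mean, mpg_sd
  summarize_all(list(mean = mean, sd = sd)) %>%
  gather("stat", "val", -vs) %>%
  mutate(vs = paste0("vs", vs)) %>%
  unite(stat, stat, vs, sep = ".") %>%
  separate(stat, into = c("var", "stat"), sep = "_") %>%
  # one row per variable, one column per statistic/group pair
  spread(stat, val) %>%
  select(var, mean.vs0, sd.vs0, mean.vs1, sd.vs1) %>%
  mutate_if(is.numeric, round, 3)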

Result with tidyr/dplyr:

# A tibble: 10 x 5
     var mean.vs0  sd.vs0 mean.vs1 sd.vs1
   <chr>    <dbl>   <dbl>    <dbl>  <dbl>
 1    am    0.333   0.485    0.500  0.519
 2  carb    3.611   1.539    1.786  1.051
 3   cyl    7.444   1.149    4.571  0.938
 4  disp  307.150 106.765  132.457 56.893
 5  drat    3.392   0.474    3.859  0.506
 6  gear    3.556   0.856    3.857  0.535
 7    hp  189.722  60.282   91.357 24.424
 8   mpg   16.617   3.861   24.557  5.379
 9  qsec   16.694   1.092   19.334  1.354
10    wt    3.689   0.904    2.611  0.715
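
The gather() and spread() verbs used above are superseded in tidyr 1.0 and later. As a rough sketch of the same reshaping with pivot_longer() and pivot_wider() (assuming dplyr >= 1.0 and tidyr >= 1.0; the name summary_table2 is made up for this example), something like the following should give the same numbers. Note that the rows come out in the original column order of mtcars rather than alphabetically.

```r
library(dplyr)
library(tidyr)

summary_table2 <- mtcars %>%
  group_by(vs) %>%
  # mean and sd of every non-grouping column, named e.g. mpg_mean, mpg_sd
  summarise(across(everything(), list(mean = mean, sd = sd)),
            .groups = "drop") %>%
  # split "mpg_mean" into var = "mpg" and stat = "mean"
  pivot_longer(-vs, names_to = c("var", "stat"), names_sep = "_",
               values_to = "val") %>%
  # build the target column names, e.g. "mean.vs0"
  mutate(col = paste0(stat, ".vs", vs)) %>%
  select(var, col, val) %>%
  pivot_wider(names_from = col, values_from = val) %>%
  select(var, mean.vs0, sd.vs0, mean.vs1, sd.vs1) %>%
  mutate(across(where(is.numeric), ~ round(.x, 3)))

summary_table2
```

The names_sep = "_" split works here only because no column name in mtcars contains an underscore; with names such as hh_income, a different separator would be needed.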

You can also use stargazer, although I don't think it can place the two groups side by side in a single table:

library(stargazer)
library(dplyr)

mtcars %>%
  split(mtcars$vs) %>%
  stargazer(type = "text", 
          summary.stat = c("mean", "sd"), 
          title = c("vs = 0", "vs = 1"))

Result with stargazer:

vs = 0
==========================
Statistic  Mean   St. Dev.
--------------------------
mpg       16.617   3.861  
cyl        7.444   1.149  
disp      307.150 106.765 
hp        189.722  60.282 
drat       3.392   0.474  
wt         3.689   0.904  
qsec      16.694   1.092  
vs         0.000   0.000  
am         0.333   0.485  
gear       3.556   0.856  
carb       3.611   1.539  
--------------------------

vs = 1
==========================
Statistic  Mean   St. Dev.
--------------------------
mpg       24.557   5.379  
cyl        4.571   0.938  
disp      132.457  56.893 
hp        91.357   24.424 
drat       3.859   0.506  
wt         2.611   0.715  
qsec      19.334   1.354  
vs         1.000   0.000  
am         0.500   0.519  
gear       3.857   0.535  
carb       1.786   1.051  
--------------------------

Notes:

  1. The advantage of the tidyr/dplyr method is that the output is a data frame, so you can manipulate it and use it in further calculations. You can't do that with the stargazer output.
  2. The advantage of the stargazer method is that it prints a nicely formatted table, and it can even produce LaTeX: just change type = "text" to type = "latex". This is especially useful if you want to include descriptive statistics in a publication or in the PDF output of an R Markdown document.
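
To illustrate the first note, here is a small sketch of a "further calculation" on such a table: the raw difference in means and a standardized difference. The table is rebuilt in base R so the snippet runs on its own; the names tab, diff and smd are made up for this example, and the standardized difference uses the simple average of the two group variances, which is only one of several conventions.

```r
# Variables to summarise: everything except the grouping variable vs
vars <- setdiff(names(mtcars), "vs")
groups <- split(mtcars[vars], mtcars$vs)

# Matrices with one row per variable and one column per group ("0", "1")
m <- sapply(groups, colMeans)
s <- sapply(groups, function(d) sapply(d, sd))

tab <- data.frame(
  var      = vars,
  mean.vs0 = m[, "0"], sd.vs0 = s[, "0"],
  mean.vs1 = m[, "1"], sd.vs1 = s[, "1"],
  row.names = NULL
)

# Because tab is an ordinary data frame, new columns are easy to add
tab$diff <- tab$mean.vs1 - tab$mean.vs0
tab$smd  <- tab$diff / sqrt((tab$sd.vs0^2 + tab$sd.vs1^2) / 2)

tab
```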

Of course you can also combine the two methods and utilize both benefits:

Result with tidyr/dplyr + stargazer:

> stargazer(summary_table, type = "text", summary = FALSE)

========================================
   var  mean.vs0 sd.vs0  mean.vs1 sd.vs1
----------------------------------------
1   am   0.333    0.485    0.5    0.519 
2  carb  3.611    1.539   1.786   1.051 
3  cyl   7.444    1.149   4.571   0.938 
4  disp  307.15  106.765 132.457  56.893
5  drat  3.392    0.474   3.859   0.506 
6  gear  3.556    0.856   3.857   0.535 
7   hp  189.722  60.282   91.357  24.424
8  mpg   16.617   3.861   24.557  5.379 
9  qsec  16.694   1.092   19.334  1.354 
10  wt   3.689    0.904   2.611   0.715 
----------------------------------------

> stargazer(summary_table, type = "latex", summary = FALSE, header = FALSE)


acylam
2

This is commonly called 'table 1' in biomedical research. There is a handy R package, called tableone, that makes them for you very conveniently. If you post a reproducible example, I can show you how it works with your data. In lieu of that, the basic code would be something like:

library(tableone)
# variables listed in factorVars must also appear in vars
CreateTableOne(data=, vars=c("age","income","female","married"),
               factorVars=c("female","married"), strata="remittance")
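
As a concrete illustration on a built-in dataset, a minimal sketch might look like the following. Here mtcars stands in for the asker's data and vs plays the role of remittance; both are assumptions made only for the example. Note that tableone reports continuous variables as "mean (SD)" in a single cell rather than in separate mean and sd columns. The last lines show one way to get the table out of R: print() with printToggle = FALSE returns a character matrix that can be written to a CSV file and opened in Excel.

```r
library(tableone)

# mtcars is a stand-in dataset; vs plays the role of remittance
tab <- CreateTableOne(
  data       = mtcars,
  vars       = c("mpg", "hp", "wt", "am"),
  factorVars = "am",   # numerically coded 0/1, treated as categorical
  strata     = "vs",
  test       = FALSE   # descriptive statistics only, no p-values
)

# Capture the formatted table as a character matrix instead of printing it
tab_mat <- print(tab, printToggle = FALSE, noSpaces = TRUE)

# Write it to a temporary CSV file
out_file <- tempfile(fileext = ".csv")
write.csv(tab_mat, file = out_file)
```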
gung - Reinstate Monica
  • Interesting, I've never used this package before. Can it output the summary table as dataframe, or latex table? – acylam Sep 12 '17 at 20:19
  • @useR, I think there may be some latex methods, but I've never used them. I use it regularly. I mostly output it to Excel, where I edit it slightly to import into word documents. This is very, very common in biomedical research. – gung - Reinstate Monica Sep 12 '17 at 20:33
0
data(iris)
library(psych)

describeBy(iris[,-5], iris[,5])

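Just replace iris with your own data: the first argument holds the numeric variables to summarise, and the second is the grouping variable.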
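
By default describeBy() prints many statistics per group. As a hedged sketch of how to keep only the mean and SD, the mat = TRUE argument returns a single data frame that can be subset by column (the name res is made up for this example):

```r
library(psych)

# mat = TRUE returns one data frame instead of a list of per-group tables
res <- describeBy(iris[, -5], group = iris[, 5], mat = TRUE)

# Keep only the group label, mean, and SD for each variable/group pair
res[, c("group1", "mean", "sd")]
```

The result has one row per variable and group, so it is in long form; reshaping would still be needed to put the groups side by side.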

Frank
Balter